Clinical notes and other free-text documents contain a breadth of clinical information that is often unavailable in structured data. Transformer-based natural language processing (NLP) models, such as BERT, have shown great promise in using transfer learning to improve clinical text processing. However, these models are commonly trained on generic corpora, which do not necessarily reflect many of the intricacies of the clinical domain.
This project will evaluate the extent to which varying degrees of transfer learning with transformer-based models can improve clinical NLP performance. Students will use different types of clinical text to train BERT models from scratch, then evaluate and compare them against publicly available models such as BERT Base, BioBERT, and ClinicalBERT. Students will also participate in regular mentoring meetings with the team from the Vanderbilt Clinical Informatics Center to understand the context of clinical notes and the implications of applying transformer-based models to clinical tasks that improve healthcare delivery and patient care.
If you have questions about this project or are interested in joining the team, please reach out to bryan.d.steitz@vumc.org. Please note that because these projects involve clinical data, students must complete IRB training through CITI. The research group can help facilitate this training before the project begins.