Deep Learning Approaches for Breast Cancer Related Concepts Extraction from Electronic Health Records


Sicheng Zhou


Rui Zhang


A large amount of clinical information about breast cancer patients is hidden in the clinical texts of electronic health record (EHR) systems. Automated extraction of target information for breast cancer patients from the EHR is important for timely clinical decision support. This study implemented and evaluated state-of-the-art deep learning algorithms for the named entity recognition (NER) task to extract breast cancer related concepts from pathology reports in the EHR systems at the University of Minnesota (UMN). Conditional random field (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and fine-tuned BERT models were developed to extract 14 types of breast cancer related concepts. The fine-tuned BERT models achieved the best overall F1 scores: 0.868 for exact match and 0.890 for lenient match.
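Exact and lenient match differ only in what counts as a correctly extracted concept. The sketch below assumes a common NER convention that the abstract does not spell out. Under exact match, a prediction must have the same start offset, end offset, and concept type as a gold annotation. Under lenient match, it only needs the same type and a span that overlaps a gold annotation. The function names and the toy labels "A" and "B" are hypothetical illustrations, not the study's evaluation code or its 14 concept types.

```python
from typing import Callable, List, Tuple

# Hypothetical entity representation: (start, end, label), character offsets, end exclusive.
Entity = Tuple[int, int, str]


def exact(a: Entity, b: Entity) -> bool:
    # Exact match: identical boundaries and identical concept type.
    return a == b


def lenient(a: Entity, b: Entity) -> bool:
    # Lenient match (assumed definition): same type and the half-open spans share a character.
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Entity], pred: List[Entity], is_lenient: bool = False) -> float:
    match: Callable[[Entity, Entity], bool] = lenient if is_lenient else exact
    # Precision numerator: predictions that match at least one gold entity.
    tp_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    # Recall numerator: gold entities matched by at least one prediction.
    tp_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 10, "A"), (20, 30, "B")]
    # One exact hit, one boundary error on "B", and one spurious prediction.
    pred = [(0, 10, "A"), (22, 30, "B"), (40, 45, "A")]
    print(round(f1_score(gold, pred), 3))                   # 0.4
    print(round(f1_score(gold, pred, is_lenient=True), 3))  # 0.8
```

The boundary error on the "B" entity counts as both a false positive and a false negative under exact match, but as a hit under lenient match. This is why lenient scores, such as the 0.890 reported above, are typically at least as high as exact scores.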

