Detection of medical text semantic similarity based on convolutional neural network.

Overview

abstract

BACKGROUND: Imaging examinations, such as ultrasonography, magnetic resonance imaging and computed tomography scans, play key roles in healthcare settings. To assess and improve the quality of imaging diagnosis, pre-existing imaging and pathology reports that cover overlapping exam body sites must be manually located in electronic medical records (EMRs) and compared. Retrieving those reports is time-consuming. In this paper, we propose a convolutional neural network (CNN) based method that makes better use of the semantic information contained in report texts to accelerate the retrieval process.
METHODS: We included 16,354 imaging and pathology report-pairs from 1926 patients who were admitted to Shanghai Tongren Hospital and had ultrasonic examinations between 1st May 2017 and 31st July 2017. We adapted the CNN model to calculate the similarities among the report-pairs in order to identify target report-pairs with overlapping body sites, and compared its performance with six other conventional models: keyword mapping, latent semantic analysis (LSA), latent Dirichlet allocation (LDA), Doc2Vec, Siamese long short-term memory (LSTM) and a model based on named entity recognition (NER). We also used a graph embedding method to enhance the word representation by capturing semantic relation information from medical ontologies. Additionally, we used the LIME algorithm to identify which features (or words) are decisive for the prediction results, improving the model's interpretability.
RESULTS: Experimental results showed that our CNN model achieved a significant improvement over all other conventional models in area under the receiver operating characteristic curve (AUROC), precision, recall and F1-score on our test dataset. The AUROC of our CNN models improved by approximately 3-7%. The AUROC of the CNN model with graph-embedding and ontology-based medical concept vectors was 0.8% higher than that of the model with randomly initialized vectors and 1.5% higher than that of the model with pre-trained word vectors.
CONCLUSION: Our study demonstrates that a CNN model with pre-trained medical concept vectors could accurately identify target report-pairs with overlapping body sites and potentially accelerate the retrieval process for imaging diagnosis quality measurement.
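The evaluation metrics named in the abstract (AUROC, precision, recall and F1-score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made purely for illustration, with a label of 1 standing for a report-pair with overlapping body sites and the score standing for a model's predicted similarity.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive pair scores higher
    than a random negative pair (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall_f1(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[float, float, float]:
    """Precision, recall and F1 after thresholding the similarity scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data only: six made-up report-pairs with made-up similarity scores.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
    print("AUROC:", auroc(toy_labels, toy_scores))
    print("P/R/F1:", precision_recall_f1(toy_labels, toy_scores))
```

AUROC depends only on how the scores rank the pairs, whereas precision, recall and F1 depend on the chosen threshold, which is one reason the two kinds of metric are usually reported side by side.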

authors

Zheng, Tao
Gao, Yimei
Wang, Fei
Fan, Chenhao
Fu, Xingzhi
Li, Mei
Zhang, Ya
Zhang, Shaodian
Ma, Handong

publication date

August 7, 2019

published in

BMC Medical Informatics and Decision Making (journal)

Research

keywords

Algorithms
Electronic Health Records
Information Storage and Retrieval
Neural Networks, Computer

Identity

PubMed Central ID

PMC6686478

Scopus Document Identifier

85072018739

Digital Object Identifier (DOI)

10.1186/s12911-019-0880-2

PubMed ID

31391038

Additional Document Info

has global citation frequency

33

volume

19

issue

1

VIVO Weill Cornell Medical College

Detection of medical text semantic similarity based on convolutional neural network. Academic Article
