Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing.

Overview

abstract

BACKGROUND: Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured, in the form of radiology reports, and collecting and curating these data is arduous and time-consuming.

PURPOSE: We used a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients.

MATERIALS AND METHODS: This observational cohort study used 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. A total of 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. A convolutional neural network (CNN) was then trained, validated, and tested to determine the presence or absence of these features. Finally, the prognostic value of the CNN-extracted features for ACM was evaluated using Cox regression analysis.

RESULTS: A total of 11,808 CT reports from 11,808 patients were analyzed (mean age 72.8 ± 14.8 years; 52.7% [6,217/11,808] male), of whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy in retrieving the 14 radiographic findings, with area under the curve (AUC) values ranging from 0.83 to 1.00 (F1 scores 0.84 to 0.97). In the Cox model, the time-dependent AUC for predicting ACM at 30 days was 0.747 (95% confidence interval [CI] 0.704-0.790).

CONCLUSION: An ML-based NLP approach applied to unstructured CT reports extracts predetermined radiographic findings with excellent accuracy, and the extracted findings provide prognostic value in HF patients.
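The abstract reports a per-finding AUC and F1 score for the CNN's presence/absence outputs. As a hedged illustration only (not the authors' code or evaluation pipeline), the following Python sketch shows how these two metrics are commonly computed for a single binary finding. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not drawn from the study data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one binary finding (1 = present, 0 = absent in the report)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive report outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative report")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: manual annotations vs. model scores for one finding.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
    # Assumed threshold of 0.5 to turn scores into presence/absence calls.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(f"F1 = {f1_score(y_true, y_pred):.3f}")   # 2/3 for this toy example
    print(f"AUC = {roc_auc(y_true, scores):.3f}")   # 8/9 for this toy example
```

Note that AUC is threshold-free (it uses the raw scores), whereas F1 depends on the chosen threshold, which is one reason studies often report both.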

authors

Pandey, Mohit
Xu, Zhuoran
Sholle, Evan
Maliakal, Gabriel
Singh, Gurpreet
Fatima, Zahra
Larine, Daria
Lee, Benjamin C
Wang, Jing
van Rosendael, Alexander R
Baskaran, Lohendran
Shaw, Leslee J
Min, James K
Al'Aref, Subhi J

publication date

July 30, 2020

published in

PLoS One

Research

keywords

Heart Failure
Image Processing, Computer-Assisted
Natural Language Processing
Neural Networks, Computer
Radiography, Abdominal
Radiography, Thoracic
Tomography, X-Ray Computed

Identity

PubMed Central ID

PMC7392233

Scopus Document Identifier

85088884101

Digital Object Identifier (DOI)

10.1371/journal.pone.0236827

PubMed ID

32730362

Additional Document Info

has global citation frequency

21

volume

15

issue

7

VIVO Weill Cornell Medical College
