A neural network model for detection and classification of lumbar spinal stenosis on MRI.
Academic Article
Overview
Abstract
OBJECTIVES: To develop a three-stage convolutional neural network (CNN) approach that segments anatomical structures, classifies the presence of lumbar spinal stenosis (LSS) for all three stenosis types (central canal, lateral recess, and foraminal), and assesses its severity on spine MRI, and to demonstrate its efficacy as an accurate and consistent diagnostic tool. METHODS: The three-stage model was trained on 1635 annotated lumbar spine MRI studies consisting of T2-weighted sagittal and axial planes at each vertebral level. The accuracy of the model was evaluated on an external validation set of 150 MRI studies, each graded as absent, mild, moderate, or severe by a panel of 7 radiologists. The reference standard for all stenosis types was determined by majority voting and, in case of disagreement, adjudicated by an external radiologist. The radiologists' diagnoses were then compared with those of the model. RESULTS: The model performed comparably to the radiologist average for all three stenosis types, both in determining the presence or absence of LSS and in classifying its severity. For central canal stenosis, the sensitivity, specificity, and AUROC of the CNN for binary (presence/absence) classification were (0.971, 0.864, 0.963), compared with the radiologist average of (0.786, 0.899, 0.842). For lateral recess stenosis, the sensitivity, specificity, and AUROC of the CNN were (0.853, 0.787, 0.907), compared with the radiologist average of (0.713, 0.898, 0.805). For foraminal stenosis, the sensitivity, specificity, and AUROC of the CNN were (0.942, 0.844, 0.950), compared with the radiologist average of (0.879, 0.877, 0.878). Multi-class severity classification showed similarly comparable statistics. CONCLUSIONS: The CNN showed comparable performance to radiologist subspecialists for the detection and classification of LSS. The integration of neural network models in the detection of LSS could bring higher accuracy, efficiency, consistency, and post-hoc interpretability to diagnostic practice.
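The evaluation described in the abstract (a majority-vote reference standard, followed by sensitivity, specificity, and AUROC for the binary presence/absence task) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, the strict-majority rule and the "anything other than absent counts as present" binarization are assumptions about details the abstract does not spell out, and the inputs in the usage lines are invented toy values, not study data.

```python
from collections import Counter


def majority_grade(votes):
    """Return the grade picked by a strict majority of readers.

    Returns None when no grade has more than half of the votes; this
    sketch assumes such cases would go to the external adjudicator.
    """
    grade, count = Counter(votes).most_common(1)[0]
    return grade if count > len(votes) / 2 else None


def binarize(grade):
    """Map a severity grade to 1 (stenosis present) or 0 (absent).

    Assumption: any grade other than "absent" counts as present.
    """
    return int(grade != "absent")


def sensitivity_specificity(reference, predicted):
    """Compute (sensitivity, specificity) from paired binary labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def auroc(reference, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for r, s in zip(reference, scores) if r == 1]
    neg = [s for r, s in zip(reference, scores) if r == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with invented values (not data from the study).
panel = ["mild", "mild", "absent", "mild", "moderate", "mild", "absent"]
reference_grade = majority_grade(panel)          # 4 of 7 readers agree
reference_labels = [1, 1, 0, 0, 1]               # hypothetical reference standard
model_labels = [1, 0, 0, 1, 1]                   # hypothetical thresholded output
model_scores = [0.9, 0.4, 0.2, 0.6, 0.8]         # hypothetical model probabilities
sens, spec = sensitivity_specificity(reference_labels, model_labels)
area = auroc(reference_labels, model_scores)
```

The pairwise AUROC here is quadratic in the number of cases, which is fine for a validation set of a few hundred studies; a rank-based formulation would be the usual choice for larger sets.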