The Effectiveness of Supervised Machine Learning in Screening and Diagnosing Voice Disorders: Systematic Review and Meta-analysis.
Review
Overview
Abstract
BACKGROUND: Investigating voice disorders involves a series of processes, including voice screening and diagnosis. Both methods have limited standardized tests, which are affected by the clinician's experience and subjective judgment. Machine learning (ML) algorithms have been used as an objective tool in screening or diagnosing voice disorders. However, the effectiveness of ML algorithms in assessing and diagnosing voice disorders has not received sufficient scholarly attention.
OBJECTIVE: This systematic review aimed to assess the effectiveness of ML algorithms in screening and diagnosing voice disorders.
METHODS: An electronic search was conducted in 5 databases. Studies that examined the performance (accuracy, sensitivity, and specificity) of any ML algorithm in detecting pathological voice samples were included. Two reviewers independently selected the studies, extracted data from the included studies, and assessed the risk of bias. The methodological quality of each study was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool via RevMan 5 software (Cochrane Library). The characteristics of studies, population, and index tests were extracted, and meta-analyses were conducted to pool the accuracy, sensitivity, and specificity of ML techniques. Heterogeneity was addressed by discussing its possible sources and excluding studies when necessary.
RESULTS: Of the 1409 records retrieved, 13 studies with 4079 participants were included in this review. A total of 13 ML techniques were used in the included studies, with the most common technique being least squares support vector machine. The pooled accuracy, sensitivity, and specificity of ML techniques in screening voice disorders were 93%, 96%, and 93%, respectively. Least squares support vector machine had the highest accuracy (99%), while the K-nearest neighbor algorithm had the highest sensitivity (98%) and specificity (98%). Quadratic discriminant analysis achieved the lowest accuracy (91%), sensitivity (89%), and specificity (89%).
CONCLUSIONS: ML showed promising findings in the screening of voice disorders. However, the findings were not conclusive in diagnosing voice disorders owing to the limited number of studies that used ML for diagnostic purposes; thus, more investigations are needed. While it might not be possible to use ML alone as a substitute for current diagnostic tools, it may be used as a decision support tool for clinicians to assess their patients, which could improve the management process for assessment.
TRIAL REGISTRATION: PROSPERO CRD42020214438; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214438.
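
As an illustration of the three performance measures pooled in this review, the following minimal Python sketch computes accuracy, sensitivity, and specificity from a 2x2 table of counts and pools them across studies by summing the counts. It is a sketch under stated assumptions, not the review's procedure: the abstract does not specify the pooling model used in RevMan, the names `StudyCounts`, `metrics`, and `pool_by_counts` are hypothetical, and the counts are invented for demonstration and are not taken from any included study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StudyCounts:
    """2x2 counts for one study (pathological voice = positive class)."""
    tp: int  # pathological samples flagged as pathological
    fp: int  # healthy samples flagged as pathological
    fn: int  # pathological samples flagged as healthy
    tn: int  # healthy samples flagged as healthy


def metrics(c: StudyCounts) -> Dict[str, float]:
    """Return accuracy, sensitivity, and specificity for one 2x2 table."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


def pool_by_counts(studies: List[StudyCounts]) -> Dict[str, float]:
    """Naive pooling: sum the 2x2 counts over studies, then compute metrics.

    Assumption: this simple approach stands in for a formal meta-analytic
    model; it ignores between-study heterogeneity.
    """
    summed = StudyCounts(
        tp=sum(s.tp for s in studies),
        fp=sum(s.fp for s in studies),
        fn=sum(s.fn for s in studies),
        tn=sum(s.tn for s in studies),
    )
    return metrics(summed)


# Invented counts for two hypothetical studies (illustration only).
studies = [
    StudyCounts(tp=90, fp=5, fn=10, tn=95),
    StudyCounts(tp=45, fp=10, fn=5, tn=40),
]

per_study = [metrics(s) for s in studies]
pooled = pool_by_counts(studies)
```

Summing counts weights each study by its sample size, so larger studies dominate the pooled estimate. A formal diagnostic accuracy meta-analysis would typically also model between-study variation, which this sketch does not attempt.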