Ensuring generalizability and clinical utility in mental health care applications: Robust artificial intelligence-based treatment predictions in diverse psychosis populations.
Academic Article
abstract
AIM: Artificial intelligence (AI)-based prediction models of treatment response promise to revolutionize psychiatric care by enabling personalized treatment, but very few have been thoroughly tested in different samples or compared with current clinical standards. Here we present models predicting antipsychotic response and assess their clinical utility within a robust methodological framework. METHODS: Machine learning models were trained and cross-validated on clinical and sociodemographic data from 594 individuals with established schizophrenia (NCT00014001) and 323 individuals with first-episode psychosis (NCT03510325). Models predicted four measures of antipsychotic response at 3 months after baseline. Clinical utility was assessed using decision curve and calibration curve analyses. To investigate model fairness, model performance was tested in a reduced feature space and across sex, ethnicity, antipsychotic, and symptom change subgroups. RESULTS: Models predicting total symptom severity (r = 0.4-0.68) and symptomatic remission (BAC = 62.4%-69%) performed well in both samples and were successfully externally validated in the other cohort (r = 0.4-0.5, BAC = 63.5%-65.7%). Performance remained significant when the models were reduced to 8-9 key variables (r = 0.53 for total symptom severity, BAC = 65.3% for symptomatic remission). Models predicting symptomatic remission showed a net benefit across risk thresholds of 0.5-0.9 and were moderately well calibrated (ECE = 0.16-0.18). Model performance differed across sex, ethnicity, and medication subgroups. CONCLUSIONS: We present a robust framework for training prediction models in psychiatry and assessing their clinical utility. Our models generalize across different psychosis populations and show promising calibration and net benefit. However, performance disparities across demographic and treatment subgroups highlight the need for more diverse clinical samples to ensure equitable prediction.
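The abstract reports three classification metrics without defining them: balanced accuracy (BAC), decision-curve net benefit, and expected calibration error (ECE). Below is a minimal Python sketch of their standard textbook definitions. It assumes binary labels (1 = symptomatic remission), a 0.5 cutoff for turning probabilities into predictions, and 10 equal-width probability bins for ECE. These settings and all function names are hypothetical illustrations, not the authors' implementation or binning scheme.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for 0/1 labels (assumes both classes present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of acting on everyone with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy in decision curve analysis: act on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between observed outcome rate and mean predicted risk per bin."""
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((t, p))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        observed = sum(t for t, _ in b) / len(b)
        predicted = sum(p for _, p in b) / len(b)
        ece += len(b) / n * abs(observed - predicted)
    return ece


# Toy data (invented for illustration only).
y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.95, 0.75, 0.65, 0.25, 0.45, 0.15]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

bac = balanced_accuracy(y_true, y_pred)
ece = expected_calibration_error(y_true, y_prob)
curve = {pt: (net_benefit(y_true, y_prob, pt), treat_all_net_benefit(y_true, pt))
         for pt in (0.5, 0.6, 0.7, 0.8, 0.9)}
```

In decision curve analysis, a model is typically considered clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat none). A finer or adaptive binning would change the ECE value, so figures computed this way are not directly comparable with the reported ECE of 0.16-0.18.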