Comparing random forest and elastic net models to predict substance use disorder transitions in participants with cannabis and stimulant use: Evidence from the All of Us cohort. Academic Article

Overview

abstract

  • BACKGROUND: Predicting progression from substance use to substance use disorder (SUD) is challenging, particularly for participants with cannabis and stimulant use who follow distinct risk trajectories. Machine learning enables integration of demographic, behavioral, wearable-derived, and social determinants of health (SDoH) data, yet few studies have compared linear and non-linear approaches in large, diverse populations. METHODS: Data came from the All of Us Research Program, a nationwide cohort integrating electronic health records, surveys, wearable metrics, and SDoH. Individuals with baseline cannabis or stimulant use were followed for incident SUD diagnoses. Predictors included demographics, wearable-derived activity and sleep, and SDoH domains (income, food insecurity, housing instability, transportation barriers). Elastic net (EN) logistic regression and random forest (RF) models were trained separately within cannabis and stimulant cohorts. Discrimination was evaluated on independent test sets using the area under the receiver operating characteristic curve (AUC) and compared with the DeLong test. RESULTS: For participants with cannabis use, EN and RF showed similar performance (AUC = 0.740 vs. 0.741; DeLong p = 0.764). For participants with stimulant use, RF achieved AUC = 0.732 vs. EN = 0.698; DeLong p = 0.219. Demographic variables were the strongest predictors across models. SDoH indicators, particularly income, contributed substantially to prediction, while wearable-derived metrics provided incremental explanatory value primarily in EN models, with limited independent contribution in RF. CONCLUSIONS: EN and RF models achieved moderate prediction of SUD transitions. Incorporating SDoH and wearable-derived data enhanced interpretability and risk stratification, particularly in linear models, underscoring substance-specific pathways and the utility of multimodal frameworks for developing precision prevention strategies.
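
The abstract reports that the two models' AUCs were compared on the same test set with the DeLong test but does not describe an implementation. The following is a minimal, illustrative sketch of how such a paired comparison could be computed, assuming binary labels (1 = incident SUD, 0 = no SUD) and one predicted score per participant from each model. The function names (`auc_components`, `delong_test`) and the toy scores are hypothetical and are not taken from the study.

```python
import math


def auc_components(labels, scores):
    """Return the AUC plus DeLong's per-case placement values.

    v10[i] is the fraction of negatives scored below positive i;
    v01[j] is the fraction of positives scored above negative j.
    Ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(p, q):
        return 1.0 if p > q else 0.5 if p == q else 0.0

    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, p_value). The variance of the AUC difference
    is computed from the paired differences of the placement values,
    which equals S_aa + S_bb - 2 * S_ab in DeLong's notation.
    Requires at least two positives and two negatives.
    """
    auc_a, v10_a, v01_a = auc_components(labels, scores_a)
    auc_b, v10_b, v01_b = auc_components(labels, scores_b)
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = sample_variance(d10) / len(d10) + sample_variance(d01) / len(d01)
    if var <= 0.0:
        # Degenerate case (e.g. identical score vectors): test undefined.
        return auc_a, auc_b, float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return auc_a, auc_b, p_value


# Toy data for illustration only (not from the All of Us cohort).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_en = [0.9, 0.8, 0.4, 0.6, 0.5, 0.3, 0.2, 0.7]  # hypothetical EN scores
scores_rf = [0.7, 0.3, 0.6, 0.5, 0.4, 0.8, 0.2, 0.1]  # hypothetical RF scores

auc_en, auc_rf, p = delong_test(labels, scores_en, scores_rf)
print(f"AUC (EN) = {auc_en:.4f}, AUC (RF) = {auc_rf:.4f}, DeLong p = {p:.3f}")
```

Because both models are scored on the same participants, the test uses the covariance between their placement values rather than treating the two AUCs as independent, which is why a paired procedure such as DeLong's is the usual choice for this kind of comparison.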

publication date

  • December 18, 2025

Research

keywords

  • Central Nervous System Stimulants
  • Machine Learning
  • Marijuana Abuse
  • Substance-Related Disorders

Identity

Digital Object Identifier (DOI)

  • 10.1016/j.drugalcdep.2025.113012

PubMed ID

  • 41442977

Additional Document Info

volume

  • 278