Machine Learning Prediction of Stroke Mechanism in Embolic Strokes of Undetermined Source. Academic Article

Overview

abstract

  • BACKGROUND AND PURPOSE: One-fifth of ischemic strokes are embolic strokes of undetermined source (ESUS). Their theoretical causes can be classified as cardioembolic versus noncardioembolic. This distinction has important implications, but the categories' proportions are unknown.
    METHODS: Using data from the Cornell Acute Stroke Academic Registry, we trained a machine learning algorithm to distinguish cardioembolic versus noncardioembolic strokes, then applied the algorithm to ESUS cases to determine the predicted proportion with an occult cardioembolic source. A panel of neurologists adjudicated stroke etiologies using standard criteria. We trained a machine learning classifier using data on demographics, comorbidities, vitals, laboratory results, and echocardiograms. An ensemble predictive method including L1 regularization, gradient-boosted decision tree ensemble (XGBoost), random forests, and multivariate adaptive splines was used. Random search and cross-validation were used to tune hyperparameters. Model performance was assessed using cross-validation among cases of known etiology. We applied the final algorithm to an independent set of ESUS cases to determine the predicted mechanism (cardioembolic or not). To assess our classifier's validity, we correlated the predicted probability of a cardioembolic source with the eventual post-ESUS diagnosis of atrial fibrillation.
    RESULTS: Among 1083 strokes with known etiologies, our classifier distinguished cardioembolic versus noncardioembolic cases with excellent accuracy (area under the curve, 0.85). Applied to 580 ESUS cases, the classifier predicted that 44% (95% credibility interval, 39%-49%) resulted from cardiac embolism. Individual ESUS patients' predicted likelihood of cardiac embolism was associated with eventual atrial fibrillation detection (OR per 10% increase, 1.27 [95% CI, 1.03-1.57]; c-statistic, 0.68 [95% CI, 0.58-0.78]). ESUS patients with high predicted probability of cardiac embolism were older and had more coronary and peripheral vascular disease, lower ejection fractions, larger left atria, lower blood pressures, and higher creatinine levels.
    CONCLUSIONS: A machine learning estimator that distinguished known cardioembolic versus noncardioembolic strokes indirectly estimated that 44% of ESUS cases were cardioembolic.
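The summary quantities reported in the abstract (an area under the curve or c-statistic, a predicted proportion of cardioembolic cases, and an odds ratio per 10% increase in predicted probability) can be illustrated with a minimal sketch. This is not the authors' code, and the abstract does not say how the 44% estimate or its credibility interval was computed. Averaging the predicted probabilities is shown here only as one simple, assumed approach. All function names and the toy numbers are hypothetical.

```python
# Illustrative sketch only: hypothetical helper names and toy data,
# not the registry data or the authors' ensemble.

def c_statistic(scores, labels):
    """Probability that a randomly chosen positive case (label 1) receives a
    higher score than a randomly chosen negative case (label 0); ties count
    as one half. This equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predicted_proportion(probabilities):
    """Assumed estimator: the mean predicted probability across cases."""
    if not probabilities:
        raise ValueError("need at least one predicted probability")
    return sum(probabilities) / len(probabilities)


def rescale_odds_ratio(odds_ratio, from_step, to_step):
    """Convert an odds ratio stated per `from_step` change in a predictor
    into the odds ratio per `to_step` change. This relies on the log-odds
    being linear in the predictor, as in logistic regression."""
    return odds_ratio ** (to_step / from_step)


if __name__ == "__main__":
    # Toy scores for four cases of known etiology (1 = cardioembolic).
    print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
    # Toy predicted probabilities for four undetermined-source cases.
    print(predicted_proportion([0.2, 0.6, 0.7, 0.1]))
    # The abstract's OR of 1.27 per 10% increase, restated per 20% increase.
    print(rescale_odds_ratio(1.27, 0.10, 0.20))
```

Under this reading, an odds ratio of 1.27 per 10% increase means the odds of eventual atrial fibrillation detection are multiplied by 1.27 for every 10-percentage-point rise in the predicted probability of cardiac embolism.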

publication date

  • August 12, 2020

Research

keywords

  • Intracranial Embolism
  • Machine Learning
  • Stroke

Identity

PubMed Central ID

  • PMC8034802

Scopus Document Identifier

  • 85089922812

Digital Object Identifier (DOI)

  • 10.1093/biostatistics/kxaa022/5856304

PubMed ID

  • 32781943

Additional Document Info

volume

  • 51

issue

  • 9