A multi-stage large language model framework for extracting suicide-related social determinants of health.

Overview

abstract

BACKGROUND: Understanding the social determinants of health (SDoH) factors that contribute to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, difficulty analyzing the pivotal stressors that precede suicide incidents, and limited model explainability.

METHODS: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. We compared our approach with other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated whether the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.

RESULTS: Our proposed framework improves performance both on the overarching task of extracting SDoH factors and on the finer-grained tasks of retrieving relevant context. Additionally, fine-tuning a smaller, task-specific model achieves comparable or better performance at reduced inference cost. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.

CONCLUSIONS: Our approach improves both the accuracy and the transparency of extracting suicide-related SDoH from unstructured text. These advancements have the potential to support early identification of individuals at risk and to inform more effective prevention strategies.

authors

Meng, Yuan
Xu, Zihan
Zhang, Jingze
Xiao, Yunyu
Ding, Ying
Xu, Xuhai
Ghosh, Joydeep
Peng, Yifan

publication date

September 29, 2025

published in

Communications Medicine (Journal)

Identity

Digital Object Identifier (DOI)

10.1038/s43856-025-01114-z

PubMed ID

41023090

Additional Document Info

volume

5

issue

1

