Large language models may struggle to detect culturally embedded filicide-suicide risks. Academic Article

Overview

abstract

  • This study examines the capacity of six large language models (LLMs) to detect risks of domestic violence, suicide, and filicide-suicide in the Taiwanese flash fiction "Barbecue": GPT-4o, GPT-o1, DeepSeek-R1, Claude 3.5 Sonnet, Sonar Large (LLaMA-3.1), and Gemma-2-2b. The story, narrated by a six-year-old girl, depicts family tension and subtle cues of potential filicide-suicide through charcoal-burning, a culturally recognized suicide method in Taiwan. Each model was tasked with interpreting the story's risks, with assigned roles simulating different levels of mental health expertise. Results showed that all models detected domestic violence; however, only GPT-o1, Claude 3.5 Sonnet, and Sonar Large identified the suicide risk from the cultural cues. GPT-4o, DeepSeek-R1, and Gemma-2-2b missed the suicide risk, interpreting the mother's isolation as merely a psychological response. Notably, none of the models comprehended the cultural context behind the mother sparing her daughter, reflecting a gap in LLMs' understanding of non-Western sociocultural nuances. These findings highlight the limitations of LLMs in addressing culturally embedded risks, an understanding essential for effective mental health assessments.

publication date

  • February 10, 2025

Research

keywords

  • Domestic Violence
  • Homicide
  • Language
  • Suicide

Identity

Scopus Document Identifier

  • 85217767323

Digital Object Identifier (DOI)

  • 10.1016/j.ajp.2025.104395

PubMed ID

  • 39955914

Additional Document Info

volume

  • 105