Currently Available Large Language Models Are Moderately Effective in Improving Readability of English and Spanish Patient Education Materials in Pediatric Orthopaedics.
Academic Article
abstract
INTRODUCTION: Patient education materials (PEMs) consistently exceed the recommended sixth-grade reading level. Poor health literacy and limited English proficiency, particularly among more than 40 million Spanish speakers, are associated with adverse patient outcomes. The use of artificial intelligence (AI) to improve readability has rarely been validated for Spanish PEMs or for pediatric orthopaedic PEMs. This study aimed to (1) assess the availability and readability of English and Spanish pediatric orthopaedic PEMs and (2) compare the efficacy of ChatGPT-4.0 and Google Gemini in improving readability.
METHODS: Pediatric orthopaedic PEMs were collected from 13 websites of pediatric orthopaedic hospitals and societies. Grade levels were assessed using the Flesch-Kincaid Grade-Level (FKGL) and Gunning Fog Index (GFI) for English articles and the FKGL and Spanish Simple Measure of Gobbledygook (SMOG) for Spanish articles. English and Spanish PEMs were additionally assessed using the Flesch Reading Ease (FRE) and the Fernandez-Huerta Index (FHI), respectively. ChatGPT-4.0 and Google Gemini were prompted to rewrite the article text at a sixth-grade level. Readability of the AI-converted text was compared categorically, by the proportion of articles at or below the sixth-grade level, and continuously, across all metrics.
RESULTS: Of 103 English articles, 40 (38.8%) were available in Spanish. At baseline, few English (5.8%) or Spanish (10.0%) articles had an FKGL at or below the sixth grade. Among English PEMs, 21.4% of ChatGPT-4.0-converted and 60.2% of Google Gemini-converted articles achieved an FKGL at or below the sixth grade. Among Spanish PEMs, 52.5% of ChatGPT-4.0-converted and 77.5% of Google Gemini-converted articles did so. Compared with ChatGPT-4.0, Google Gemini produced greater absolute improvements in GFI, English FKGL, and Spanish SMOG, and a higher proportion of articles at or below the sixth-grade level (GFI, FKGL, Spanish SMOG) (all, P < 0.05).
CONCLUSIONS: Pediatric orthopaedic PEMs are limited by poor readability and by the low availability of Spanish versions. Medical societies and hospitals may use AI models, particularly Google Gemini, to improve readability and patient comprehension, but increasing the availability of Spanish PEMs is also necessary.
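
The English readability metrics named in the methods are simple functions of word, sentence, and syllable counts. The sketch below is illustrative only and is not the study's tooling: it uses the standard published formulas for FKGL, FRE, and GFI, while the helper names (`rough_counts`, `share_at_or_below`) and the vowel-group syllable heuristic are assumptions introduced here. For GFI, "complex words" are conventionally those with three or more syllables. The Spanish metrics (Spanish SMOG, FHI) are not covered.

```python
import re


def fkgl(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fre(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gfi(words, sentences, complex_words):
    """Gunning Fog Index from raw counts."""
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def rough_counts(text):
    """Return (words, sentences, syllables) using crude heuristics.

    Assumption: syllables are approximated by counting vowel groups,
    which miscounts many English words (for example, silent 'e').
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return len(words), len(sentences), syllables


def share_at_or_below(grades, cutoff=6.0):
    """Proportion of articles whose grade level is at or below the cutoff."""
    return sum(g <= cutoff for g in grades) / len(grades)


if __name__ == "__main__":
    sample = "Your child has a broken arm. The cast keeps the bone still."
    w, s, syl = rough_counts(sample)
    print(f"FKGL: {fkgl(w, s, syl):.2f}, FRE: {fre(w, s, syl):.2f}")
```

The categorical comparison in the abstract corresponds to applying something like `share_at_or_below` to the per-article grade levels before and after conversion; the continuous comparison uses the metric values themselves.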