CPGPrompt: translating clinical guidelines into large language model-executable decision support. Academic Article

Overview

abstract

  • OBJECTIVE: Clinical practice guidelines (CPGs) provide evidence-based recommendations for patient care; however, integrating them into artificial intelligence (AI) remains challenging. Previous approaches, such as rule-based systems or black-box AI models, face significant limitations, including poor interpretability, inconsistent adherence to guidelines, and narrow domain applicability. To address these limitations, we develop and validate CPGPrompt, an auto-prompting system that converts narrative clinical guidelines into large language model (LLM)-executable decision support. MATERIALS AND METHODS: Our framework translates CPGs into structured decision trees and uses an LLM to dynamically navigate them for patient case evaluation. Synthetic vignettes were generated across 3 domains (headache, lower back pain, and prostate cancer) and distributed into 4 categories to test different decision scenarios. System performance was assessed on both binary specialty referral decisions and fine-grained pathway classification tasks. RESULTS: The binary specialty referral classification achieved consistently strong performance across all domains (F1: 0.85-1.00), with high recall (1.00 ± 0.00). In contrast, multiclass pathway assignment showed reduced performance, with domain-specific variations: headache (F1: 0.47), lower back pain (F1: 0.72), and prostate cancer (F1: 0.77). DISCUSSION: Domain-specific performance differences reflected the structure of each guideline. The headache guideline highlighted challenges with negation handling. The lower back pain guideline required temporal reasoning. In contrast, prostate cancer pathways benefited from quantifiable laboratory tests, resulting in more reliable decision-making. CONCLUSION: CPGPrompt demonstrates generalizability across diverse clinical domains while maintaining high sensitivity for referral decisions. Its transparent, auditable framework enables the systematic identification of failure modes and provides advantages over black-box AI approaches. However, persistent challenges with subjective clinical assessments indicate a need for targeted improvements and greater clinical robustness.
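The navigation step described in the methods (a guideline encoded as a decision tree, traversed one question at a time while recording an auditable trace) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Question` and `Outcome` classes, the `navigate` function, the `lookup_answer` stub, and the toy tree are all hypothetical names and content invented for illustration, and the tree is not taken from any real guideline. In a real system the `answer` callable would presumably wrap an LLM call that reads the patient vignette; here it is replaced by a dictionary lookup so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Outcome:
    """Leaf node: the assigned pathway and the binary referral decision."""
    pathway: str
    refer: bool


@dataclass
class Question:
    """Internal node: a yes/no question with one child per answer."""
    key: str
    text: str
    yes: "Node"
    no: "Node"


Node = Union[Question, Outcome]
Case = Dict[str, bool]
# Assumption: the answering step is pluggable; an LLM call would go here.
Answer = Callable[[Question, Case], bool]


def navigate(root: Node, case: Case, answer: Answer) -> Tuple[Outcome, List[Tuple[str, bool]]]:
    """Walk the tree from the root to a leaf, logging every question and reply."""
    trace: List[Tuple[str, bool]] = []
    node = root
    while isinstance(node, Question):
        reply = answer(node, case)
        trace.append((node.text, reply))
        node = node.yes if reply else node.no
    return node, trace


def lookup_answer(question: Question, case: Case) -> bool:
    """Stand-in for the LLM: read the answer from pre-structured case facts."""
    return case[question.key]


# Toy tree with invented content, for illustration only (not clinical guidance).
TOY_TREE = Question(
    key="red_flag",
    text="Is any red-flag feature documented?",
    yes=Outcome("urgent specialty referral", True),
    no=Question(
        key="persistent",
        text="Have symptoms persisted despite first-line care?",
        yes=Outcome("routine specialty referral", True),
        no=Outcome("primary care management", False),
    ),
)

if __name__ == "__main__":
    outcome, trace = navigate(TOY_TREE, {"red_flag": False, "persistent": True}, lookup_answer)
    for text, reply in trace:
        print(f"{text} -> {'yes' if reply else 'no'}")
    print(outcome.pathway, "| refer:", outcome.refer)
```

Because each leaf carries both a pathway label and a referral flag, a single traversal yields the two outputs evaluated in the abstract (the binary referral decision and the multiclass pathway), and the trace shows which question produced a wrong branch.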

publication date

  • February 26, 2026

Identity

Digital Object Identifier (DOI)

  • 10.1093/jamia/ocag026

PubMed ID

  • 41746783