Cancer type, stage and prognosis assessment from pathology reports using LLMs. Academic Article

Overview

abstract

  • Large Language Models (LLMs) have shown significant promise across various natural language processing tasks. However, their application in the field of pathology, particularly for extracting meaningful insights from unstructured medical texts such as pathology reports, remains underexplored and not well quantified. In this project, we leverage state-of-the-art language models, including the GPT family, Mistral models, and the open-source Llama models, to evaluate their performance in comprehensively analyzing pathology reports. Specifically, we assess their performance in cancer type identification, AJCC stage determination, and prognosis assessment, encompassing both information extraction and higher-order reasoning tasks. Based on a detailed analysis of their performance metrics in a zero-shot setting, we developed two instruction-tuned models: Path-llama3.1-8B and Path-GPT-4o-mini-FT. These models demonstrated superior performance in zero-shot cancer type identification, staging, and prognosis assessment compared to the other models evaluated.

publication date

  • July 26, 2025

Research

keywords

  • Natural Language Processing
  • Neoplasms

Identity

PubMed Central ID

  • PMC12297491

Digital Object Identifier (DOI)

  • 10.1038/s41598-025-10709-4

PubMed ID

  • 40715326

Additional Document Info

volume

  • 15

issue

  • 1