Generative Artificial Intelligence Methodology Reporting in Otolaryngology: A Scoping Review

Overview

abstract

  • OBJECTIVE: Researchers in otolaryngology-head and neck surgery (OHNS) have sought to explore the potential of large language models (LLMs), but many publications omit crucial information, such as prompting approach and model parameters. This has substantial implications for reproducibility, since LLMs can generate different outputs depending on differences in "prompt engineering." We aimed to critically review the methodological reporting and quality of LLM-focused literature in OHNS.

    DATA SOURCES: Databases were searched in October 2024, including PubMed, Embase, Web of Science, ISCA Archive, IEEE Xplore, arXiv, medRxiv, and engRxiv.

    REVIEW METHODS: Abstract and full-text review, as well as data extraction, were performed by two independent reviewers. All primary studies using LLMs within OHNS were included.

    RESULTS: Of 925 abstracts retrieved, 117 were included. All studies used ChatGPT, with a minority (16.2%) including additional LLMs. Only 46.2% published direct quotations of all prompts. While the majority (76.9%) reported the number of prompts, only 6.8% rationalized this number, and 23.9% reported the number of runs per prompt. Most publications (73.5%) provided some description of prompt development, though only 11.1% explicitly described why specific decisions in prompt design were made, and only 6.0% reported prompt testing. There was no evidence that the quality of methodology reporting was improving over time.

    CONCLUSION: LLM-focused literature in OHNS, while exploring many potentially fruitful avenues, demonstrates variable completeness in methodological reporting. This severely limits the generalizability of these studies and suggests that best practices could be further disseminated and enforced by researchers and journals.

publication date

  • September 25, 2025

Identity

Digital Object Identifier (DOI)

  • 10.1002/lary.70165

PubMed ID

  • 40995988