Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension.

Overview

abstract

OBJECTIVE: Although deep learning techniques have achieved strong results, they often depend on large amounts of hand-labeled data and tend to perform poorly in few-shot scenarios. The objective of this study is to devise a strategy that improves a model's ability to recognize biomedical entities in few-shot learning settings.

METHODS: By redefining biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method for few-shot BioNER, which involves constructing appropriate task demonstrations. We compared the proposed method with existing advanced methods on six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA.

RESULTS: We assessed the models by reporting F1 scores from 25-shot and 50-shot learning experiments. In 25-shot learning, we observed a 1.1% improvement in the average F1 score over the baseline method, reaching 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on the six datasets, respectively. In 50-shot learning, we improved the average F1 score by a further 1.0% over the baseline method, reaching 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively.

CONCLUSION: We show that, for few-shot BioNER, MRC-based language models are much more proficient at recognizing biomedical entities than the sequence labeling approach. Furthermore, our MRC-based language models can compete with fully supervised learning methods that rely heavily on abundant annotated data. These results highlight possible pathways for future advances in few-shot BioNER methods.
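The abstract does not spell out how a demonstration is constructed, so the following is only a minimal sketch of what an MRC-style input with task demonstrations could look like. The query wording, the `Entities:` template, the `[SEP]` separator, and the names `Demonstration` and `build_mrc_input` are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Demonstration:
    """A labeled example shown to the model alongside the input (hypothetical)."""
    sentence: str
    entities: list[str]


def build_mrc_input(query, demonstrations, sentence, sep=" [SEP] "):
    """Join a query, labeled demonstrations, and the target sentence.

    The layout (query, then demonstrations, then target sentence) is an
    assumed template, not the format used in the paper.
    """
    parts = [query]
    for demo in demonstrations:
        # Render each demonstration as its sentence followed by its gold entities.
        answer = ", ".join(demo.entities) if demo.entities else "none"
        parts.append(f"{demo.sentence} Entities: {answer}")
    # The sentence to be labeled comes last.
    parts.append(sentence)
    return sep.join(parts)


demo = Demonstration("Aspirin reduced fever.", ["Aspirin"])
text = build_mrc_input(
    "Find all chemical entities in the text.",
    [demo],
    "Patients received ibuprofen.",
)
print(text)
```

In an MRC formulation, a model would then be asked to extract answer spans for the query from the final sentence, rather than assigning a tag to every token as in sequence labeling.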

authors

Su, Leilei

Chen, Jian

Peng, Yifan

Sun, Cong

publication date

October 25, 2024

published in

Journal of Biomedical Informatics

Research

keywords

Deep Learning

Identity

Scopus Document Identifier

85207363812

Digital Object Identifier (DOI)

10.1016/j.jbi.2024.104739

PubMed ID

39490610

Additional Document Info

has global citation frequency

4

volume

159

VIVO Weill Cornell Medical College

Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension. Academic Article
