Use of the CHM13-T2T genome improves metagenomic analysis by minimizing host DNA contamination. Academic Article

Overview

abstract

  • Human-associated metagenomic data often contain human nucleic acid information, which can affect the accuracy of microbial classification or raise ethical concerns. These reads are typically removed through alignment to the human genome using various metagenomic mapping tools or human reference genomes, followed by filtration before metagenomic analysis. In this study, we conducted a comprehensive analysis to identify the optimal combination of alignment software and human reference genomes using benchmarking data. Our findings show that the combination of bwa-mem and the telomere-to-telomere human genome (CHM13-T2T) is the most effective in removing human reads in simulated data. We also analyzed CHM13-T2T-derived sequences in RefSeq to understand how CHM13-T2T reduces false positive results. Finally, we assessed clinical samples and found that CHM13-T2T effectively reduces host-derived contamination, particularly in low microbial biomass samples. This study provides a thorough overview of the application of CHM13-T2T in metagenomic analysis and highlights its significance in improving microbial classification accuracy.

    IMPORTANCE: Human gene sequences account for a large proportion of metagenomic sequences. To gain accurate and precise microbiome information, effective host-derived contamination removal methods are required. Both the alignment algorithm and the reference genome could influence the effectiveness of this process. The telomere-to-telomere human genome (CHM13-T2T) is a state-of-the-art human genome with 216 Mbp of additional new sequences compared with the commonly used GRCh38.p14. Our findings show the optimal dehosting effect of CHM13-T2T combined with the bwa-mem software in metagenomic analysis. We also investigate the reasons for the superiority of CHM13-T2T. Our study provides insights into optimal strategies for host sequence removal from metagenomic data. A standard reference is proposed for future metagenomic analysis, which can improve the accuracy of microbial identification.
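
    The "align, then filter" step described in the abstract can be illustrated with a minimal Python sketch. It assumes the reads have already been aligned to a human reference (for example CHM13-T2T) by an aligner that writes SAM output, as bwa-mem does, and it keeps only reads with no host alignment. The function name `non_host_read_names`, the conservative pair rule (drop a pair if either mate aligns), and the demo records are illustrative assumptions, not the authors' published pipeline.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1           # read is part of a pair
UNMAPPED = 0x4         # this read is unmapped
MATE_UNMAPPED = 0x8    # the mate of this read is unmapped
SECONDARY = 0x100      # secondary alignment record
SUPPLEMENTARY = 0x800  # supplementary alignment record


def non_host_read_names(sam_lines):
    """Return names of reads (or read pairs) with no alignment to the host.

    Assumption for this sketch: a pair is kept only when BOTH mates are
    unmapped, so a pair with one host-aligned mate is discarded entirely.
    """
    keep = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue              # only primary records decide the outcome
        if not flag & UNMAPPED:
            continue              # read aligned to the host reference
        if flag & PAIRED and not flag & MATE_UNMAPPED:
            continue              # mate aligned to the host reference
        keep.add(name)
    return keep


# Hypothetical SAM records, made up for demonstration only.
DEMO_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "host1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",      # mapped single read
    "microbe1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped single read
    "pairA\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",           # both mates unmapped
    "pairA\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "pairB\t73\tchr1\t10\t60\t4M\t=\t10\t0\tACGT\tIIII",    # this mate mapped
    "pairB\t133\tchr1\t10\t0\t*\t=\t10\t0\tACGT\tIIII",     # mate of a mapped read
]

if __name__ == "__main__":
    print(sorted(non_host_read_names(DEMO_SAM)))
```

    The returned names could then be used to subset the original FASTQ files before microbial classification. Discarding a whole pair when either mate aligns is the stricter choice for host removal; a more permissive rule would retain more reads at the cost of more residual human sequence.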

publication date

  • September 10, 2025

Research

keywords

  • DNA Contamination
  • Genome, Human
  • Metagenomics

Identity

PubMed Central ID

  • PMC12542756

Scopus Document Identifier

  • 105019818877

Digital Object Identifier (DOI)

  • 10.1128/msystems.00840-25

PubMed ID

  • 40928236

Additional Document Info

volume

  • 10

issue

  • 10