Characterization of subclonal variants in HG002 Genome in a Bottle reference material as a resource for benchmarking variant callers. Academic Article

Overview

abstract

  • We developed a benchmark set of subclonal variants in the Genome in a Bottle (GIAB) Consortium HG002 reference material (RM) DNA for evaluating lower-frequency variant callsets. We used a somatic variant caller with high-coverage (300×) whole-genome sequencing data from the GIAB Ashkenazi Jewish trio to identify potential subclonal variants in the HG002 RM DNA. Using orthogonal sequencing data and manual curation, we defined a benchmark set with 85 high-confidence subclonal single-nucleotide variants (SNVs) (allele frequency [AF] > 5%) and a benchmark region covering 2.45 Gbp of the autosomes. External validation supported use of the benchmark to reliably identify both false negatives and false positives across a variety of sequencing technologies and variant callers. By adding our characterization of mosaic SNVs in this widely used cell line, we have expanded the scope of bioinformatic and sequencing applications for which the HG002 GIAB RM can be used to include benchmarking subclonal SNVs.

publication date

  • December 19, 2025

Identity

Digital Object Identifier (DOI)

  • 10.1016/j.xgen.2025.101104

PubMed ID

  • 41421359