African Pan Genome Contigs Expose Biologically Relevant Sequence Still Hidden from Human Reference Frameworks.

Overview

abstract

  • Human reference genomes underpin biomedical discovery but remain incomplete and biased toward European populations, constraining interpretation of genetic variation in underrepresented populations. Here we characterize African Pan Genome (APG) contigs totaling 296.5 Mb to define the sequence and functional landscape of genomic regions absent from current references. Most contigs align to the telomere-to-telomere (T2T-CHM13) genome and across 47 haplotype-resolved Human Pangenome Reference Consortium (HPRC) assemblies, with T2T-CHM13 placements enriched in centromeric and satellite repeats and overlapping 373 genes, including disease-associated loci. Mapping across HPRC assemblies revealed ancestry-associated contig enrichment, particularly in African genomes. Notably, 742 contigs remained unmapped under both stringent and relaxed criteria. These sequences are largely nonrepetitive and exhibit strong functional potential, including predicted protein-coding genes, CpG islands and transcriptional activity. Together, these results demonstrate that functionally relevant, ancestry-enriched genomic sequences remain absent from current references, with important implications for disease variant interpretation and precision medicine.

publication date

  • April 11, 2026

Identity

PubMed Central ID

  • PMC13082152

Digital Object Identifier (DOI)

  • 10.1101/2025.08.15.670543

PubMed ID

  • 41993395