African Pan Genome Contigs Expose Biologically Relevant Sequence Still Hidden from Human Reference Frameworks
Abstract
Human reference genomes underpin biomedical discovery but remain incomplete and biased toward European populations, which constrains the interpretation of genetic variation in underrepresented populations. Here, we characterize African Pan Genome (APG) contigs totaling 296.5 Mb to define the sequence and functional landscape of genomic regions absent from current references. Most contigs align to the telomere-to-telomere (T2T-CHM13) genome and to 47 haplotype-resolved Human Pangenome Reference Consortium (HPRC) assemblies. The T2T-CHM13 placements are enriched in centromeric and satellite repeats and overlap 373 genes, including disease-associated loci. Mapping across the HPRC assemblies reveals ancestry-associated contig enrichment, particularly in African genomes. Notably, 742 contigs remain unmapped under both stringent and relaxed criteria. These unmapped sequences are largely nonrepetitive and exhibit strong functional potential, including predicted protein-coding genes, CpG islands and transcriptional activity. Together, these results demonstrate that functionally relevant, ancestry-enriched genomic sequences remain absent from current references, with important implications for disease variant interpretation and precision medicine.