Alignathon: a competitive assessment of whole-genome alignment methods.

Overview

abstract

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

authors

Beal, Kathryn
Seledtsov, Igor
Molodtsov, Vladimir
Raney, Brian J
Clawson, Hiram
Kim, Jaebum
Kemena, Carsten
Chang, Jia-Ming
Erb, Ionas
Poliakov, Alexander
Hou, Minmei
Herrero, Javier
Kent, William James
Solovyev, Victor
Darling, Aaron E
Ma, Jian
Notredame, Cedric
Brudno, Michael
Dubchak, Inna
Haussler, David
Paten, Benedict

publication date

October 1, 2014

published in

Genome Research (Journal)

Research

keywords

Genome
Genomics
Sequence Alignment
Software

Identity

PubMed Central ID

PMC4248324

Scopus Document Identifier

84913533708

Digital Object Identifier (DOI)

10.1101/gr.174920.114

PubMed ID

25273068

Additional Document Info

has global citation frequency

77

volume

24

issue

12

VIVO Weill Cornell Medical College

Alignathon: a competitive assessment of whole-genome alignment methods. Academic Article
