Sequence alignment by cross-correlation. Academic Article

Overview

abstract

  • Many recent advances in biology and medicine have resulted from DNA sequence alignment algorithms and technology. Traditional approaches for the matching of DNA sequences are based either on global alignment schemes or on heuristic schemes that seek to approximate global alignment algorithms while providing higher computational efficiency. This report describes an approach using the mathematical operation of cross-correlation to compare sequences. It can be implemented using the fast Fourier transform (FFT) for computational efficiency. The algorithm is summarized and sample applications are given. These include gene sequence alignment in long stretches of genomic DNA, finding sequence similarity in distantly related organisms, demonstrating sequence similarity in the presence of massive (approximately 90%) random point mutations, comparing sequences related by internal rearrangements (tandem repeats) within a gene, and investigating fusion proteins. Application to RNA and protein sequence alignment is also discussed. The method is efficient, sensitive, and robust, being able to find sequence similarities where other alignment algorithms may perform poorly.
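
The idea in the abstract, comparing sequences by cross-correlation computed through the FFT, can be illustrated with a minimal sketch. It is not the article's implementation: the one-indicator-vector-per-base encoding, the function names `fft` and `match_counts`, and the small recursive radix-2 FFT (used only to keep the example free of external libraries) are all assumptions made for illustration. For each relative shift, the sketch counts how many positions of the two sequences carry the same base, and the shift with the highest count is taken as the candidate alignment.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized).

    len(x) must be a power of two. Illustrative helper, not from the article.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def match_counts(a, b, alphabet="ACGT"):
    """Return {shift: number of positions i with a[i + shift] == b[i]}.

    Assumed encoding: one 0/1 indicator vector per symbol of the alphabet.
    The per-symbol cross-correlations are summed in the frequency domain.
    """
    # Zero-pad to a power of two large enough to avoid circular wrap-around.
    size = 1
    while size < len(a) + len(b):
        size *= 2

    total = [0j] * size
    for base in alphabet:
        ia = [1.0 if ch == base else 0.0 for ch in a] + [0.0] * (size - len(a))
        ib = [1.0 if ch == base else 0.0 for ch in b] + [0.0] * (size - len(b))
        fa, fb = fft(ia), fft(ib)
        # Correlation theorem: corr(a, b) <-> A[k] * conj(B[k]).
        total = [t + u * v.conjugate() for t, u, v in zip(total, fa, fb)]

    corr = fft(total, inverse=True)  # unnormalized inverse, so divide by size

    counts = {}
    for shift in range(-(len(b) - 1), len(a)):
        # Negative shifts live at the end of the circular result.
        counts[shift] = round(corr[shift % size].real / size)
    return counts


if __name__ == "__main__":
    genome = "GGGGACGTTGCAGGGG"   # made-up example data
    query = "ACGTTGCA"
    counts = match_counts(genome, query)
    best = max(counts, key=counts.get)
    print(best, counts[best])    # prints "4 8"
```

With sequences of combined length N, each transform costs O(N log N), so all shifts are scored at once instead of comparing the sequences shift by shift. Note that this sketch scores ungapped shifts only; it does not model insertions or deletions.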

publication date

  • December 1, 2005

Research

keywords

  • Algorithms
  • DNA, Bacterial
  • Sequence Alignment
  • Sequence Homology, Nucleic Acid

Identity

PubMed Central ID

  • PMC2291754

Scopus Document Identifier

  • 33645740076

PubMed ID

  • 16522868

Additional Document Info

volume

  • 16

issue

  • 4