Introduction

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than by their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen for their rareness, rather than for having a fixed length. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
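To make the idea of an adaptive seed concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not LAST's actual code: the function name `adaptive_seeds` and the parameter `max_occurrences` are hypothetical, and a plain scan over the reference stands in for the index structure a real tool would need. For each start position in the query, the match is lengthened one base at a time until it occurs at most `max_occurrences` times in the reference.

```python
def adaptive_seeds(reference, query, max_occurrences=1):
    """Return (query_start, length, reference_positions) for each adaptive seed.

    Illustrative only: a seed is the shortest match starting at query_start
    that occurs at most max_occurrences times in the reference. The name and
    parameter are assumptions made for this sketch.
    """
    seeds = []
    for start in range(len(query)):
        # Every reference position matches the empty string.
        candidates = list(range(len(reference)))
        for depth in range(len(query) - start):
            base = query[start + depth]
            # Keep only the positions whose match extends by one more base.
            candidates = [
                pos for pos in candidates
                if pos + depth < len(reference) and reference[pos + depth] == base
            ]
            if not candidates:
                break  # the query substring does not occur in the reference
            if len(candidates) <= max_occurrences:
                # Rare enough: record the seed and stop lengthening.
                seeds.append((start, depth + 1, candidates))
                break
    return seeds


if __name__ == "__main__":
    # "CGT" occurs twice in the reference, so the first seed grows to "CGTT".
    print(adaptive_seeds("ACGTACGTTTGA", "CGTT", max_occurrences=1))
    # [(0, 4, [5]), (1, 3, [6])]
```

Because each seed stops growing once it is rare enough, the number of reference hits per query position is capped at `max_occurrences`. Common substrings yield longer seeds and rare ones yield shorter seeds, which is what keeps the total number of matches from growing quadratically in repetitive or compositionally biased sequence.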

Publications

  1. Adaptive seeds tame genomic sequence comparison.
    Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC, 2011-03-01 - Genome research
  2. Parameters for accurate genome alignment.
    Frith MC, Hamada M, Horton P, 2010-01-01 - BMC bioinformatics

Credits

  1. Szymon M Kiełbasa
    Developer

  2. Raymond Wan
    Developer

  3. Kengo Sato
    Developer

  4. Paul Horton
    Developer

  5. Martin C Frith
    Investigator

Summary
Accession: BT002248
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Martin C Frith