Introduction

Iterative similarity search programs, such as psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position-specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. However, these models are subject to contamination: once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of the alignment errors that occur during psiblast PSSM contamination suggested a simple strategy for dramatically reducing it. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM or HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increases target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity.
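
The query-seeding step described above can be illustrated with a minimal Python sketch. It is not the tool's own code: the function name `query_seed_row`, the `-` gap character, and the example sequences are all assumptions made for illustration. Given one pairwise alignment as two equal-length rows, it builds the subject's row of a query-based MSA and fills each gap in the subject with the query residue from the same column.

```python
def query_seed_row(query_row: str, subject_row: str, gap: str = "-") -> str:
    """Return the subject's row of a query-based MSA, seeded with query residues.

    Hypothetical helper for illustration only. Both rows come from one
    pairwise alignment and must have the same length.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")

    seeded = []
    for q, s in zip(query_row, subject_row):
        if q == gap:
            # The subject has an insertion relative to the query; a
            # query-based MSA has no column for it, so skip it.
            continue
        # Where the subject is gapped, insert the query residue instead.
        seeded.append(q if s == gap else s)
    return "".join(seeded)


# Made-up alignment: the subject has one insertion (G) and a two-residue gap.
print(query_seed_row("MKT-AYIAK", "MRTGA--AR"))  # prints MRTAYIAR
```

The seeded row has the same length as the ungapped query, so every subject contributes a residue at every query position when the rows are stacked to build the PSSM.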

Publications

  1. Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold.
    Pearson WR, Li W, Lopez R, 2017-04-01 - Nucleic acids research
  2. PSI-Search: iterative HOE-reduced profile SSEARCH searching.
    Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR, 2012-06-01 - Bioinformatics (Oxford, England)

Credits

  1. William R Pearson
    Developer

    Dept. of Biochemistry and Molecular Genetics, University of Virginia, United States of America

  2. Weizhong Li
    Developer

    European Bioinformatics Institute, EMBL Outstation

  3. Rodrigo Lopez
    Investigator

    European Bioinformatics Institute, EMBL Outstation

Summary
Accession: BT001182
Tool Type: Application
Platforms: Linux/Unix
Technologies: Perl
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Rodrigo Lopez