Introduction

Position-specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
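
To illustrate how pseudocounts enter a column score, the following is a minimal sketch and not PSI-BLAST's actual implementation. It assumes a toy four-letter alphabet with uniform background frequencies, and it distributes the pseudocounts in proportion to those background frequencies, which is a simplification. The names `column_scores`, `BACKGROUND`, and `pseudocount` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical toy alphabet with uniform background frequencies.
# Real protein PSSMs use 20 amino acids with non-uniform backgrounds.
BACKGROUND = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}


def column_scores(counts, background, pseudocount):
    """Return log-odds scores (in bits) for one alignment column.

    counts      -- observed residue counts in the column
    background  -- background frequency of each residue
    pseudocount -- total number of pseudocounts added to the column
                   (assumption: spread in proportion to the background)
    """
    total = sum(counts.values())
    scores = {}
    for residue, p in background.items():
        observed = counts.get(residue, 0.0)
        # Estimated frequency: observed counts plus this residue's
        # share of the pseudocounts, normalized by the enlarged total.
        q = (observed + pseudocount * p) / (total + pseudocount)
        scores[residue] = math.log2(q / p)
    return scores


if __name__ == "__main__":
    column = {"A": 8, "C": 2}  # a fairly conserved column of 10 sequences
    for m in (10.0, 2.0):
        scores = column_scores(column, BACKGROUND, pseudocount=m)
        print(m, {r: round(s, 2) for r, s in scores.items()})
```

In this sketch, lowering the pseudocount total from 10 to 2 raises the score of the conserved residue and lowers the scores of unobserved residues, which is the kind of increased contrast the abstract describes for highly conserved columns.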

Publications

  1. PSI-BLAST pseudocounts and the minimum description length principle.
    Altschul SF, Gertz EM, Agarwala R, Schäffer AA, Yu YK, 2009-02-01 - Nucleic acids research
  2. PSI-BLAST tutorial.
    Bhagwat M, Aravind L, 2007-01-01 - Methods in molecular biology (Clifton, N.J.)
  3. Getting the most from PSI-BLAST.
    Jones DT, Swindells MB, 2002-03-01 - Trends in biochemical sciences
  4. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements.
    Schäffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF, 2001-07-01 - Nucleic acids research
  5. Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases.
    Altschul SF, Koonin EV, 1998-11-01 - Trends in biochemical sciences
  6. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
    Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ, 1997-09-01 - Nucleic acids research
  7. Simple adjustment of the sequence weight algorithm remarkably enhances PSI-BLAST performance.
    Oda T, Lim K, Tomii K, 2017-06-01 - BMC Bioinformatics

Credits

  1. Stephen F Altschul
    Developer

    National Center for Biotechnology Information, National Library of Medicine, United States of America

  2. E Michael Gertz
    Developer

  3. Richa Agarwala
    Developer

  4. Alejandro A Schäffer
    Developer

  5. Yi-Kuo Yu
    Investigator

Summary

Accession: BT006556
Tool Type: Application
Category
Platforms: Linux/Unix
Technologies: C, C++
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Yi-Kuo Yu