Introduction

Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.
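To make the quantity in item (3) concrete, the following is a minimal sketch of how q values can be estimated from a target-decoy search and how the number of PSMs accepted at a chosen q value threshold can be counted. It is not the Percolator or Q-ranker implementation. The function names, the (score, is_decoy) input layout, and the simple decoys/targets estimate of the false discovery rate are assumptions made for illustration, and tied scores are not treated specially.

```python
from typing import List, Tuple

# Hypothetical input layout: each PSM is (score, is_decoy), higher score = better match.
PSM = Tuple[float, bool]


def estimate_q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by decreasing score, each with an estimated q value.

    Assumption: the FDR at a score threshold is estimated as
    (# decoys at or above the threshold) / (# targets at or above the threshold).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    fdrs = []
    targets = 0
    decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Guard against division by zero and cap the estimate at 1.
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # The q value of a PSM is the smallest FDR among all thresholds that
    # still accept it, i.e. a running minimum taken from the bottom up.
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])

    return [(score, is_decoy, q)
            for (score, is_decoy), q in zip(ranked, q_values)]


def count_accepted_targets(psms: List[PSM], q_threshold: float) -> int:
    """Count target PSMs whose estimated q value is at most q_threshold."""
    return sum(1 for _, is_decoy, q in estimate_q_values(psms)
               if not is_decoy and q <= q_threshold)


if __name__ == "__main__":
    example = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]
    for row in estimate_q_values(example):
        print(row)
    print(count_accepted_targets(example, 0.01))
```

The count returned by count_accepted_targets is the kind of quantity that item (3) describes optimizing directly; here it is only evaluated for a fixed set of scores, not optimized.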

Publications

  1. Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets.
    Spivak M, Weston J, Bottou L, Käll L, Noble WS, 2009-07-01 - Journal of Proteome Research
  2. Semi-supervised learning for peptide identification from shotgun proteomics datasets.
    Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ, 2007-11-01 - Nature Methods

Credits

  1. Marina Spivak
    Developer

    NEC Labs America, Princeton, United States of America

  2. Jason Weston
    Developer

  3. Léon Bottou
    Developer

  4. Lukas Käll
    Developer

  5. William Stafford Noble
    Investigator

Summary

Accession: BT006669
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C++
User Interface: Terminal Command Line
Download Count: 0
Submitted By: William Stafford Noble