APIR: a universal FDR-control framework for boosting peptide identification power by aggregating multiple proteomics database search algorithms

Introduction

Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which assign peptide sequences to mass spectra. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with guaranteed control of the false discovery rate (FDR) and a guaranteed increase in the number of identified peptides. To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under a target FDR threshold, APIR is guaranteed to identify at least as many peptides as any individual database search algorithm does. Evaluation of APIR on a complex protein standard shows that APIR outpowers individual database search algorithms while controlling the FDR. Real data studies show that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework can also be readily extended to aggregate discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data.
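To make the aggregation problem concrete, the following minimal sketch shows why simply taking the union of peptides reported by several search algorithms does not preserve FDR control. It is not the APIR procedure, which is not specified on this page. The function names, the algorithm labels, and the toy peptide identifiers are all hypothetical, and the known peptide set stands in for a protein standard with known composition. The false discovery proportion (FDP) computed here is the realized quantity whose expectation is the FDR.

```python
from typing import Dict, Set


def false_discovery_proportion(identified: Set[str], true_peptides: Set[str]) -> float:
    """Fraction of identified peptides that are absent from the known standard."""
    if not identified:
        return 0.0  # convention: no discoveries means no false discoveries
    false_hits = identified - true_peptides
    return len(false_hits) / len(identified)


def naive_union(results: Dict[str, Set[str]]) -> Set[str]:
    """Pool the peptides reported by every algorithm (no FDR adjustment)."""
    pooled: Set[str] = set()
    for peptides in results.values():
        pooled |= peptides
    return pooled


# Hypothetical toy data: peptides known to be in the standard.
true_peptides = {"pep_A", "pep_B", "pep_C", "pep_D"}

# Hypothetical output of two search algorithms, each with one false identification.
results = {
    "algo1": {"pep_A", "pep_B", "false_X"},
    "algo2": {"pep_B", "pep_C", "false_Y"},
}

for name, peptides in results.items():
    fdp = false_discovery_proportion(peptides, true_peptides)
    print(f"{name}: {len(peptides)} peptides, FDP = {fdp:.2f}")

pooled = naive_union(results)
pooled_fdp = false_discovery_proportion(pooled, true_peptides)
print(f"union: {len(pooled)} peptides, FDP = {pooled_fdp:.2f}")
```

In this toy case each algorithm has an FDP of one in three, while the pooled set has an FDP of two in five: the union gains peptides but accumulates the distinct false identifications of both algorithms. This is the kind of inflation an aggregation framework with an FDR guarantee has to avoid.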

Publications

No Publication Information

Credits

  1. Yiling Chen yiling0210@ucla.edu
    Investigator

    Statistics, University of California, Los Angeles, China

Summary

Accession: BT007298
Tool Type: Application
Category:
Platforms:
Technologies:
User Interface:
Download Count: 0
Country/Region: China
Submitted By: Yiling Chen

Funding

This work was supported by the following grants: NIH-NCI T32LM012424 (to Y.E.C.); NCI K08 CA201591, Margaret Early Memorial Research Trust, and Pediatric Cancer Research Foundation (to L.D.W.); NCI P30CA033572, the NCI Cancer Center Support Grant (to the mass spectrometry facility at City of Hope); NIH/NIGMS R01GM120507 and R35GM140888, NSF DBI-1846216 and DMS-2113754, Johnson & Johnson WiSTEM2D Award, Sloan Research Fellowship, and UCLA David Geffen School of Medicine W.M. Keck Foundation Junior Faculty Award (to J.J.L.).