Combining evidence using p-values: application to sequence homology searches.

T L Bailey, M Gribskov
Author Information
  1. T L Bailey: San Diego Supercomputer Center, CA 92186-9784, USA.

Abstract

MOTIVATION: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.
RESULTS: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
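
The combination step described above can be sketched as follows. Assuming the n p-values are independent and each is uniformly distributed on (0, 1] under the null hypothesis, the probability of observing a product at least as small as p has the standard closed form p * sum over i = 0..n-1 of (-ln p)^i / i!. The function name `combined_pvalue` is hypothetical, and the code illustrates that closed form only; it is not claimed to reproduce the QFAST algorithm as published.

```python
import math


def combined_pvalue(pvalues):
    """Return the p-value of the product of independent p-values.

    Illustrative sketch: assumes each input is an independent p-value
    that is uniform on (0, 1] under the null hypothesis.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # a zero p-value forces the product to zero

    # Work with the log of the product to avoid underflow for many small p-values.
    log_product = sum(math.log(p) for p in pvalues)
    x = -log_product  # x >= 0

    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    # Guard against rounding pushing the result slightly above 1.
    return min(1.0, math.exp(log_product) * total)


if __name__ == "__main__":
    # Two motif matches, each individually significant at 0.05.
    print(combined_pvalue([0.05, 0.05]))
```

With a single p-value the function returns that p-value unchanged, and sequences can then be ranked by the combined value across all motifs.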

Grants

  1. P41 RR-08605/NCRR NIH HHS

MeSH Terms

Algorithms
DNA
Mathematical Computing
Proteins
Sequence Homology, Amino Acid
Sequence Homology, Nucleic Acid

Chemicals

Proteins
DNA
