A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data.

Jeffrey C Miecznikowski, Senthilkumar Damodaran, Kimberly F Sellers, Richard A Rabin
Author Information
  1. Jeffrey C Miecznikowski: Department of Biostatistics; University at Buffalo, Buffalo, NY 14214 USA. jcm38@buffalo.edu.

Abstract

BACKGROUND: Numerous gel-based software packages exist to detect protein changes potentially associated with disease. The data, however, are rife with technical and structural complexities, making statistical analysis a difficult task. A particularly important topic is how the various packages handle missing data. To date, no one has extensively studied the impact that imputing missing data has on the subsequent analysis of protein spots.
RESULTS: This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation-maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance, while the generalized family-wise error rate (gFWER) should be considered for controlling the multiple testing error.
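The root mean squared error criterion mentioned above can be illustrated with a minimal sketch: hide some known values, impute them, and score only the hidden entries. The row-mean imputer here is a deliberately simple stand-in, not the least squares or EM method compared in the paper, and the function names, simulated spot intensities, and 20% masking fraction are all assumptions made for illustration.

```python
import math
import random

def row_mean_impute(matrix):
    """Fill each None with the mean of the observed values in its row (spot).

    Stand-in imputer for illustration only; not the paper's method.
    """
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed

def rmse_on_masked(truth, imputed, masked):
    """RMSE computed only over the (row, column) cells that were hidden."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))

def mask_at_random(truth, fraction, rng):
    """Hide a random fraction of cells; return the masked data and cell list."""
    cells = [(i, j) for i in range(len(truth)) for j in range(len(truth[0]))]
    masked = rng.sample(cells, max(1, int(fraction * len(cells))))
    hidden = set(masked)
    data = [
        [None if (i, j) in hidden else value for j, value in enumerate(row)]
        for i, row in enumerate(truth)
    ]
    return data, masked

# Hypothetical data: 4 spots (rows) measured on 6 gels (columns).
rng = random.Random(0)
truth = [[rng.gauss(mu, 0.5) for _ in range(6)] for mu in (8.0, 10.0, 12.0, 9.0)]
data, masked = mask_at_random(truth, 0.2, rng)
imputed = row_mean_impute(data)
score = rmse_on_masked(truth, imputed, masked)
print(f"RMSE on masked entries: {score:.3f}")
```

Swapping `row_mean_impute` for another imputer while keeping the same mask gives a like-for-like comparison of methods under this criterion.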
CONCLUSIONS: In summary, we advocate a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data: a data imputation step, a choice of statistical test, and lastly an error control method to account for multiple testing. When choosing the statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case, a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate.


