Semi-supervised consensus clustering for gene expression data analysis.

Yunli Wang, Youlian Pan
Author Information
  1. Yunli Wang: National Research Council Canada, 46 Dineen Dr., Fredericton, Canada.
  2. Youlian Pan: National Research Council Canada, 1200 Montreal Rd., Ottawa, Canada.

Abstract

BACKGROUND: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge into the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge.
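The consensus idea mentioned above can be illustrated with a minimal sketch of co-association (evidence-accumulation) consensus clustering. This is an assumed, generic technique shown for illustration, not the consensus algorithm used in SSCC. It runs plain k-means several times with different random seeds and records, for each pair of genes, the fraction of runs in which the two share a cluster. Pairs co-clustered in more than half of the runs are then linked, and the linked groups form the final clusters. The function names, the 20-iteration cap, and the 0.5 threshold are hypothetical choices.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it in place if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(points, k, runs):
    """Fraction of k-means runs (one seed per run) in which each pair shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):
        labels = kmeans(points, k, seed)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

def consensus_partition(m, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold; clusters are connected components."""
    n = len(m)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over above-threshold links
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and m[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy "expression profiles": two well-separated groups of three genes each.
genes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
m = co_association(genes, k=2, runs=10)
print(consensus_partition(m))  # [0, 0, 0, 1, 1, 1]
```

Thresholding the co-association matrix and taking connected components is equivalent to cutting a single-linkage tree at that level. Other consensus rules (average linkage, re-clustering the matrix) are equally plausible choices.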
METHODS: We proposed semi-supervised consensus clustering (SSCC), which integrates consensus clustering with semi-supervised clustering, for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving clustering quality. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation.
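The sketch below shows one way prior knowledge could be combined with a consensus co-association matrix, and how labeled genes could be split for h-fold cross-validation. It is an assumption for illustration only. The abstract does not describe how SSCC encodes prior knowledge or how its folds are formed. Here, genes with known classes become must-link pairs (same class) and cannot-link pairs (different classes). Those pairs override co-association entries with 1.0 or 0.0. The labeled genes are divided into h folds so that each fold is held out once while the remaining folds serve as prior knowledge. All function names are hypothetical.

```python
from itertools import combinations

def constraints_from_labels(known):
    """Turn prior knowledge {gene_index: class} into must-link and cannot-link pairs."""
    must, cannot = [], []
    for (i, ci), (j, cj) in combinations(sorted(known.items()), 2):
        if ci == cj:
            must.append((i, j))
        else:
            cannot.append((i, j))
    return must, cannot

def apply_constraints(m, must, cannot):
    """Copy a symmetric co-association matrix, forcing constrained pairs to 1.0 or 0.0."""
    out = [row[:] for row in m]
    for i, j in must:
        out[i][j] = out[j][i] = 1.0
    for i, j in cannot:
        out[i][j] = out[j][i] = 0.0
    return out

def h_fold_splits(known, h):
    """Yield (prior, held_out) dicts; every labeled gene is held out exactly once."""
    items = sorted(known.items())
    for f in range(h):
        held_out = dict(items[f::h])  # every h-th labeled gene, starting at offset f
        prior = {g: c for g, c in items if g not in held_out}
        yield prior, held_out

# Toy usage: six labeled genes in two hypothetical functional classes, h = 3 folds.
known = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A", 5: "B"}
for prior, held_out in h_fold_splits(known, h=3):
    must, cannot = constraints_from_labels(prior)
    print("held out:", sorted(held_out), "must-link:", must)
```

In a complete pipeline of this kind, `apply_constraints` would be applied to the co-association matrix before the final partition is extracted. Agreement with the held-out genes would then be scored for each fold.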
RESULTS: Using prior knowledge improved clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integrating consensus clustering with semi-supervised clustering improved performance compared with using either consensus clustering or semi-supervised clustering alone. Our SSCC method outperformed the other methods tested in this paper.
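The abstract does not name the clustering-quality measure behind these comparisons. As an assumption for illustration, agreement between a clustering and known gene classes is commonly scored with the adjusted Rand index (ARI). The ARI is 1.0 for identical partitions (up to relabeling) and averages about 0.0 for random labelings. A minimal pure-Python sketch follows; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie) between two labelings of the same items."""
    n = len(labels_true)
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two items: no pairs to compare
    # Pairs placed together by both labelings, counted from the contingency table.
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial (one cluster, or all singletons): identical
    return (index - expected) / (max_index - expected)

known_classes = [0, 0, 1, 1]
print(adjusted_rand_index(known_classes, [1, 1, 0, 0]))  # 1.0 (same partition, labels swapped)
print(adjusted_rand_index(known_classes, [0, 0, 1, 2]))  # about 0.571
```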
