Compressed Representation of Extreme Learning Machine with Self-Diffusion Graph Denoising Applied for Dissecting Molecular Heterogeneity.

Xin Duan, Xinnan Ding, Yuelin Lu
Author Information
  1. Xin Duan: School of Artificial Intelligence, Anhui Polytechnic University, Wuhu, China.
  2. Xinnan Ding: College of Electrical Engineering, Anhui Polytechnic University, Wuhu, China.
  3. Yuelin Lu: School of Artificial Intelligence, Anhui Polytechnic University, Wuhu, China.

Abstract

Molecular heterogeneity exists in many biological systems, such as major malignancies or diverse cell populations. Clustering of gene expression profiles has been widely used to dissect molecular heterogeneity. Most clustering methods, however, are hampered by the high dimensionality, noise, and feature redundancy of such data. To address these challenges, we propose extreme learning machine self-diffusion (ELMSD), a feature representation method based on an autoencoder extreme learning machine that incorporates a self-diffusion graph denoising framework to dissect molecular heterogeneity. ELMSD first learns a compressed representation of gene expression profiles from the hidden layer of the autoencoder extreme learning machine, and then applies an iterative graph diffusion process to enhance the sample-to-sample similarity. The enhanced graph facilitates downstream clustering analysis, making it more efficient to analyze molecular properties. To demonstrate the utility of ELMSD, we applied it to one simulated dataset, five single-cell datasets, and 20 cancer datasets. Experimental results show that ELMSD outperforms several state-of-the-art clustering methods, and that the cancer subtypes and cell types it identifies show strong clinical relevance and biological interpretability. The ELMSD code is available at: https://github.com/DXCODEE/ELMSD.
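
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository above for that). The function names, the tanh activation, the ridge penalty, the Gaussian affinity with a median bandwidth, and the restart-style diffusion update are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def elm_autoencoder_embed(X, n_hidden=16, ridge=1e-2, seed=0):
    """Compress X (samples x features) with an ELM autoencoder (illustrative)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Random, untrained input weights and biases, as in an extreme learning machine.
    W = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Output weights from ridge-regularised least squares so that H @ beta ~ X.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ X)
    # Assumption: project the data through the learned output weights to get
    # the compressed representation (a common ELM autoencoder convention).
    return X @ beta.T


def gaussian_affinity(Z):
    """Sample-to-sample Gaussian similarity with a median-distance bandwidth."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / sigma2)


def self_diffuse(A, n_iter=10, alpha=0.8):
    """Iteratively diffuse an affinity matrix over its own graph (illustrative).

    Assumption: a generic diffusion-with-restart update, not necessarily the
    update rule used by ELMSD.
    """
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)  # row-normalised transition matrix
    S = np.eye(n)
    for _ in range(n_iter):
        S = alpha * (S @ P) + (1.0 - alpha) * np.eye(n)
    return (S + S.T) / 2.0  # symmetrise for downstream clustering


if __name__ == "__main__":
    # Toy data: two groups of 20 "samples" with 50 "genes" each.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(3, 1, (20, 50))])
    S = self_diffuse(gaussian_affinity(elm_autoencoder_embed(X)))
    print(S.shape)
```

The resulting matrix `S` could then be passed to any graph-based clustering routine, such as spectral clustering, to assign samples to groups.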
