Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data.

Yawen Xiao, Jun Wu, Zongli Lin
Author Information
  1. Yawen Xiao: Department of Automation, Shanghai Jiao Tong University, Shanghai, 200240, China. Electronic address: foreverxyw@sjtu.edu.cn.
  2. Jun Wu: The Center for Bioinformatics and Computational Biology, East China Normal University, Shanghai, 200241, China. Electronic address: junwu302@gmail.com.
  3. Zongli Lin: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904-4743, USA. Electronic address: zl5y@virginia.edu.

Abstract

BACKGROUND AND OBJECTIVE: Cancer is a serious global disease with high mortality, and accurate diagnosis is the key to effective treatment. However, because sampling is difficult and clinical sample sizes are limited, data imbalance is a common problem in cancer diagnosis, whereas most conventional classification methods assume a balanced data distribution. Addressing the imbalanced learning problem is therefore important for improving the predictive performance of cancer diagnosis.
METHODS: In this study, we analyze the data imbalance prevalent in cancer gene expression data and present an improved deep learning based Wasserstein generative adversarial network (WGAN) model, which provides a reliable indicator of training progress and captures the characteristics of the data in depth. The WGAN generates new samples of the minority class and thereby addresses the imbalance problem at the data level.
RESULTS: We analyze three publicly available RNA-seq data sets covering three kinds of cancer using the proposed WGAN and compare the results with those of two commonly adopted sampling methods. The results show that, by addressing the data imbalance problem, the balanced data distribution and the expanded sample size increase the prediction accuracy in all three data sets.
CONCLUSIONS: The proposed WGAN method is superior in solving the imbalanced learning problem of gene expression data and provides significantly better prediction performance in cancer diagnosis.
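
ILLUSTRATIVE SKETCH: The data-level balancing described in METHODS can be pictured roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes a two-class problem. The helper balance_with_generator, the callable generate(n), and the toy data are all hypothetical names introduced here for illustration. The placeholder generator draws Gaussian noise around the minority-class mean and merely stands in for a trained WGAN generator, which the abstract does not specify in detail.

```python
import numpy as np


def balance_with_generator(X, y, generate, minority_label):
    """Append synthetic minority samples until both classes have equal size.

    `generate(n)` is a hypothetical callable standing in for a trained
    generator; it must return an array of shape (n, n_features).
    Assumes a binary problem: every label other than `minority_label`
    is counted as the majority class.
    """
    y = np.asarray(y)
    n_minority = int(np.sum(y == minority_label))
    n_majority = int(np.sum(y != minority_label))
    n_needed = max(n_majority - n_minority, 0)
    if n_needed == 0:
        return X, y
    X_new = generate(n_needed)
    X_bal = np.vstack([X, X_new])
    y_bal = np.concatenate(
        [y, np.full(n_needed, minority_label, dtype=y.dtype)]
    )
    return X_bal, y_bal


# Toy data (made up): 20 majority samples (label 0), 5 minority samples
# (label 1), 4 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 4)), rng.normal(2.0, 1.0, (5, 4))])
y = np.array([0] * 20 + [1] * 5)

# Placeholder generator (NOT a WGAN): Gaussian noise around the minority mean.
minority_mean = X[y == 1].mean(axis=0)


def toy_generator(n):
    return minority_mean + rng.normal(0.0, 0.5, (n, X.shape[1]))


X_bal, y_bal = balance_with_generator(X, y, toy_generator, minority_label=1)
```

In this sketch the original samples are left untouched and only the minority class is extended, so a downstream classifier would see equal class counts and a larger training set, which are the two effects the RESULTS paragraph credits for the accuracy gain.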

Keywords

MeSH Term

Deep Learning
Neoplasms
