Heterogeneous graph neural network for lncRNA-disease association prediction.

Hong Shi, Xiaomeng Zhang, Lin Tang, Lin Liu
Author Information
  1. Hong Shi: School of Information, Yunnan Normal University, Kunming, 650092, China.
  2. Xiaomeng Zhang: School of Information, Yunnan Normal University, Kunming, 650092, China.
  3. Lin Tang: Key Laboratory of Educational Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming, 650092, China.
  4. Lin Liu: School of Information, Yunnan Normal University, Kunming, 650092, China. liulinrachel@163.com.

Abstract

Identifying lncRNA-disease associations aids the diagnosis, treatment and prevention of diseases. Because verifying associations through biological experiments is expensive and time-consuming, computational prediction methods have gradually become an important means of discovering lncRNA-disease associations. However, existing methods still struggle to make full use of network topology information when identifying potential lncRNA-disease associations in multi-source data. In this study, we propose a novel method called HGNNLDA for lncRNA-disease association prediction. First, HGNNLDA constructs a heterogeneous network composed of a lncRNA similarity network, a lncRNA-disease association network and a lncRNA-miRNA association network. Then, on this heterogeneous network, a fixed-size set of strongly correlated neighbors of each type is sampled for each node by random walk with restart. Next, the embeddings of the lncRNA and the disease in each lncRNA-disease association pair are obtained through a heterogeneous graph neural network, by type-based neighbor aggregation followed by combination across all types; an attention mechanism is introduced because different types of neighbors contribute differently to the prediction of lncRNA-disease associations. Under fivefold cross-validation (5FCV), HGNNLDA achieves an area under the receiver operating characteristic curve (AUC) of 0.9786 and an area under the precision-recall curve (AUPR) of 0.8891. Compared with five state-of-the-art prediction models, HGNNLDA has better prediction performance. In addition, two types of case studies further verify that our method can effectively predict potential lncRNA-disease associations and is able to make predictions for new diseases without any known associated lncRNAs.
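
The neighbor-sampling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `sample_neighbors_rwr`, the restart probability, the walk length, the per-type neighbor counts and the toy graph are all assumptions chosen for illustration. It runs a random walk with restart from one node, counts visits, and keeps the most frequently visited nodes of each type up to a fixed size.

```python
import random
from collections import Counter

def sample_neighbors_rwr(adj, node_type, start, top_k,
                         restart_prob=0.5, walk_length=1000, seed=0):
    """Sample fixed-size, type-grouped neighbors of `start` by random walk with restart.

    adj:       dict mapping node -> list of adjacent nodes (hypothetical toy format)
    node_type: dict mapping node -> type label
    top_k:     dict mapping type label -> maximum number of neighbors to keep
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_length):
        # With probability restart_prob (or at a dead end) jump back to the start node.
        if rng.random() < restart_prob or not adj[current]:
            current = start
        else:
            current = rng.choice(adj[current])
            if current != start:
                visits[current] += 1

    # Keep the most frequently visited nodes of each type, up to the fixed size.
    by_type = {}
    for node, _count in visits.most_common():
        t = node_type[node]
        kept = by_type.setdefault(t, [])
        if len(kept) < top_k.get(t, 0):
            kept.append(node)
    return by_type

# Toy heterogeneous network (assumed data, for illustration only).
edges = [("lnc1", "lnc2"), ("lnc1", "dis1"), ("lnc2", "dis1"),
         ("lnc1", "mir1"), ("lnc2", "mir2"), ("lnc3", "dis2"), ("lnc3", "mir1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

prefix_to_type = {"lnc": "lncRNA", "dis": "disease", "mir": "miRNA"}
node_type = {n: prefix_to_type[n[:3]] for n in adj}

top_k = {"lncRNA": 2, "disease": 1, "miRNA": 1}
neighbors = sample_neighbors_rwr(adj, node_type, "lnc1", top_k)
print(neighbors)
```

Ranking by visit frequency is one simple way to approximate "strongly correlated" neighbors; in the method described above, the neighbors sampled per type would then be passed to the type-based aggregation and attention-weighted combination steps.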

MeSH Terms

RNA, Long Noncoding
Computational Biology
Algorithms
Neural Networks, Computer
MicroRNAs

Chemicals

RNA, Long Noncoding
MicroRNAs