DeepCNPP: Deep Learning Architecture to Distinguish the Promoter of Human Long Non-Coding RNA Genes and Protein-Coding Genes.
Tanvir Alam, Mohammad Tariqul Islam, Mowafa Househ, Samir Brahim Belhaouari, Ferdaus Ahmed Kawsar
Author Information
Tanvir Alam: Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar.
Mohammad Tariqul Islam: Computer Science Department, Southern Connecticut State University, USA.
Mowafa Househ: Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar.
Samir Brahim Belhaouari: Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar.
Ferdaus Ahmed Kawsar: Department of Computing, East Tennessee State University, USA.
Promoter region of protein-coding genes are gradually being well understood, yet no comparable studies exist for the promoter of long non-coding RNA (lncRNA) genes which has emerged as a global potential regulator in multiple cellular process and different diseases for human. To understand the difference in the transcriptional regulation pattern of these genes, previously, we proposed a machine learning based model to classify the promoter of protein-coding genes and lncRNA genes. In this study, we are presenting DeepCNPP (deep coding non-coding promoter predictor), an improved model based on deep learning (DL) framework to classify the promoter of lncRNA genes and protein-coding genes. We used convolution neural network (CNN) based deep network to classify the promoter of these two broad categories of human genes. Our computational model, built upon the sequence information only, was able to classify these two groups of promoters from human at a rate of 83.34% accuracy and outperformed the existing model. Further analysis and interpretation of the output from DeepCNPP architecture will enable us to understand the difference in transcription regulatory pattern for these two groups of genes.