Identification of B-cell epitopes (BCEs) plays an essential role in the development of peptide vaccines, immuno-diagnostic reagents, and antibody design and production. In this work, we generated a large benchmark dataset comprising 126,779 experimentally supported, linear epitope-containing regions in 3,567 protein clusters from over 1.3 million B-cell assays. Analysis of this curated dataset showed large pathogen diversity, covering 176 different families. The accuracy of linear BCE prediction varied strongly across features, although all sequence and structural features were informative. To search for more efficient and interpretable feature representations, we developed NetBCE, a ten-layer deep learning framework for linear BCE prediction. By automatically learning informative classification features, NetBCE achieved high accuracy and robust performance, with an average area under the curve (AUC) of 0.846 in five-fold cross-validation. NetBCE substantially outperformed conventional machine learning algorithms and other tools, with an improvement of over 22.06% in AUC compared with other tools on an independent dataset. Investigation of the outputs of important network modules in NetBCE showed that epitopes and non-epitopes tended to occupy distinct regions, with efficient feature representation along the network layer hierarchy. NetBCE is freely available at https://github.com/bsml320/NetBCE.
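The headline metric above is an AUC averaged over five cross-validation folds. As a rough illustration of how such a figure is typically computed, the sketch below implements AUC through the pairwise-ranking (Mann-Whitney) identity and averages it over five stratified folds. It is not taken from the NetBCE code base: the function names, the `fit` callback, and the choice of stratified splitting are all assumptions made for this example.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, keeping the class ratio roughly equal.

    Stratification is an assumption of this sketch, not a detail
    stated in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the AUCs.

    `fit` is a hypothetical callback: it takes training features and
    labels and returns a function mapping one feature to a score.
    """
    fold_aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(features[i]) for i in held_out]
        fold_aucs.append(auc([labels[i] for i in held_out], scores))
    return sum(fold_aucs) / len(fold_aucs), fold_aucs


def fit_midpoint(train_x, train_y):
    """Toy stand-in for a real model: score = distance above the class midpoint."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - midpoint
```

Calling `cross_validated_auc(features, labels, fit_midpoint)` returns the mean AUC together with the per-fold values. In practice the toy `fit_midpoint` would be replaced by whatever classifier is being evaluated.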
- Haodong Xu, Investigator
UTHealth School of Biomedical Informatics, UT Health Science Center at Houston, United States of America
| User Interface | Terminal Command Line |
| Latest Release | 1.0 (August 17, 2022) |
| Country/Region | United States of America |
| Submitted By | Haodong Xu |
This study was partially supported by National Institutes of Health grants (R01LM012806, R01DE030122, and R01DE029818). We thank the Cancer Prevention and Research Institute of Texas (CPRIT RP180734 and RP210045) for resource support. Funding for open access charge: CPRIT (RP180734).