NetBCE: An Interpretable Deep Neural Network for Accurate Prediction of Linear B-Cell Epitopes
Manual
NetBCE Enables Accurate Prediction of Linear B-Cell Epitopes with Interpretable Deep Neural Network
Activated B-lymphocytes (B cells) produce antibodies that bind specific antigens and are a key component of vertebrate immune responses. Identification of B-cell epitopes (BCEs) therefore plays an essential role in the development of peptide vaccines, immunodiagnostic reagents, and antibody production. Here, we obtained over 1.3 million B-cell assays with experimentally identified BCE regions from the IEDB database. After quality control, we compiled an experimentally well-characterized dataset containing more than 126,000 experimentally identified epitope-containing regions from 3567 protein clusters. Numerous widely used sequence and structural features were encoded and benchmarked with six conventional machine-learning algorithms. The results showed that different feature types yielded different accuracies for B-cell epitope prediction, and that sequence features outperformed structural features. To learn a more efficient and interpretable representation of the epitope sequence hierarchically, we implemented a ten-layer deep learning framework, named NetBCE, to predict B-cell epitopes. By automatically learning informative classification features, NetBCE achieved high accuracy and robust performance, with an average AUC of 0.8455 in 5-fold cross-validation. On the curated independent dataset, NetBCE outperformed conventional machine-learning methods and other existing tools, improving the AUC for B-cell epitope prediction by at least 8.84% compared with other tools. To elucidate NetBCE's capability for hierarchical representation, we visualized epitopes and non-epitopes using UMAP based on the feature representation at different network layers. We found that the feature representation became more discriminative along the network layer hierarchy.
Installation
Download NetBCE by running:
git clone https://github.com/bsml320/NetBCE
Installation has been tested on Linux with Python 3.7. Because the package is written in Python 3, Python 3 and the pip tool must be installed. NetBCE uses the following dependencies: numpy, scipy, pandas, h5py, plotly, dominate, keras (version 2.3.1) and tensorflow (version 1.15). You can install these packages with the following commands:
conda create -n NetBCE python=3.7
conda activate NetBCE
pip install pandas
pip install numpy
pip install scipy
pip install h5py
pip install plotly
pip install dominate
pip install -v keras==2.3.1
pip install -v tensorflow==1.15
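After installation, it can help to confirm that the environment matches the pinned versions before running a prediction. The following is a minimal sketch, not part of NetBCE: the helper names `version_matches` and `check_environment` are hypothetical, and the check assumes each package exposes a `__version__` attribute.

```python
import importlib

# Pinned versions are taken from the install commands above;
# packages installed without a pin map to None.
REQUIRED = {
    "numpy": None, "scipy": None, "pandas": None, "h5py": None,
    "plotly": None, "dominate": None,
    "keras": "2.3.1", "tensorflow": "1.15",
}


def version_matches(installed, pinned):
    """Return True if `installed` starts with the pinned components.

    For example, "1.15.5" matches the pin "1.15", while "2.4.0" does not.
    """
    if pinned is None:
        return True
    wanted = pinned.split(".")
    return installed.split(".")[: len(wanted)] == wanted


def check_environment(required=REQUIRED):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name, pinned in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"{name}: not installed")
            continue
        # Assumption: the package exposes __version__; otherwise treat as unknown.
        installed = getattr(module, "__version__", "")
        if not version_matches(installed, pinned):
            problems.append(f"{name}: found {installed or 'unknown'}, expected {pinned}")
    return problems
```

Calling `check_environment()` inside the activated NetBCE environment returns an empty list when every dependency is importable and the keras and tensorflow versions match the pins.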
Usage
Please cd to the NetBCE/prediction/ folder, which contains NetBCE_prediction.py. Example:
python NetBCE_prediction.py -f ../testdata/test.fasta -o ../result/test_result
For details of other parameters, run:
python NetBCE_prediction.py --help
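To run the prediction on several FASTA files from a script, the command above can be wrapped with the standard library. This is only a sketch: `build_command` and `run_prediction` are hypothetical helper names, and the only options used are the `-f` and `-o` flags shown in the example; any other parameter should be looked up with `--help`.

```python
import subprocess
import sys


def build_command(fasta_path, output_path, script="NetBCE_prediction.py"):
    """Build the argument list for one prediction run (-f input, -o output)."""
    return [sys.executable, script, "-f", str(fasta_path), "-o", str(output_path)]


def run_prediction(fasta_path, output_path, workdir="."):
    """Run one prediction from `workdir` (expected to be NetBCE/prediction/).

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    return subprocess.run(build_command(fasta_path, output_path), cwd=workdir, check=True)
```

For example, `run_prediction("../testdata/test.fasta", "../result/test_result", workdir="NetBCE/prediction")` would issue the same command as the example above, using the Python interpreter of the current environment.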
NetBCE analysis report
Based on the model constructed in this study, we developed software for linear B-cell epitope prediction. NetBCE is available at https://github.com/bsml320/NetBCE. NetBCE presents and visualizes the prediction results in an interactive HTML file, built with Python, PHP, JavaScript and Bootstrap, in an easily readable and interpretable manner. Users input the candidate proteins in FASTA format. In addition, users need to select one or more peptide lengths so that the software can construct a library of candidate epitope peptides. On the output page, the software provides a probability score for each candidate peptide, ranging from 0 to 1. All prediction results can be copied, printed and downloaded in three formats: CSV, Excel and PDF. The software additionally provides two interactive HTML plots showing the distribution of lengths and scores for all candidate peptides.
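To illustrate what constructing a library of candidate epitope peptides from the input proteins can look like, here is a minimal sketch. It assumes the candidates are all contiguous windows of each selected length, which is one plausible reading of the description and not necessarily how NetBCE builds its library; `read_fasta` and `candidate_peptides` are hypothetical names.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after ">" as the record ID.
            parts = line[1:].split()
            header = parts[0] if parts else f"seq{len(records) + 1}"
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def candidate_peptides(records, lengths):
    """Yield (protein_id, start, peptide) for every window of each length.

    `start` is 1-based. Lengths longer than the protein yield nothing.
    """
    for protein_id, seq in records.items():
        for k in lengths:
            for start in range(len(seq) - k + 1):
                yield protein_id, start + 1, seq[start:start + k]


if __name__ == "__main__":
    demo = read_fasta(">p1 demo protein\nMKTAY\nIAK\n")
    for protein_id, start, peptide in candidate_peptides(demo, [6, 8]):
        print(protein_id, start, peptide)
```

Each candidate produced this way would then receive its own probability score between 0 and 1, so the number of rows in the report grows with both the protein length and the number of peptide lengths selected.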