ProFOLD: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction
Introduction
Here, we report an end-to-end deep neural network framework, called CopulaNet, for estimating inter-residue distances. Unlike existing methods, CopulaNet learns the conditional joint-residue distributions directly from MSAs rather than from hand-crafted features such as covariance matrices. CopulaNet consists of three key elements: an MSA encoder, a co-evolution aggregator, and a distance estimator. The MSA encoder processes each homologous protein in the MSA individually and embeds each residue to represent the context-specific mutations observed in the homologous proteins of the target protein. For any two residues, the aggregator first calculates the outer product of their embeddings derived from each homologous protein, then aggregates the outer products obtained from all homologous proteins using average pooling, and finally yields a measure of co-evolution between the two residues. Based on the resulting residue co-evolution, we use a two-dimensional residual network to estimate the distance for every residue pair.
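The aggregation step described above (a per-sequence outer product followed by average pooling over the MSA) can be illustrated with a minimal sketch. It makes one loud assumption: the learned MSA encoder is replaced by a toy one-hot embedding, so the function names `one_hot`, `outer`, and `coevolution` are hypothetical and are not part of ProFOLD. With one-hot embeddings, the pooled outer product reduces to the joint frequency table of residue pairs at two alignment columns.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol


def one_hot(residue):
    """Toy stand-in (assumption) for the learned encoder: one-hot per residue."""
    return [1.0 if residue == a else 0.0 for a in ALPHABET]


def outer(u, v):
    """Outer product of two vectors, returned as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def coevolution(msa, i, j):
    """Average, over all sequences in the MSA, of the outer product of the
    embeddings at alignment columns i and j."""
    dim = len(ALPHABET)
    total = [[0.0] * dim for _ in range(dim)]
    for seq in msa:
        op = outer(one_hot(seq[i]), one_hot(seq[j]))
        for a in range(dim):
            for b in range(dim):
                total[a][b] += op[a][b]
    n = len(msa)
    return [[x / n for x in row] for row in total]


# Toy alignment in which columns 0 and 2 mutate together (A with D, G with K).
msa = ["ACDE", "ACDE", "GCKE", "GCKE"]
pair = coevolution(msa, 0, 2)
print(pair[ALPHABET.index("A")][ALPHABET.index("D")])  # 0.5
```

In this toy alignment the pairs A/D and G/K each account for half of the sequences, while A/K never occurs, which is the kind of correlated signal the aggregator is meant to expose. In the framework described above, the embeddings come from a trained network rather than from a fixed one-hot code.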
Using CopulaNet as a core module, we develop an approach to protein structure prediction called ProFOLD. Briefly, ProFOLD transforms the estimated distances into a potential function and then constructs a tertiary structure conformation that minimizes this potential. In the following sections, we first demonstrate the concept of ProFOLD using protein T0992-D1 as an example, then apply it to predict structures for the CASP13 target proteins as representatives, and finally compare it with state-of-the-art prediction approaches. We also analyze the contributions of the key elements of CopulaNet.
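The text does not specify the form of the potential function. As a hedged illustration only, the sketch below assumes a commonly used choice: the negative log of the predicted probability for the distance bin in which a residue pair falls, summed over pairs. The names `pair_potential` and `total_potential`, the bin edges, and the probabilities are all hypothetical and chosen for the example; they are not taken from ProFOLD.

```python
import math


def pair_potential(bin_probs, bin_edges, d, eps=1e-6):
    """Negative log-probability of the distance bin that contains d.
    This form is an assumption; distances outside all bins get a high penalty."""
    for k, p in enumerate(bin_probs):
        if bin_edges[k] <= d < bin_edges[k + 1]:
            return -math.log(p + eps)
    return -math.log(eps)


def total_potential(coords, predictions, bin_edges):
    """Sum of pair potentials over all residue pairs with a predicted distribution."""
    total = 0.0
    for (i, j), probs in predictions.items():
        d = math.dist(coords[i], coords[j])  # Euclidean distance between residues
        total += pair_potential(probs, bin_edges, d)
    return total


# Hypothetical prediction: residues 0 and 1 are most likely 0 to 4 units apart.
bin_edges = [0.0, 4.0, 8.0, 12.0]
predictions = {(0, 1): [0.80, 0.15, 0.05]}

compact = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]    # pair falls in the likely bin
extended = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # pair falls in an unlikely bin

print(total_potential(compact, predictions, bin_edges) <
      total_potential(extended, predictions, bin_edges))  # True
```

Under this assumed potential, a conformation that agrees with the predicted distance distributions scores lower than one that contradicts them, so a structure search would favor the compact arrangement in this example.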
Publications
No Publication Information
Credits
- Dongbo Bu dbu@ict.ac.cn Investigator
University of Chinese Academy of Sciences, Beijing, China
Accession | BT007268 |
---|---|
Tool Type | Application |
Category | Protein structures |
Platforms | Linux/Unix |
Technologies | C++ |
User Interface | |
Input Data | FASTA |
Latest Release | 1.0 (September 15, 2021) |
Download Count | 417 |
Country/Region | China |
Submitted By | Shaoliang Peng |
2018YFC0910400