ProFOLD: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

Introduction

Here, we report an end-to-end deep neural network framework, called CopulaNet, for estimating inter-residue distances. Unlike existing methods, CopulaNet learns the conditional joint-residue distributions directly from multiple sequence alignments (MSAs) rather than from hand-crafted features such as covariance matrices. CopulaNet consists of three key elements: an MSA encoder, a co-evolution aggregator, and a distance estimator. The MSA encoder processes each homologous protein in the MSA individually and embeds each residue to represent its context-specific mutations observed in the homologs of the target protein. For any two residues, the aggregator first calculates the outer product of their embeddings derived from each homologous protein, then aggregates the outer products from all homologous proteins using average pooling, and finally yields a measure of co-evolution between the two residues. Based on the resulting residue co-evolution, a two-dimensional residual network estimates the distance for every residue pair.
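The aggregation step can be illustrated with a minimal pure-Python sketch. It assumes the MSA encoder has already produced a d-dimensional embedding for every residue of every aligned sequence; the nested-list layout and the names `coevolution_feature` and `coevolution_map` are hypothetical. The sketch averages the per-sequence outer products as described above, but it omits all learned components and any sequence weighting that the actual CopulaNet may apply.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product u v^T of two residue embeddings."""
    return [[a * b for b in v] for a in u]


def coevolution_feature(embeddings: List[List[Vector]], i: int, j: int) -> Matrix:
    """Average-pool the outer products of residues i and j over all homologs.

    embeddings[s][k] is the d-dimensional embedding of residue k in the s-th
    aligned sequence (hypothetical layout of the MSA encoder output).
    """
    n_seqs = len(embeddings)
    d_i, d_j = len(embeddings[0][i]), len(embeddings[0][j])
    acc = [[0.0] * d_j for _ in range(d_i)]
    for seq in embeddings:
        op = outer(seq[i], seq[j])  # one outer product per homologous sequence
        for p in range(d_i):
            for q in range(d_j):
                acc[p][q] += op[p][q]
    # Average pooling over the sequences of the MSA
    return [[x / n_seqs for x in row] for row in acc]


def coevolution_map(embeddings: List[List[Vector]]) -> List[List[Matrix]]:
    """L x L grid of pairwise co-evolution features (each a d x d matrix)."""
    length = len(embeddings[0])
    return [[coevolution_feature(embeddings, i, j) for j in range(length)]
            for i in range(length)]


# Toy MSA: two homologous sequences, two residues each, embedding size d = 2.
embeddings = [
    [[1.0, 0.0], [0.0, 1.0]],  # sequence 1
    [[1.0, 1.0], [1.0, 0.0]],  # sequence 2
]
pair_01 = coevolution_feature(embeddings, 0, 1)
print(pair_01)  # → [[0.5, 0.5], [0.5, 0.0]]
```

Flattening each d x d matrix gives d² channels per residue pair, i.e. an L x L feature map of the kind a two-dimensional residual network could consume. A real implementation would vectorize this with a tensor library rather than use nested Python loops.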
Using CopulaNet as a core module, we developed an approach to protein structure prediction called ProFOLD. In brief, ProFOLD transforms the estimated distances into a potential function and then builds a tertiary structure conformation that minimizes this potential. In the following sections, we first demonstrate the concept of ProFOLD using protein T0992-D1 as an example, then apply it to predict the structures of CASP13 target proteins as representatives, and finally compare it with state-of-the-art prediction approaches. We also analyze the contributions of the key elements of CopulaNet.
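The text does not specify ProFOLD's potential function or optimizer, so the sketch below is an illustration under stated assumptions only. It assumes CopulaNet outputs a probability distribution over distance bins for each residue pair, collapses each distribution into an expected distance, builds a simple harmonic restraint potential from those distances, and minimizes it over 3D coordinates with plain gradient descent. The function names, bin centres, and toy probabilities are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def expected_distance(bin_centers: List[float], probs: List[float]) -> float:
    """Collapse a (hypothetical) distance distribution into its mean distance."""
    return sum(c * p for c, p in zip(bin_centers, probs))


def potential(coords: List[List[float]], targets: Dict[Pair, float]) -> float:
    """Harmonic restraint energy: sum of squared distance deviations."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2
               for (i, j), d0 in targets.items())


def minimize(coords: List[List[float]], targets: Dict[Pair, float],
             lr: float = 0.05, steps: int = 2000) -> List[List[float]]:
    """Plain gradient descent on the restraint energy over 3D coordinates."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for (i, j), d0 in targets.items():
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            d = math.sqrt(sum(x * x for x in diff)) or 1e-8  # avoid division by zero
            g = 2.0 * (d - d0) / d  # dE/dd multiplied by dd/dx = (x_i - x_j) / d
            for k in range(3):
                grad[i][k] += g * diff[k]
                grad[j][k] -= g * diff[k]
        for c, gc in zip(coords, grad):
            for k in range(3):
                c[k] -= lr * gc[k]
    return coords


# Toy example: three residues, distance bins centred at 4, 6 and 8 Angstrom.
bins = [4.0, 6.0, 8.0]
targets = {
    (0, 1): expected_distance(bins, [1.0, 0.0, 0.0]),  # 4.0
    (1, 2): expected_distance(bins, [1.0, 0.0, 0.0]),  # 4.0
    (0, 2): expected_distance(bins, [0.0, 0.5, 0.5]),  # 7.0
}
random.seed(0)
start = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
folded = minimize(start, targets)
print(round(potential(folded, targets), 6))
```

The harmonic form is chosen here only because its gradient is easy to write by hand. Distance-based folding pipelines often use a distribution-aware potential instead (for example, a negative log-likelihood over distance bins) together with backbone geometry terms and a stronger optimizer.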

Publications

No Publication Information

Credits

  1. Dongbo Bu dbu@ict.ac.cn
    Investigator

    University of Chinese Academy of Sciences, Beijing, China

Summary

Accession: BT007268
Tool Type: Application
Category: Protein structures
Platforms: Linux/Unix
Technologies: C++
User Interface: (not specified)
Input Data: FASTA
Latest Release: 1.0 (September 15, 2021)
Download Count: 417
Country/Region: China
Submitted By: Shaoliang Peng
Funding

2018YFC0910400