Database Commons

a catalog of worldwide biological databases

Database Profile

NrichD

General information

URL: http://proline.biochem.iisc.ernet.in/NRICHD/
Full name: Remote Homology Detection using Enriched Databases
Description: NrichD is a database of computationally designed protein-like sequences that, when added to natural sequence databases, act as intermediates that let searches "hop" through protein sequence space and so help detect remote relationships.
Year founded: 2014
Last update: 2014-09-27
Version: v1.0
Accessibility: Accessible
Country/Region: India

Classification & Tag

Data type:
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: Indian Institute of Science
Address: Bangalore 560 012, Karnataka, India
City: Bangalore
Province/State: Karnataka
Country/Region: India
Contact name (PI/Team): Narayanaswamy Srinivasan
Contact email (PI/Helpdesk): ns@mbu.iisc.ernet.in

Publications

NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection. [PMID: 25262355]
Mudgal R, Sandhya S, Kumar G, Sowdhamini R, Chandra NR, Srinivasan N.

NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences, augmented into natural sequence databases that can perform hops in protein sequence space to assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or natural 'intermediately related sequences' is a challenging task. Recently, we have demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment we have demonstrated that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since it is anticipated that such data sets will be employed to establish protein relationships, two databases that have already captured these relationships at the structural and functional domain level, namely, the SCOP database and the Pfam database, have been 'enriched' with these artificial intermediary sequences. NrichD database currently contains 3,611,010 artificial sequences that have been generated between 27,882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Nucleic Acids Res. 2015:43(Database issue) | 8 Citations (from Europe PMC, 2025-12-13)
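The abstract describes artificial sequences acting as stepping stones ("hops") between natural families. A minimal sketch of that idea as a breadth-first search over a precomputed table of search hits follows. The sequence IDs, the `art_` prefix used to mark designed sequences, the hit table, and `max_hops` are all hypothetical; this is not the NrichD search pipeline, only an illustration of transitive linking.

```python
from collections import deque


def reachable_natural(query, hits, is_artificial, max_hops=3):
    """Breadth-first search over a precomputed hit table.

    Every sequence reached (natural or artificial) is reused as a new
    query, so artificial sequences can serve as stepping stones.
    Returns the natural sequences found within `max_hops` iterations.
    """
    depth = {query: 0}
    queue = deque([query])
    found = set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not search beyond the hop limit
        for neighbour in hits.get(node, ()):
            if neighbour in depth:
                continue  # already reached by a shorter or equal path
            depth[neighbour] = depth[node] + 1
            if not is_artificial(neighbour):
                found.add(neighbour)
            queue.append(neighbour)
    return found


def is_artificial(seq_id):
    # Hypothetical naming convention: designed sequences carry an "art_" prefix.
    return seq_id.startswith("art_")


# Hypothetical hit table: two artificial linkers bridge family A and family B.
hits = {
    "famA_q": ["famA_2", "art_1"],
    "art_1": ["famA_q", "art_2"],
    "art_2": ["art_1", "famB_1"],
    "famB_1": ["art_2"],
}

# The same table restricted to natural sequences only.
natural_only = {
    node: [n for n in nbrs if not is_artificial(n)]
    for node, nbrs in hits.items()
    if not is_artificial(node)
}

print(sorted(reachable_natural("famA_q", hits, is_artificial)))
print(sorted(reachable_natural("famA_q", natural_only, is_artificial)))
```

With the linkers present, the family B sequence is reached on the third hop; with the natural-only table, the search never leaves family A.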
Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. [PMID: 24316367]
Mudgal R, Sowdhamini R, Chandra N, Srinivasan N, Sandhya S.

Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Our results showed that augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged-into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our richly statistically supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. Copyright © 2013 Elsevier Ltd. All rights reserved.

J Mol Biol. 2014:426(4) | 10 Citations (from Europe PMC, 2025-12-13)
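The abstract mentions a roulette wheel-based method for designing protein-like sequences from HMM-HMM alignments. The sketch below shows only the generic roulette-wheel (weight-proportional) selection step, assuming each aligned position has already been reduced to a column of residue weights; the profile values are invented, and how the authors derived the weights from the two aligned family profiles is not reproduced here.

```python
import bisect
import itertools
import random


def roulette_pick(weights, rng):
    """Roulette-wheel selection: return a key with probability proportional
    to its weight. Assumes non-negative weights with at least one positive."""
    keys = list(weights)
    cumulative = list(itertools.accumulate(weights[k] for k in keys))
    spin = rng.random() * cumulative[-1]  # a point in [0, total)
    # The first slot whose cumulative weight exceeds the spin wins;
    # zero-weight residues occupy no width on the wheel.
    index = bisect.bisect_right(cumulative, spin)
    return keys[min(index, len(keys) - 1)]  # guard against round-off


def design_sequence(profile, rng):
    """Build one sequence by spinning the wheel once per profile column."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical 4-column profile: residue -> weight at each aligned position.
PROFILE = [
    {"M": 1.0},
    {"K": 0.6, "R": 0.4},
    {"L": 0.5, "I": 0.3, "V": 0.2},
    {"G": 1.0},
]

rng = random.Random(42)  # fixed seed so runs are reproducible
designed = [design_sequence(PROFILE, rng) for _ in range(5)]
print(designed)
```

Python's `random.choices(population, weights=...)` performs the same weighted draw; the explicit version simply makes the wheel visible.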

Ranking

All databases: 4472/6895 (35.156%)
Gene genome and annotation: 1364/2021 (32.558%)
Phylogeny and homology: 196/302 (35.43%)
Interaction: 828/1194 (30.737%)
Total rank: 4472
Citations: 18
z-index: 1.636
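The page does not say how the ranking percentages or the z-index are computed. The two formulas below are an assumption inferred from the numbers: a percentage of (total - rank + 1) / total, and citations divided by years since founding. Together they reproduce every figure in this section, taking 2025 as the current year (the citation counts are dated 2025-12-13) and 18 as the sum of the two publications' Europe PMC counts.

```python
def rank_percentile(rank, total):
    """Percentage of databases ranked at or below this one (rank 1 = top).
    Assumed formula, inferred from the figures on this page."""
    return 100 * (total - rank + 1) / total


def z_index(citations, year_founded, current_year):
    """Average citations per year since founding (assumed formula)."""
    return citations / (current_year - year_founded)


# Figures copied from the Ranking section above.
RANKS = {
    "All databases": (4472, 6895),
    "Gene genome and annotation": (1364, 2021),
    "Phylogeny and homology": (196, 302),
    "Interaction": (828, 1194),
}

for category, (rank, total) in RANKS.items():
    print(f"{category}: {rank_percentile(rank, total):.3f}%")

citations = 8 + 10  # Europe PMC counts for the two publications
print(f"z-index: {z_index(citations, 2014, 2025):.3f}")
```

The loop prints 35.156, 32.558, 35.430 (shown on the page as 35.43) and 30.737, and the last line prints a z-index of 1.636.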

Record metadata

Created on: 2015-06-20
Curated by:
Dong Zou [2018-03-07]
Lin Liu [2016-03-28]
Li Yang [2015-06-26]