Database Commons

a catalog of worldwide biological databases

Database Profile

MDB

General information

URL: http://csc.columbusstate.edu/carroll/MDB
Full name: MultiDomainBenchmark
Description: Domains are the primary building blocks of protein structure and function. MultiDomainBenchmark was designed to provide robust evaluation of genetic sequence database search tools using query sequences that contain multiple domains.
Year founded: 2019
Last update:
Version:
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object: NA
Database category:
Major species:
Keywords:

Contact information

University/Institution: Columbus State University
Address: TSYS School of Computer Science, Columbus State University, 4225 University Avenue, Columbus, GA 31907, USA
City:
Province/State:
Country/Region: United States
Contact name (PI/Team): Hyrum D. Carroll
Contact email (PI/Helpdesk): carroll_hyrum@columbusstate.edu

Publications

MultiDomainBenchmark: a multi-domain query and subject database suite. [PMID: 30764761]
Hyrum D Carroll, John L Spouge, Mileidy Gonzalez

BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools.
DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture.
CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/.

BMC Bioinformatics. 2019;20(1) | 0 Citations (from Europe PMC, 2025-12-13)

Ranking

Citations: 0
z-index: 0

Record metadata

Created on: 2019-09-24
Curated by:
Ghulam Abbas [2019-10-08]
Furrukh Mehmood [2019-09-24]