Database Commons

a catalog of worldwide biological databases

Database Profile

BIOZON

General information

URL: http://biozon.org/
Full name:
Description: Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32,000 protein structures, 150,000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND.
Year founded: 2006
Last update:
Version:
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type:
Data object: NA
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: Cornell University
Address: Ithaca, NY, USA
City: Ithaca
Province/State: NY
Country/Region: United States
Contact name (PI/Team): Golan Yona
Contact email (PI/Helpdesk): golan@cs.cornell.edu

Publications

BIOZON: a hub of heterogeneous biological data. [PMID: 16381854]
Birkland A, Yona G.

Biological entities are strongly related and mutually dependent on each other. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein-protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated on to a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object that cannot be determined from any one source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32,000 protein structures, 150,000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries and rank the results based on analysis of the edge structure similar to Google PageRank, online at Biozon.org.
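The integration idea in this abstract (records from several sources collapsed onto one non-redundant object, with relations re-attached to that object) can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the record IDs, the toy sequences, the relation labels `manifests` and `interacts_with`, and the helper names `integrate` and `neighbors_of_type`. It is not Biozon's actual schema or code.

```python
from collections import defaultdict

# Hypothetical source records: (source, record_id, object_type, content).
RECORDS = [
    ("UniProt", "P1", "protein", "MKTAYIAK"),
    ("GenBank", "G7", "protein", "MKTAYIAK"),   # same sequence as UniProt P1
    ("PDB", "1ABC", "structure", "coords-1abc"),
    ("UniProt", "P2", "protein", "MSLLTEVE"),
]

# Hypothetical relations stated by individual sources, between source records.
RELATIONS = [
    (("PDB", "1ABC"), "manifests", ("GenBank", "G7")),
    (("UniProt", "P1"), "interacts_with", ("UniProt", "P2")),
]


def integrate(records, relations):
    """Merge records with identical type and content into one graph node."""
    node_of = {}                  # (source, record_id) -> node key
    sources = defaultdict(set)    # node key -> source records describing it
    for source, rec_id, obj_type, content in records:
        key = (obj_type, content)  # non-redundant identity of the object
        node_of[(source, rec_id)] = key
        sources[key].add((source, rec_id))

    edges = defaultdict(set)      # node key -> {(label, neighbor key)}
    for a, label, b in relations:
        na, nb = node_of[a], node_of[b]
        edges[na].add((label, nb))
        edges[nb].add((label, na))  # stored in both directions for traversal
    return sources, edges


def neighbors_of_type(edges, node, obj_type):
    """Cross-type query: neighbors of `node` that have the given object type."""
    return sorted(n for _, n in edges[node] if n[0] == obj_type)


sources, edges = integrate(RECORDS, RELATIONS)
protein = ("protein", "MKTAYIAK")
print(neighbors_of_type(edges, protein, "structure"))
print(neighbors_of_type(edges, protein, "protein"))
```

In this toy data the structure is attached through the GenBank record and the interaction through the UniProt record, so only the merged node sees both, which is the "more complete picture" the abstract describes.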

Nucleic Acids Res. 2006:34(Database issue) | 17 Citations (from Europe PMC, 2025-12-13)

BIOZON: a system for unification, management and analysis of heterogeneous biological data. [PMID: 16480510]
Birkland A, Yona G.

BACKGROUND: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increase rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability.
DESCRIPTION: Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated methods of query that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated.
CONCLUSION: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
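A "fuzzy" search of the kind described here can be pictured as an exact query whose starting set is first widened along similarity relations. The sketch below assumes a one-hop expansion with a score threshold; the protein and structure names, the similarity scores, the threshold value, and the function names are all hypothetical and are not taken from Biozon.

```python
# Hypothetical exact relations: protein -> structures that describe it.
HAS_STRUCTURE = {"protA": {"1ABC"}, "protC": {"2XYZ"}}

# Hypothetical derived similarity relations with a score in [0, 1].
SIMILAR = {("protA", "protB"): 0.92, ("protB", "protD"): 0.40}


def similar_to(protein, threshold):
    """Proteins linked to `protein` by a similarity score >= threshold."""
    hits = set()
    for (a, b), score in SIMILAR.items():
        if score < threshold:
            continue
        if a == protein:
            hits.add(b)
        elif b == protein:
            hits.add(a)
    return hits


def structures_for(protein, fuzzy=False, threshold=0.8):
    """Structures linked to `protein`, optionally via one similarity hop."""
    candidates = {protein}
    if fuzzy:
        candidates |= similar_to(protein, threshold)
    found = set()
    for candidate in candidates:
        found |= HAS_STRUCTURE.get(candidate, set())
    return sorted(found)


print(structures_for("protB"))              # exact query
print(structures_for("protB", fuzzy=True))  # widened through similarity
```

The exact query finds nothing for `protB`, while the fuzzy one borrows the structure of its close relative `protA`. This is the sense in which stored similarity data lets knowledge propagate by inference.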

BMC Bioinformatics. 2006:7 | 41 Citations (from Europe PMC, 2025-12-13)

Hubs of knowledge: using the functional link structure in Biozon to mine for biologically significant entities. [PMID: 16480496]
Shafer P, Isganitis T, Yona G.

BACKGROUND: Existing biological databases support a variety of queries such as keyword or definition search. However, they do not provide any measure of relevance for the instances reported, and result sets are usually sorted arbitrarily.
RESULTS: We describe a system that builds upon the complex infrastructure of the Biozon database and applies methods similar to those of Google to rank documents that match queries. We explore different prominence models and study the spectral properties of the corresponding data graphs. We evaluate the information content of principal and non-principal eigenspaces, and test various scoring functions which combine contributions from multiple eigenspaces. We also test the effect of similarity data and other variations which are unique to the biological knowledge domain on the quality of the results. Query result sets are assessed using a probabilistic approach that measures the significance of coherence between directly connected nodes in the data graph. This model allows us, for the first time, to compare different prominence models quantitatively and effectively and to observe unique trends.
CONCLUSION: Our tests show that the ranked query results outperform unsorted results with respect to our significance measure and the top ranked entities are typically linked to many other biological entities. Our study resulted in a working ranking system of biological entities that was integrated into Biozon at http://biozon.org.
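The abstract says only that the ranking uses "methods similar to those of Google" and studies several prominence models. As a rough illustration of that family of methods, the sketch below runs a standard PageRank power iteration on a tiny typed graph and orders query matches by score. The node names, the link structure, the damping factor of 0.85, and the iteration count are assumptions; the multi-eigenspace scoring functions from the paper are not reproduced here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on a dict of outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its score uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical heterogeneous graph: proteins, a structure, an interaction, a pathway.
LINKS = {
    "protein:P1": ["structure:S1", "pathway:W1"],
    "protein:P2": ["pathway:W1"],
    "structure:S1": ["protein:P1"],
    "interaction:I1": ["protein:P1", "protein:P2"],
    "pathway:W1": ["protein:P1"],
}

scores = pagerank(LINKS)

# Rank the documents that matched some query (here: all proteins).
matches = [v for v in scores if v.startswith("protein:")]
for node in sorted(matches, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In this toy graph `protein:P1` receives links from the structure, the interaction, and the pathway, so it comes out on top. That matches the conclusion's observation that top-ranked entities tend to be linked to many other entities.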

BMC Bioinformatics. 2006:7 | 4 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 3181/6895 (53.88%)
Gene genome and annotation: 993/2021 (50.915%)
Total Rank: 3181
Citations: 61
z-index: 3.211
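The ranking figures above can be reproduced with simple arithmetic. This is an inference from the numbers on this page, not a definition stated here: the percentages match the share of databases ranked at or below this one, and the z-index matches citations divided by years since founding (2006), taking 2025 as the reference year from the citation date. The function names are hypothetical.

```python
def percentile(rank, total):
    """Share of databases ranked at or below `rank`, as a percentage."""
    return 100.0 * (total - rank + 1) / total


def citations_per_year(citations, founded, as_of):
    """Average citations per year since the founding year."""
    return citations / (as_of - founded)


print(round(percentile(3181, 6895), 2))            # 53.88
print(round(percentile(993, 2021), 3))             # 50.915
print(round(citations_per_year(61, 2006, 2025), 3))  # 3.211
```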

Record metadata

Created on: 2015-07-21
Curated by:
Lina Ma [2018-06-12]
Dong Zou [2018-02-07]
Mengwei Li [2016-03-31]
Mengwei Li [2015-11-29]
Chunlei Yu [2015-07-21]