Database Commons

a catalog of worldwide biological databases

Database Profile

ProtoNet

General information

URL: http://www.protonet.cs.huji.ac.il/
Full name:
Description: ProtoNet provides automatic hierarchical classification of protein sequences.
Year founded: 2003
Last update: 2012-11-01
Version: v6.1
Accessibility: Accessible
Country/Region: Israel

Classification & Tag

Data type:
Data object: NA
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: Hebrew University of Jerusalem
Address: 91904, Israel
City: Jerusalem
Province/State:
Country/Region: Israel
Contact name (PI/Team): Michal Linial
Contact email (PI/Helpdesk): michall@cc.huji.ac.il

Publications

ProtoNet: charting the expanding universe of protein sequences. [PMID: 23563419]
Rappoport N, Linial N, Linial M.
Nat Biotechnol. 2013:31(4) | 10 Citations (from Europe PMC, 2025-12-13)
ProtoNet 6.0: organizing 10 million protein sequences in a compact hierarchical family tree. [PMID: 22121228]
Rappoport N, Karsenty S, Stern A, Linial N, Linial M.

ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated through an unsupervised bottom-up clustering algorithm. This algorithm organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (Version 6.0) tree includes over 9 million proteins, of which 5.5% come from UniProtKB/SwissProt and the rest from UniProtKB/TrEMBL. The hierarchical tree structure is based on an all-against-all comparison of 2.5 million representatives of UniRef50. Rigorous annotation-based quality tests prune the tree to the 162,088 most informative clusters. Every high-quality cluster is assigned a ProtoName that reflects the most significant annotations of its proteins. These annotations are dominated by GO terms, UniProt/Swiss-Prot keywords and InterPro. ProtoNet 6.0 operates in a default mode. When used in the advanced mode, this data structure offers the user a view of the family tree at any desired level of resolution. Systematic comparisons with previous versions of ProtoNet are carried out. They show how our view of protein families evolves as larger parts of the sequence space become known. ProtoNet 6.0 provides numerous tools to navigate the hierarchy of clusters.

Nucleic Acids Res. 2012:40(Database issue) | 31 Citations (from Europe PMC, 2025-12-13)
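The 2012 abstract says each high-quality cluster receives a ProtoName reflecting the most significant annotations of its proteins, but it does not name the statistic. As a minimal sketch, assuming significance is measured with a one-sided hypergeometric (enrichment) test of each annotation term against all proteins in the tree, the term with the smallest p-value would be chosen. The functions `hypergeom_sf` and `best_annotation` and the toy annotations are hypothetical; the real ProtoName procedure may weigh GO terms, keywords and InterPro differently.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def best_annotation(cluster, annotations):
    """Return (term, p_value) for the term most over-represented in `cluster`.

    cluster:     set of protein IDs in one cluster (hypothetical input)
    annotations: dict of protein ID -> set of annotation terms, covering
                 every protein in the tree (hypothetical input)
    """
    population = len(annotations)
    draws = len(cluster)
    best = None
    for term in sorted(set().union(*(annotations[p] for p in cluster))):
        # How many proteins overall, and in this cluster, carry the term
        successes = sum(term in terms for terms in annotations.values())
        k = sum(term in annotations[p] for p in cluster)
        p_value = hypergeom_sf(k, population, successes, draws)
        if best is None or p_value < best[1]:
            best = (term, p_value)
    return best


annotations = {
    "P1": {"GO:kinase", "membrane"},
    "P2": {"GO:kinase"},
    "P3": {"GO:kinase", "membrane"},
    "P4": {"membrane"},
    "P5": {"membrane"},
    "P6": {"transport"},
}
print(best_annotation({"P1", "P2", "P3"}, annotations))  # ('GO:kinase', 0.05)
```

Here "membrane" also appears in the cluster, but it is common in the background, so the rarer, cluster-specific "GO:kinase" wins.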
Predicting fold novelty based on ProtoNet hierarchical classification. [PMID: 15539447]
Kifer I, Sasson O, Linial M.

Structural genomics projects aim to solve a large number of protein structures with the ultimate objective of representing the entire protein space. The computational challenge is to identify and prioritize a small set of proteins with new, currently unknown, superfamilies or folds. We develop a method that assigns each protein a likelihood of it belonging to a new, yet undetermined, structural superfamily. The method relies on a variant of ProtoNet, an automatic hierarchical classification scheme of all protein sequences from SwissProt. Our results show that proteins that are remote from solved structures in the ProtoNet hierarchy are more likely to belong to new superfamilies. The results are validated against SCOP releases from recent years that account for about half of the solved structures known to date. We show that our new method and the representation of ProtoNet are superior in detecting new targets, compared to our previous method using ProtoMap classification. Furthermore, our method outperforms PSI-BLAST search in detecting potential new superfamilies.

Bioinformatics. 2005:21(7) | 7 Citations (from Europe PMC, 2025-12-13)
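The 2005 abstract reports that proteins remote from solved structures in the ProtoNet hierarchy are more likely to belong to new superfamilies, without spelling out the distance measure. A minimal sketch, assuming the hierarchy is given as an ordered list of merges and that remoteness is the merge step at which a protein's cluster first contains a protein of known structure (higher means more remote). The merge-list format and the `remoteness` function are hypothetical; the published method turns such information into a likelihood rather than reporting a raw step count.

```python
def remoteness(proteins, merges, solved):
    """Return {protein: merge step at which it first joins a solved structure}.

    proteins: iterable of protein IDs (leaves of the hierarchy)
    merges:   list of (a, b) pairs in merge order; each pair names one
              member of each of the two clusters being merged (hypothetical format)
    solved:   set of protein IDs with a known structure
    Proteins that never join a solved structure get len(merges) + 1.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    members = {p: {p} for p in parent}          # keyed by cluster root
    has_solved = {p: p in solved for p in parent}
    score = {p: 0 for p in parent if p in solved}

    for step, (a, b) in enumerate(merges, start=1):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Exactly one side has a solved structure: the other side reaches one now
        if has_solved[ra] != has_solved[rb]:
            newly = members[rb] if has_solved[ra] else members[ra]
            for p in newly:
                score[p] = step
        parent[rb] = ra
        members[ra] |= members.pop(rb)
        has_solved[ra] = has_solved[ra] or has_solved[rb]

    never = len(merges) + 1
    return {p: score.get(p, never) for p in parent}


result = remoteness("ABCDE", [("A", "B"), ("C", "D"), ("B", "C")], solved={"A"})
print(result)  # {'A': 0, 'B': 1, 'C': 3, 'D': 3, 'E': 4}
```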
ProtoNet 4.0: a hierarchical classification of one million protein sequences. [PMID: 15608180]
Kaplan N, Sasson O, Inbar U, Friedlich M, Fromer M, Fleischer H, Portugaly E, Linial N, Linial M.

ProtoNet is an automatic hierarchical classification of the protein sequence space. In 2004, ProtoNet (version 4.0) presented an analysis of over one million proteins merged from the SwissProt and TrEMBL databases. In addition to rich visualization and analysis tools to navigate the clustering hierarchy, we incorporated several improvements that allow a simplified view of the scaffold of the proteins. A newly developed unsupervised, biologically valid method condensed the ProtoNet hierarchy to only 12% of the clusters. A large portion of these clusters was automatically assigned high-confidence biological names according to their correspondence with functional annotations. ProtoNet is available at: http://www.protonet.cs.huji.ac.il.

Nucleic Acids Res. 2005:33(Database issue) | 39 Citations (from Europe PMC, 2025-12-13)
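The ProtoNet 4.0 abstract mentions condensing the hierarchy to 12% of its clusters but does not describe the criterion. The sketch below swaps in a simple heuristic for illustration only: keep a cluster when its keyword purity, assumed here to be the fraction of annotated members sharing the cluster's most common keyword, reaches a threshold. The `purity` and `condense` functions and the 0.8 threshold are hypothetical, not the published method.

```python
from collections import Counter


def purity(cluster, keywords):
    """Fraction of annotated members that carry the cluster's most common keyword.

    cluster:  list of protein IDs
    keywords: dict of protein ID -> set of keywords; unannotated proteins are
              absent or map to an empty set and are ignored
    """
    annotated = [keywords[p] for p in cluster if keywords.get(p)]
    if not annotated:
        return 0.0
    counts = Counter(k for kws in annotated for k in kws)
    _, top = counts.most_common(1)[0]
    return top / len(annotated)


def condense(clusters, keywords, threshold=0.8):
    """Keep only clusters whose keyword purity reaches `threshold` (assumed rule)."""
    return {cid: members for cid, members in clusters.items()
            if purity(members, keywords) >= threshold}


keywords = {"P1": {"kinase"}, "P2": {"kinase"},
            "P3": {"kinase", "membrane"}, "P4": {"transport"}}
clusters = {"c1": ["P1", "P2", "P3"],
            "c2": ["P1", "P2", "P3", "P4"],
            "c3": ["P4", "P5"]}
print(sorted(condense(clusters, keywords)))  # ['c1', 'c3']
```

Cluster `c2` is dropped because only 3 of its 4 annotated members share "kinase" (purity 0.75).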
ProtoNet: hierarchical classification of the protein space. [PMID: 12520020]
Sasson O, Vaaknin A, Fleischer H, Portugaly E, Bilu Y, Linial N, Linial M.

The ProtoNet site provides an automatic hierarchical clustering of the SWISS-PROT protein database. The clustering is based on an all-against-all BLAST similarity search. The E-scores of these similarities are used to perform a continuous bottom-up clustering process by applying alternative rules for merging clusters. The outcome of this clustering process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. ProtoNet (version 1.3) is accessible in the form of an interactive web site at http://www.protonet.cs.huji.ac.il. ProtoNet provides navigation tools for monitoring the clustering process with vertical and horizontal views. Each cluster at any level of the hierarchy is assigned a statistical index indicating its level of purity based on biological keywords, such as those provided by SWISS-PROT and InterPro. ProtoNet can be used for function prediction, for defining superfamilies and subfamilies, and for large-scale protein annotation.

Nucleic Acids Res. 2003:31(1) | 46 Citations (from Europe PMC, 2025-12-13)
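The 2003 abstract describes a continuous bottom-up clustering driven by all-against-all BLAST E-scores, with alternative rules for merging clusters. A naive agglomerative sketch, under stated assumptions: pairwise E-values arrive as a dict keyed by protein pairs, the score between two clusters is the arithmetic mean of their recorded cross-pair E-values (pairs with no BLAST hit are ignored), and the two clusters with the lowest mean are merged first. The input format and `cluster_bottom_up` are hypothetical, and the arithmetic mean stands in for whichever merging rule ProtoNet actually applies.

```python
from itertools import combinations


def cluster_bottom_up(proteins, evalues):
    """Return the merge history [(merged_cluster, mean_evalue), ...].

    proteins: list of protein IDs
    evalues:  dict of frozenset({a, b}) -> BLAST E-value (lower = more similar)
    Clusters with no recorded cross-pair E-value are never merged.
    """
    clusters = [frozenset([p]) for p in proteins]
    history = []
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            scores = [evalues[frozenset((a, b))]
                      for a in clusters[i] for b in clusters[j]
                      if frozenset((a, b)) in evalues]
            if not scores:
                continue
            mean = sum(scores) / len(scores)   # assumed merge rule: arithmetic mean
            if best is None or mean < best[0]:
                best = (mean, i, j)
        if best is None:
            return history
        mean, i, j = best
        merged = clusters[i] | clusters[j]
        history.append((merged, mean))
        del clusters[j]   # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)


evalues = {frozenset(("A", "B")): 1e-50,
           frozenset(("C", "D")): 1e-20,
           frozenset(("B", "C")): 1e-3}
history = cluster_bottom_up(["A", "B", "C", "D"], evalues)
print([sorted(m) for m, _ in history])
# [['A', 'B'], ['C', 'D'], ['A', 'B', 'C', 'D']]
```

Each entry in `history` is one level of the hierarchy, so cutting the list at different points gives the "varying degrees of granularity" the abstract mentions.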

Ranking

All databases: 2155/6895 (68.76%)
Raw bio-data: 146/582 (75.086%)
Total rank: 2155
Citations: 129
z-index: 5.864
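The ranking figures are shown without definitions. The sketch below assumes two formulas that reproduce the displayed values; both are inferred from the numbers rather than taken from Database Commons documentation: the percentile is (N - rank + 1) / N, and the z-index is the total citation count divided by the number of years since the database was founded (2025 - 2003 = 22, matching the 2025 citation snapshot). The function names are hypothetical.

```python
def percentile(rank: int, total: int) -> float:
    """Percent of databases ranked at or below this one (assumed formula)."""
    return (total - rank + 1) / total * 100


def z_index(citations: int, year_founded: int, current_year: int) -> float:
    """Average citations per year since founding (assumed formula)."""
    return citations / (current_year - year_founded)


print(round(percentile(2155, 6895), 2))    # 68.76
print(round(percentile(146, 582), 3))      # 75.086
print(round(z_index(129, 2003, 2025), 3))  # 5.864
```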

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2015-06-20
Curated by:
Lin Liu [2022-08-22]
Shixiang Sun [2016-03-28]
Shixiang Sun [2015-11-22]
Shixiang Sun [2015-06-28]
Shixiang Sun [2015-06-26]