GCD



Genome cluster database. A sequence family analysis platform for Arabidopsis and rice. [PMID: 15888677]

Horan K, Lauricha J, Bailey-Serres J, Raikhel N, Girke T.

The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa ssp. japonica) were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

Plant Physiol. 2005:138(1) | 28 Citations (from Europe PMC, 2026-04-04)
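
The abstract says family names are taken from the most common molecular function gene ontology node within each cluster. The following is a minimal Python sketch of that idea, not the authors' pipeline: the function name `name_family`, the protein identifiers, the example GO descriptions, the once-per-protein counting, and the alphabetical tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def name_family(member_go_terms):
    """Pick a family name from the GO molecular function descriptions of its members.

    member_go_terms maps a protein identifier to a list of GO molecular
    function descriptions. Returns the most common description, or None
    if no member carries an annotation.
    """
    counts = Counter(
        term
        for terms in member_go_terms.values()
        for term in set(terms)  # assumption: count each term once per protein
    )
    if not counts:
        return None
    top = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(term for term, count in counts.items() if count == top)


# Hypothetical cluster; identifiers and annotations are invented.
cluster = {
    "protein_a": ["kinase activity", "ATP binding"],
    "protein_b": ["kinase activity"],
    "protein_c": ["ATP binding", "kinase activity"],
    "protein_d": [],  # unannotated member
}
family_name = name_family(cluster)
```

Unannotated members contribute nothing to the vote, so a cluster with no annotated members is left unnamed rather than given a guessed label.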

URL:	http://bioinfo.ucr.edu/projects/GCD
Full name:	Genome Cluster Database
Description:	The Genome Cluster Database (GCD) is an integrated mining tool for the genome-wide family and singlet proteins from Arabidopsis thaliana and Oryza sativa ssp. japonica. Their proteomes have been clustered into families using two independent approaches: the program BLASTCLUST was used for similarity-based clustering (BCL), and hmmpfam searches were used for domain-based clustering (HCL).
Year founded:	2005
Last update:	2012-10-08
Version:	Version 3
Accessibility:	Accessible
Country/Region:	United States

Data type:	Protein
Data object:	Plant
Database category:	Expression
Major species:	Arabidopsis thaliana, Oryza sativa
Keywords:	singlet proteins, proteomes, clustering

University/Institution:	University of California Riverside
Address:	Center for Plant Cell Biology, Department of Botany and Plant Sciences, University of California, Riverside, California 92521, USA.
City:	Riverside
Province/State:	California
Country/Region:	United States
Contact name (PI/Team):	Thomas Girke
Contact email (PI/Helpdesk):	thomas.girke@ucr.edu

Database Commons
a catalog of worldwide biological databases
