CompoDynamics: a Comprehensive Database for Characterizing Sequence Composition Dynamics

NGDC  Jan 29, 2022

Recently, researchers from the Beijing Institute of Genomics of Chinese Academy of Sciences / China National Center for Bioinformation have constructed a comprehensive database of sequence compositions —— CompoDynamics. This work was published with the title "CompoDynamics: a comprehensive database for characterizing sequence composition dynamics" in the journal Nucleic Acids Research

Sequence compositions as well as the derived features are critically essential for better understanding evolutionary processes and molecular mechanisms across all kingdoms of life. Existing resources, albeit have made great efforts, they focus on a quite limited range of sequence compositions or derived features, and they do not offer available tools for comparing molecular compositions between gene groupings in terms of protein families and GO terms. Particularly, an ever-increasing number of high-quality genomes covering a broad diversity of species have been sequenced and well annotated these years. Therefore, it is in urgent need to build a database incorporating sequence compositions and features based on high-quality genomes and genes. 

To fill this gap, we present CompoDynamics, a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes for all kinds of species. The current version presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical property and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes, covering 1 692 647 genes and 24 995 species. Moreover, CompoDynamics provides interactive and user-friendly tools to perform comparative analysis of composition features across different species and gene groupings in terms of protein families and GO terms, enabling users to investigate composition dynamics across genes and genomes. Collectively, it bears the great potential to better understand the underlying roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies. 

This work was supported by Special Investigation on Science and Technology Basic Resources of the MOST, CAS Strategic Priority Research Program, National Natural Science Foundation of China, and Youth Innovation Promotion Association of CAS. 

Database contents and organization (Image by ZHANG Zhang’s group)

Dr. Ma Lina
Dr. ZHANG Zhang