CompoDynamics: a comprehensive database for characterizing sequence composition dynamics.

Shuai Jiang, Qiang Du, Changrui Feng, Lina Ma, Zhang Zhang

Author Information

Shuai Jiang: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
Qiang Du: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
Changrui Feng: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
Lina Ma: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
Zhang Zhang: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

PMID: 34718745 DOI: 10.1093/nar/gkab979

Sequence compositions of nucleic acids and proteins have a significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. A dedicated resource of sequence compositions and associated features is therefore valuable for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes across a wide range of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical properties and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. Additionally, interactive analytical tools enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics has great potential to improve understanding of the roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
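To make the composition measures named in the abstract concrete, the following minimal Python sketch computes nucleotide (GC) content, GC content at third codon positions, and raw codon counts for a single CDS. It is an illustration only: the abstract does not describe how CompoDynamics computes these values, and the function names (`gc_content`, `gc3_content`, `codon_usage`) and the toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / total


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions, assuming the CDS starts in frame."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_content(cds[2:usable:3])


def codon_usage(cds: str) -> Counter:
    """Raw counts of each in-frame codon, assuming the CDS starts in frame."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGTAA"  # hypothetical example, not taken from the database
    print(gc_content(toy_cds))
    print(gc3_content(toy_cds))
    print(codon_usage(toy_cds))
```

Amino acid usage would follow the same counting pattern after translating each codon with a genetic code table, which is omitted here for brevity.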

Amino Acid Sequence

Animals

Apicomplexa

Archaea

Bacteria

Base Composition

Base Sequence

Codon Usage

Databases, Genetic

Fungi

Genetic Code

Genome

Internet

Invertebrates

Open Reading Frames

Phylogeny

Plants

Software

Vertebrates

Viruses

Journal Article; Research Support, Non-U.S. Gov't

Database Commons: DBC007431 (CompoDynamics)
