CompoDynamics: a comprehensive database for characterizing sequence composition dynamics.

Shuai Jiang, Qiang Du, Changrui Feng, Lina Ma, Zhang Zhang
Author Information
  1. Shuai Jiang: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  2. Qiang Du: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  3. Changrui Feng: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  4. Lina Ma: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  5. Zhang Zhang: National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Abstract

Sequence compositions of nucleic acids and proteins have a significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. A dedicated resource of sequence compositions and associated features is therefore crucial for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes across a broad range of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical properties and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. In addition, interactive analytical tools enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics has great potential to improve understanding of the roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
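
To make the three composition types named above concrete, the following is a minimal sketch of how nucleotide content, codon usage and amino acid usage might be computed for a single CDS. It is an illustration only and not the CompoDynamics pipeline: the function name `composition`, the returned keys, and the choice of the standard genetic code (NCBI translation table 1) are assumptions made for this example.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), laid out in TCAG order.
# Assumption: the CDS uses the standard code; other tables would need a
# different AMINO string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def composition(cds):
    """Return simple composition features for one coding sequence.

    Hypothetical helper for illustration; trailing bases that do not
    form a complete codon are ignored for codon-level features.
    """
    seq = cds.upper()
    if len(seq) < 3:
        raise ValueError("CDS must contain at least one codon")

    # Split the sequence into in-frame codons.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # Nucleotide content: overall GC fraction and GC at third codon positions.
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    gc3 = sum(codon[2] in "GC" for codon in codons) / len(codons)

    # Codon usage: raw counts of each in-frame codon.
    codon_usage = Counter(codons)

    # Amino acid usage: translate codons, skipping stop codons and any
    # codon containing ambiguous bases (absent from the table).
    amino_acid_usage = Counter()
    for codon in codons:
        aa = CODON_TABLE.get(codon)
        if aa is not None and aa != "*":
            amino_acid_usage[aa] += 1

    return {
        "gc": gc,
        "gc3": gc3,
        "codon_usage": codon_usage,
        "amino_acid_usage": amino_acid_usage,
    }


if __name__ == "__main__":
    features = composition("ATGGCCAAATAA")
    print(features["gc"], features["gc3"])
    print(dict(features["codon_usage"]))
    print(dict(features["amino_acid_usage"]))
```

Raw counts are returned rather than normalized frequencies so that values from many CDSs can be summed before any per-species or per-gene-group normalization is applied.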

MeSH Term

Amino Acid Sequence
Animals
Apicomplexa
Archaea
Bacteria
Base Composition
Base Sequence
Codon Usage
Databases, Genetic
Fungi
Genetic Code
Genome
Internet
Invertebrates
Open Reading Frames
Phylogeny
Plants
Software
Vertebrates
Viruses

Links to CNCB-NGDC Resources

Database Commons: DBC007431 (CompoDynamics)
