The P10K database: a data portal for the protist 10 000 genomes project.

Xinxin Gao, Kai Chen, Jie Xiong, Dong Zou, Fangdian Yang, Yingke Ma, Chuanqi Jiang, Xiaoxuan Gao, Guangying Wang, Siyu Gu, Peng Zhang, Shuai Luo, Kaiyao Huang, Yiming Bao, Zhang Zhang, Lina Ma, Wei Miao
Author Information
  1. Xinxin Gao: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  2. Kai Chen: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  3. Jie Xiong: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  4. Dong Zou: China National Center for Bioinformation, Beijing 100101, China.
  5. Fangdian Yang: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  6. Yingke Ma: China National Center for Bioinformation, Beijing 100101, China.
  7. Chuanqi Jiang: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  8. Xiaoxuan Gao: Shandong University of Technology, Zibo 255000, China.
  9. Guangying Wang: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  10. Siyu Gu: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  11. Peng Zhang: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  12. Shuai Luo: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  13. Kaiyao Huang: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.
  14. Yiming Bao: University of Chinese Academy of Sciences, Beijing 100049, China.
  15. Zhang Zhang: University of Chinese Academy of Sciences, Beijing 100049, China.
  16. Lina Ma: University of Chinese Academy of Sciences, Beijing 100049, China.
  17. Wei Miao: Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China.

Abstract

Protists, a highly diverse group of microscopic eukaryotic organisms distinct from fungi, animals and plants, play crucial roles in Earth's biosphere. However, the genomes of only a small fraction of known protist species have been published and made publicly accessible. To address this gap, the Protist 10 000 Genomes Project (P10K) was initiated, implementing a specialized pipeline for single-cell genome/transcriptome assembly, decontamination and annotation of protists. The resulting P10K database (https://ngdc.cncb.ac.cn/p10k/) serves as a comprehensive platform that collates and disseminates genome sequences and annotations from diverse protist groups. Currently, the P10K database contains 2959 genomes and transcriptomes, comprising 1101 datasets newly sequenced by P10K and 1858 publicly available datasets. Notably, it covers 45% of the protist orders, with particularly strong representation of ciliates (53% coverage), for which it holds nearly a thousand genomes/transcriptomes. Intriguingly, analysis of codon table usage among ciliates has revealed differences from the codon tables recorded in the NCBI taxonomy system, suggesting a need to revise the codon tables used for these species. Collectively, the P10K database serves as a valuable repository of genetic resources for protist research and aims to expand its collection by incorporating more sequenced data and advanced analysis tools to benefit protist studies worldwide.
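
To illustrate why the choice of codon table matters for ciliates, the following minimal sketch translates the same coding sequence under the standard genetic code (NCBI translation table 1) and under the ciliate nuclear code (NCBI translation table 6), in which TAA and TAG encode glutamine rather than stop. It is not the P10K pipeline; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of NCBI translation table 1 (standard code), listed in
# TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(aas):
    """Map each of the 64 codons (TCAG order) to its amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, aas))


STANDARD = build_table(STANDARD_AAS)
# NCBI translation table 6 (ciliate nuclear code): TAA and TAG are
# reassigned to glutamine (Q); TGA remains the only stop codon.
CILIATE = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def translate(seq, table):
    """Translate seq in frame 0, stopping at the first stop codon.

    Hypothetical helper for illustration; unknown codons become 'X'.
    """
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = table.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example sequence: ATG CAA TAA GGT TGA
example = "ATGCAATAAGGTTGA"
print(translate(example, STANDARD))  # truncated at TAA
print(translate(example, CILIATE))   # TAA read through as Q
```

Under the standard code the example is cut short at the in-frame TAA, whereas under the ciliate code translation continues to the TGA stop. Annotating a genome with the wrong table would therefore truncate or misread predicted proteins, which is why a mismatch between observed codon usage and the recorded table is worth correcting.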

Grants

  1. 2020YFA0907400/National Key R&D Program of China
  2. XDPB18/Strategic Priority Research Program of the Chinese Academy of Sciences
  3. 2019104/Youth Innovation Promotion Association of Chinese Academy of Sciences
  4. 153F11KYSB20160008/International Partnership Program of the Chinese Academy of Sciences
  5. 32122015/Natural Science Foundation of China
  6. /Open Biodiversity and Health Big Data Programme of IUBS
  7. /Ministry of Science and Technology of the People's Republic of China

MeSH Terms

Animals
Codon
Databases, Genetic
Eukaryota
Fungi
Plants
Genome

Chemicals

Codon
