Accurate identification of plants is the basis for plant diversity conservation and remains a significant challenge for taxonomists. Although DNA barcoding methods are commonly used for plant identification, they are limited by the low amplification success and low discriminative power of the selected genomic regions. Here, we developed a k-mer-based approach, the DNA signature sequence (DSS), to accurately identify plant taxon-specific markers, especially at the species level. A DSS is a constant-length nucleotide sequence capable of identifying a taxon and distinguishing it from other taxa. We performed the first large-scale study of DSS markers in plants. DSS candidates for 3899 angiosperm species were calculated from a chloroplast data set of 4356 assemblies. DSSs were validated in four species using Sanger sequencing of PCR amplicons and in 165 species using high-throughput sequencing. Based on these validations, the universality of the DSSs was over 79.38%. Several indicators influencing DSS marker identification and detection were also evaluated, and common criteria for applying DSSs in plant identification are proposed.
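The idea of a taxon-specific, constant-length signature can be illustrated with a minimal sketch. It assumes that a candidate is any k-mer present in every assembly of the target taxon and absent from every assembly of all other taxa. The function names, the toy sequences, and the choice of k are hypothetical and are not taken from the study. The sketch also ignores reverse complements and the filtering criteria the study evaluates.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return all length-k substrings of seq that contain only A, C, G, or T."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    }


def candidate_signatures(
    assemblies: Dict[str, List[str]], target: str, k: int
) -> Set[str]:
    """k-mers shared by every target assembly and absent from all other taxa.

    `assemblies` maps a taxon name to its sequences (hypothetical input layout).
    """
    # Keep only k-mers found in every assembly of the target taxon.
    shared = set.intersection(*(kmers(s, k) for s in assemblies[target]))
    # Remove any k-mer that also occurs in a non-target assembly.
    for taxon, seqs in assemblies.items():
        if taxon == target:
            continue
        for s in seqs:
            shared -= kmers(s, k)
    return shared


# Toy data for illustration only; real chloroplast assemblies are far longer.
toy = {
    "sp_A": ["ACGTACGTTT", "ACGTACGTTA"],
    "sp_B": ["ACGTAAAA"],
}
print(sorted(candidate_signatures(toy, "sp_A", k=5)))
```

In this toy case the k-mer `ACGTA` is dropped for `sp_A` because it also occurs in `sp_B`. A large-scale analysis would more plausibly rely on a dedicated k-mer toolkit than on in-memory Python sets.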
Grants
2060302/Key project at central government level for the ability establishment of sustainable use for valuable Chinese medicine resources
CI2021B014/CI2021A041/Scientific and Technological Innovation Project of China Academy of Chinese Medical Sciences
2018FY10080002/Special Funds for Basic Resources Investigation Research of the Ministry of Science and Technology
2019YFC1711000/The National Key Research and Development Program of China