Chloroplast Genome Information Resource
In total, 34,923 chloroplast genome assemblies covering 16,717 species (648 families, 4,116 genera) are deposited in CGIR. Of these, 9,785 assemblies covering 6,628 species were downloaded from NCBI, 16 assemblies covering 16 species were downloaded from the NGDC Genome Warehouse, and 1,170 assemblies covering 718 species were sequenced by the National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences.
CGIR integrates 3,970,133 gene records, 13,255,422 simple sequence repeats (SSRs), 849,892 DNA barcodes, and 31,530,032 DNA signature sequences (DSSs) from chloroplast genomes. The taxonomic classification of each chloroplast genome (family, genus, and species) and the gene locus names were standardized. SSRs and DNA barcodes were systematically analyzed. DSSs for species identification were investigated in 1,849 seed plants with more than two chloroplast genomes in CGIR.
The taxonomic information was standardized based on Species 2000. Briefly, a taxonomic name was revised if its taxonomic status in Species 2000 was “synonym”. Species names not recorded in Species 2000 were curated based on the NCBI Taxonomy Database and the other references detailed below.
Plant Groups | Reference Database |
---|---|
Angiosperm | Angiosperm Phylogeny Group (APG IV) |
Gymnosperm | The Plant List (http://www.theplantlist.org) |
Bryophytes | The Plant List (http://www.theplantlist.org) |
Pteridophyte | Pteridophyte Phylogeny Group (PPG I) |
Phycophyta | AlgaeBase (https://www.algaebase.org) |
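The lookup order described above (Species 2000 first, then fallback references) can be pictured as a small priority lookup. The following is only a sketch: the function name, the record layout (`status`, `accepted_name`), and the placeholder names are hypothetical, and it assumes that revising a synonym means replacing it with the accepted name.

```python
def standardize_name(name, species2000, fallback_sources):
    """Return (standardized_name, source) for a raw species name.

    species2000: dict mapping a name to {"status": ..., "accepted_name": ...}
                 (hypothetical layout, not the real Species 2000 schema).
    fallback_sources: ordered list of (source_label, dict) pairs, e.g. the
                 NCBI Taxonomy Database followed by group-specific references.
    """
    record = species2000.get(name)
    if record is not None:
        # Assumption: a synonym is replaced by its accepted name.
        if record["status"] == "synonym":
            return record["accepted_name"], "Species 2000"
        return name, "Species 2000"
    # Name not recorded in Species 2000: try the fallbacks in order.
    for source_label, lookup in fallback_sources:
        if name in lookup:
            return lookup[name], source_label
    return name, "unresolved"


# Placeholder data for illustration only.
species2000 = {
    "Genus oldname": {"status": "synonym", "accepted_name": "Genus newname"},
    "Genus newname": {"status": "accepted", "accepted_name": "Genus newname"},
}
fallbacks = [("NCBI Taxonomy", {"Genus rarename": "Genus rarename"})]
print(standardize_name("Genus oldname", species2000, fallbacks))
```

Names that no source resolves are returned unchanged and flagged, so they can be reviewed by hand.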
Featured plants in CGIR were curated based on the World Checklist of Useful Plant Species (2020) and divided into six categories: environmental, food, forage, material, medicine, and poison. The categories are described below.
Category | Description |
---|---|
environmental | Examples include intercrops and nurse crops, ornamentals, barrier hedges, shade plants, windbreaks, soil improvers, plants for revegetation and erosion control, wastewater purifiers, indicators of the presence of metals, pollution, or underground water. |
food | Food, including beverages, for humans only. |
forage | Forage and fodder for vertebrate animals only. |
material | Woods, fibres, cork, cane, tannins, latex, resins, gums, waxes, oils, lipids, etc. and their derived products, including charcoal, petroleum substitutes, and fuel alcohols. |
medicine | Both human and veterinary. |
poison | Plants which are poisonous to vertebrates and invertebrates, both accidentally and usefully, e.g., for hunting and fishing. |
For medicinal plants recorded in the Chinese Pharmacopoeia and the National Compilation of Chinese Herbal Medicines, the medicinal organs were also curated. The curation model for medicinal organs is listed below.
Medicinal organ | Plant tissue |
---|---|
Radix | Root |
Rhizoma | Subterraneous stem (including rhizome, tuber, bulb, corm, etc.) |
Caulis | Rattan stem |
Lignum | Xylem (wood) |
Folium | Leaf |
Flos | Flower |
Fructus | Fruit |
Semen | Seed |
Herba | Whole herb |
Resina | Resin |
Others | Not in the above categories (e.g., algae). |
SSRs can be divided into three types: perfect, imperfect, and compound. Perfect and compound SSRs were identified using the MIcroSAtellite identification tool (MISA); imperfect SSRs were identified using IMEx. All primers were designed with Primer3.
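To make the notion of a perfect SSR concrete, here is a minimal regex-based sketch. It is not MISA or IMEx, and the minimum repeat counts are hypothetical values chosen for illustration rather than the settings used by CGIR. Compound and imperfect SSRs are not handled.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp); not CGIR's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{n}) captures one motif; \1{m,} requires at least m more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs belong to the shorter motif class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // motif_len))
    return sorted(hits)


example = "GC" + "A" * 12 + "GCGC" + "AT" * 7 + "CCG"
print(find_perfect_ssrs(example))
```

For the toy sequence, the sketch reports one mononucleotide run and one dinucleotide run; real chloroplast analyses would also merge neighbouring SSRs into compound SSRs, which this sketch leaves out.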
DNA barcodes were identified using an in silico approach. For each DNA barcoding region, the selected forward and reverse primers were aligned to the chloroplast genomes using BLAST. Based on the alignment positions, the nucleotides between the aligned primers were taken as the DNA barcode.
The primers used for DNA barcode identification are listed below.
Barcode region | Forward primer name | Forward primer sequence | Reverse primer name | Reverse primer sequence | Reference |
---|---|---|---|---|---|
atpI-atpH | atpI | TATTTACAAGYGGTATTCAAGCT | atpH | CCAAYCCAGCAGCAATAAC | Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am J Bot. 2007;94(3):275-288. |
ndhF-rpl32 | ndhF | GAAAGGTATKATCCAYGMATATT | rpl32 | CCAATATCCCTTYYTTTTCCAA | |
ndhJ-trnL | ndhJ | ATGCCYGAAAGTTGGATAGG | trnL' | GGTTCAAGTCCCTCTATCCC | |
petL-psbE | petL | AGTAGAAAACCGAAATAACTAGTTA | psbE | TATCGAATACTGGTAATAATATCAGC | |
psaI-accD | psaI | AATYGTACCACGTAATCYTTTAAA | accD | AGAAGCCATTGCAATTGCCGGAAA | |
psbB-psbH | psbB | TCCAAAAANKKGGAGATCCAAC | psbF | TCAAYRGTYTGTGTAGCCAT | |
psbD-trnT | psbD | CTCCGTARCCAGTCATCCATA | trnT | CCCTTTTAACTCAGTGGTAG | |
psbJ-petA | psbJ | ATAGGTACTGTARCYGGTATT | petA | AACARTTYGARAAGGTTCAATT | |
rpl14-rpl36 | rpl14 | AAGGAAATCCAAAAGGAACTCG | rpl36 | GGRTTGGAACAAATTACTATAATTCG | |
rpl32-trnL | rpl32 | CAGTTCCAAAAAAACGTACTTC | trnL | CTGCTTCCTAAGAGCAGCGT | |
rps12-rpl20 | rps12 | ATTAGAAANRCAAGACAGCCAAT | rpl20 | CGYYAYCGAGCTATATATCC | |
rps16 | rps16F | AAACGATGTGGTARAAAGCAAC | rps16R | AACATCWATTGCAASGATTCGATA | |
rps16-trnK | rpS16x2F2 | AAAGTGGGTTTTTATGATCC | trnK | TTAAAAGCCGAGTACTCTACC | |
trnC-trnD | trnC | CCAGTTCRAATCYGGGTG | trnD | GGGATTGTAGYTCAATTGGT | |
trnD-trnT | trnD | ACCAATTGAACTACAATCCC | trnT | CTACCACTGAGTTAAAAGGG | |
trnL-trnF | trnL-c | CGAAATCGGTAGACGCTACG | trnL-f | ATTTGAACTGGTGACACGAG | |
trnV-ndhC | trnV | GTCTACGGTTCGARTCCGTA | ndhC | TATTATTAGAAATGYCCARAAAATATCATATTC | |
atpF-atpH | atpF | ACTCGCACACACTCCCTTTCC | atpH | GCTTTTATGGAAGCTTTAACAAT | CBOL Plant Working Group. A DNA barcode for land plants. Proc Natl Acad Sci U S A. 2009;106(31):12794-12797. |
matK | matK 3F | CGTACAGTACTTTTGTGTTTACGAG | matK 1R | ACCCAGTCCATCTGGAAATCTTGGTTC | |
rbcL | rbcL 1F | ATGTCACCACAAACAGAAAC | rbcL 724R | TCGCATGTACCTGCAGTAGC | |
trnH-psbA | trnH2 | CGCGCATGGTGGATTCACAATCC | psbAF | GTTATGCATGAACGTAATGCTC | |
psbK-psbI | psbK | TTAGCCTTTGTTTGGCAAG | psbI | AGAGTTTGAGAGTAAGCAT | Hollingsworth ML, Andra Clark A, Forrest LL, et al. Selecting barcoding loci for plants: evaluation of seven candidate loci with species-level sampling in three divergent groups of land plants. Mol Ecol Resour. 2009;9(2):439-457. |
rpoB | rpoB1 | AAGTGCATTGTTGGAACTGG | rpoB3 | CCGTATGTGAAAAGAAGTATA | |
rpoC1 | rpoC1-1 | GTGGATACACTTCTTGATAATGG | rpoC1-3 | TGAGAAAACATAAGTAAACGGGC | |
accD | accD-1F | AGTATGGGATCCGTAGTAGG | accD-4R | TCTTTTACCCGCAAATGCAAT | Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One. 2007;2(6):e508. |
ndhJ | ndhJ-2F | TTGGGCTTCGATTACCAAGG | ndhJ-3R | ATAATCCTTACGTAAGGGCC | |
atpB | ESATPB172F | AATGTTACTTGTGAAGTWCAACAAT | ESATPE45R | ATTCCAAACWATTCGATTWGGAG | Schuettpelz E, Pryer KM. Fern phylogeny inferred from 400 leptosporangiate species and three plastid genes. Taxon. 2007;56(4):1037-1050. |
ycf5 | ycf5-1 | GGATTATTAGTCACTCGTTGG | ycf5-4 | CCCAATACCATCATACTTAC | Zhang X, Zhou T, Yang J, et al. Comparative Analyses of Chloroplast Genomes of Cucurbitaceae Species: Lights into Selective Pressures and Phylogenetic Relationships. Molecules. 2018;23(9):2165. |
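The in silico barcode extraction described above can be sketched as follows. This is a simplified stand-in, not the CGIR pipeline: it replaces BLAST alignment with exact matching of the primers (degenerate IUPAC bases expanded into regex classes, no mismatches allowed), and the function names and the `max_len` limit are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    return "".join(IUPAC[base] for base in primer.upper())


def extract_barcode(genome, forward, reverse, max_len=3000):
    """Return the sequence between the two primers, or None if no site is found.

    Both genome strands are tried. max_len is a hypothetical upper bound on the
    barcode length, used only to keep the search local.
    """
    genome = genome.upper()
    pattern = re.compile(
        primer_regex(forward)
        + "(.{1,%d}?)" % max_len  # shortest stretch between the primers
        + primer_regex(reverse_complement(reverse))
    )
    for strand in (genome, reverse_complement(genome)):
        match = pattern.search(strand)
        if match:
            return match.group(1)
    return None


# Toy genome built around the rbcL primers from the table above.
fwd, rev = "ATGTCACCACAAACAGAAAC", "TCGCATGTACCTGCAGTAGC"
toy = "GGG" + fwd + "ACGTACGTAC" + reverse_complement(rev) + "TTT"
print(extract_barcode(toy, fwd, rev))
```

Because exact matching tolerates no mismatches, this sketch would miss primer sites that a BLAST alignment could still recover, so it is best read as an illustration of the "sequence between the primers" idea.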
A DSS is a nucleotide sequence of constant length that can detect the presence of an organism (the target species) and distinguish it from other species (the background species). We used BLAST with in-house Python scripts to identify DSSs. Briefly, for a target species, the first step was to generate k-mers (e.g., 20-mers) from one of its chloroplast assemblies using the sliding window method: a fixed-length (e.g., 20 bp) window slides from the first base of the selected assembly in 1 bp steps to generate all possible k-mers, which were then de-duplicated. Second, the non-redundant k-mers were searched with BLAST against the other assemblies of the target species to identify k-mers conserved within the species, and the conserved k-mers were then searched against the chloroplast assemblies of the background species. Last, k-mers present in background species assemblies were removed, and the remaining k-mers were considered DSSs.
The DSSs deposited in CGIR were calculated using a k-mer length of 40 bp, with the other species in the same family as background species for each target species. For example, to calculate DSSs for Oryza ridleyi, the other species from Poaceae were used as background species.
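The three steps above can be sketched with plain set operations. This is a minimal illustration under stated assumptions, not the in-house scripts: exact k-mer matching on both strands stands in for the BLAST searches, and the function names are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def kmers(seq, k):
    """All distinct k-mers from a 1 bp sliding window over seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def both_strand_kmers(seq, k):
    """k-mers from seq and its reverse complement (BLAST searches both strands)."""
    rc = seq.upper().translate(_COMP)[::-1]
    return kmers(seq, k) | kmers(rc, k)


def find_dss(target_assemblies, background_assemblies, k=40):
    # Step 1: de-duplicated k-mers from one assembly of the target species.
    candidates = kmers(target_assemblies[0], k)
    # Step 2: keep only k-mers conserved in every other target assembly.
    for assembly in target_assemblies[1:]:
        candidates &= both_strand_kmers(assembly, k)
    # Step 3: remove k-mers that occur in any background assembly.
    for assembly in background_assemblies:
        candidates -= both_strand_kmers(assembly, k)
    return candidates


# Toy sequences with k=4 so the result is easy to check by hand.
targets = ["AACCGGTTAC", "TTAACCGGAA"]
background = ["CCGGTT"]
print(find_dss(targets, background, k=4))
```

With real data, `k=40` and all other species of the same family as background would match the settings stated above. Holding every background k-mer set in memory is the simplest design but may not scale to a large family without batching.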
A. How can I get the chloroplast genome information?
If you are looking for chloroplast genome information for a taxon of interest, please use the Search/Advanced Search tool on the Genome page. The search results include the species name, synonyms, genome size, GC content, accession number, etc. You can click an assembly accession in the results to get detailed information for that assembly, such as gene characteristics.
B. How can I get DSSs information?
If you are interested in developing a method for species identification, please visit the DSSs page and search for the taxa you are interested in. All DSSs can be downloaded for further exploration. Since DSSs were calculated at the species level, there are no DSSs for taxonomic ranks below species, such as subspecies or varieties.
C. How can I identify plants using my barcode sequence in CGIR?
If you have a batch of barcode sequences, you can visit the Tools page and submit them to the BarcodeBLAST tool. The results of aligning your sequences against the DNA barcodes deposited in CGIR will be sent to your e-mail account.
D. Can I identify barcodes for my own chloroplasts in CGIR?
You can visit the Tools page and submit your chloroplast genome sequences to the BarcodeFinder tool. The identified barcodes will be sent to your e-mail account.
A. Funding Support
- Special Funds for Basic Resources Investigation Research of the Ministry of Science and Technology (2018FY10080002)
B. Comments & Collaborations
We look forward to worldwide comments, suggestions and guidance from colleagues and peers with common research interests.
We would love to hear from you with any questions or comments. Our contact information is listed below.
China Academy of Chinese Medical Sciences (CACMS)
16 Dongzhimen South Road, Dongcheng District
Beijing 100700, China
Telephone: +86 (10) 8402-7175
Fax: +86 (10) 8402-7175
E-mail: y_yuan0732@163.com
Beijing Institute of Genomics, Chinese Academy of Sciences
1 Beichen West Road, Chaoyang District
Beijing 100101, China
Telephone: +86 (10) 8409-7340
Fax: +86 (10) 8409-7200
E-mail: songshh@big.ac.cn