IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data.

Jian Sang, Dong Zou, Zhennan Wang, Fan Wang, Yuansheng Zhang, Lin Xia, Zhaohua Li, Lina Ma, Mengwei Li, Bingxiang Xu, Xiaonan Liu, Shuangyang Wu, Lin Liu, Guangyi Niu, Man Li, Yingfeng Luo, Songnian Hu, Lili Hao, Zhang Zhang
Author Information
  1. Jian Sang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  2. Dong Zou: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  3. Zhennan Wang: University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.
  4. Fan Wang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  5. Yuansheng Zhang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  6. Lin Xia: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  7. Zhaohua Li: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  8. Lina Ma: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  9. Mengwei Li: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  10. Bingxiang Xu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  11. Xiaonan Liu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  12. Shuangyang Wu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  13. Lin Liu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  14. Guangyi Niu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  15. Man Li: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  16. Yingfeng Luo: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
  17. Songnian Hu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: husn@im.ac.cn.
  18. Lili Hao: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China. Electronic address: haolili@big.ac.cn.
  19. Zhang Zhang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: zhangzhang@big.ac.cn.

Abstract

Genome reannotation aims for complete and accurate characterization of gene models and is thus of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to use these data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome by integrating large-scale RNA-seq data and release a new annotation system, IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structures, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, equipped with the improved annotations, holds great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.

MeSH Term

Amino Acid Sequence
Gene Expression Regulation, Plant
Genes, Plant
Genome, Plant
Molecular Sequence Annotation
Organ Specificity
Oryza
Phylogeny
Plant Proteins
RNA, Long Noncoding
RNA-Seq
Statistics as Topic

Chemicals

Plant Proteins
RNA, Long Noncoding

Links to CNCB-NGDC Resources

Database Commons: DBC001638 (IC4R)
