IC4R001-Genome-2002-11935017

From RiceWiki

Project Title

  • A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica)


The Background of This Project

  • Rice is the most important crop for human consumption, providing the staple food for more than half of the world’s population. The euchromatic portion of the rice genome is estimated to be 430 Mb (1–3), the smallest among the cereal crops. It is 3.7 times larger than the genome of A. thaliana (4–6) and 6.7 times smaller than the human genome (7, 8). Well-established protocols for high-efficiency genetic transformation, the widespread availability of high-density genetic and physical maps (9, 10), and the high degree of synteny among cereal genomes (11–15) together make rice a unique organism for studying the physiology, developmental biology, genetics, and evolution of plants.
  • The International Rice Genome Sequencing Project (IRGSP) (16) has already delivered a substantial amount of sequence for the japonica subspecies (cultivar Nipponbare), in bacterial artificial chromosome (BAC)– and P1-derived artificial chromosome (PAC)–sized contigs. Working independently, Monsanto and Syngenta (17, 18) established proprietary working drafts for japonica in April 2000 and February 2001, respectively. The Monsanto sequence has been used to assist the efforts of the IRGSP.
  • In this study, we release a draft genome sequence of rice from 93-11 (19), a cultivar of Oryza sativa L. ssp. indica, the major rice subspecies grown in China and many other Asia-Pacific regions.


Plant Culture & Treatment

  • The researchers used a “whole-genome shotgun” approach, as successfully applied to Drosophila melanogaster (25) and Homo sapiens (8). The data are complementary to those of the IRGSP, which is sequencing Nipponbare, a cultivar of the subspecies japonica, with a “clone-by-clone” approach. Assuming a euchromatic rice genome size of 430 Mb and a Phred Q20 (26, 27) read length of 500 base pairs (bp), 1× coverage would be equivalent to 0.86 million sequence reads (430 Mb / 500 bp), or about 1 million reads once the typical success rate of 80 to 85% is factored in. Shotgun libraries were constructed with a variety of methods for clone-insert preparation (28–30) to minimize the likelihood of systematic biases in genome representation. A total of 55 plasmid libraries, with a nominal clone-insert size of 2 kb, were constructed for 93-11 and PA64s. Overall, we prepared 2.75 million plasmid DNA samples (31, 32), and both ends of each insert were sequenced. By the 21 October 2001 data freeze, there were 4.62 million successful reads, an 84% success rate (4.62 million of 5.5 million attempted reads). The average Q20 read length was 546 bp.
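The read-count arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the only inputs are figures quoted in the text (430 Mb genome, 500-bp planning read length, 80 to 85% success rate, 2.75 million templates read from both ends, 4.62 million successful reads).

```python
"""Back-of-the-envelope shotgun read arithmetic using figures quoted in the text."""

GENOME_SIZE_BP = 430_000_000   # estimated euchromatic rice genome size (430 Mb)
PLANNING_READ_LENGTH_BP = 500  # Phred Q20 read length assumed for planning


def reads_for_coverage(coverage: float, genome_size: int, read_length: int) -> float:
    """Successful reads whose total length equals `coverage` times the genome size."""
    return coverage * genome_size / read_length


def attempts_needed(successful_reads: float, success_rate: float) -> float:
    """Reads that must be attempted when only `success_rate` of them succeed."""
    return successful_reads / success_rate


def observed_success_rate(successful_reads: int, templates: int,
                          reads_per_template: int = 2) -> float:
    """Fraction of attempted reads that succeeded; both insert ends are read."""
    return successful_reads / (templates * reads_per_template)


if __name__ == "__main__":
    one_x = reads_for_coverage(1, GENOME_SIZE_BP, PLANNING_READ_LENGTH_BP)
    print(f"1x coverage needs {one_x / 1e6:.2f} million successful reads")
    for rate in (0.80, 0.85):
        attempts = attempts_needed(one_x, rate)
        print(f"  {attempts / 1e6:.2f} million attempts at {rate:.0%} success")
    rate = observed_success_rate(4_620_000, 2_750_000)
    print(f"observed success rate: {rate:.0%}")
```

Running it reproduces the 0.86 million reads for 1× coverage, roughly 1.0 to 1.1 million attempted reads at 80 to 85% success, and the 84% observed rate (4.62 million of 5.5 million attempts).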


Research Findings

  • When computing GC content over fixed-size windows, the researchers found smaller windows more informative, because windows larger than a typical gene obscure the differences between genes and intergenic DNA. We used a 500-bp window, which is smaller than most plant genes (Fig. 2).


'Fig. 2. Distributions for genomic GC content in A. thaliana, O. sativa, and H. sapiens, computed over a bin size of 500 bp. Note that for bins/10=100, the number of bins with that GC content is 1000.'
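A minimal sketch of the windowing step behind a distribution like Fig. 2, assuming non-overlapping 500-bp windows (the caption's "bin size") and assuming that windows containing unknown bases (N), as well as a trailing partial window, are simply skipped; how the original analysis handled gaps is not stated here. The function names are hypothetical.

```python
from collections import Counter


def windowed_gc(seq: str, window: int = 500) -> list[float]:
    """GC percentage of each complete, non-overlapping window of `seq`.

    Assumption: windows containing N (unknown bases) are skipped, and a
    trailing window shorter than `window` is dropped.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        if "N" in chunk:
            continue
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / window)
    return values


def gc_histogram(values: list[float], bin_width: float = 1.0) -> Counter:
    """Number of windows per GC bin, keyed by bin index floor(gc / bin_width)."""
    return Counter(int(v // bin_width) for v in values)


if __name__ == "__main__":
    toy = "GC" * 250 + "AT" * 250 + "GCAT" * 125  # three 500-bp windows
    print(windowed_gc(toy))  # [100.0, 0.0, 50.0]
    print(gc_histogram(windowed_gc(toy), bin_width=10))
```

Keeping `window` at or below gene scale is what prevents exon-rich and intergenic stretches from being averaged into one value, which is the point made in the text.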


  • The researchers plotted GC content distributions for exons and introns (Fig. 3). Rice exons exhibited a GC-rich tail, but rice introns did not, indicating that the GC-rich tail in the rice genomic distribution was primarily due to the exons.


'Fig. 3. GC content distribution for exons and introns in A. thaliana, O. sativa, and H. sapiens. All exon and intron sequences were derived from cDNA-to-genomic alignments.'
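A sketch of how separate exon and intron GC values can be obtained once cDNA-to-genomic alignments have located a gene's exons. The coordinate convention (0-based, half-open intervals, one gene per call) and the function names are assumptions for illustration; per-feature values from many genes would then be pooled into one histogram per feature type.

```python
def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def split_exons_introns(genomic: str,
                        exons: list[tuple[int, int]]) -> tuple[list[str], list[str]]:
    """Cut a gene's genomic sequence into exon and intron sequences.

    `exons` are 0-based, half-open (start, end) intervals on `genomic`,
    assumed non-overlapping; introns are the gaps between consecutive exons.
    """
    exons = sorted(exons)
    exon_seqs = [genomic[s:e] for s, e in exons]
    intron_seqs = [
        genomic[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        if next_start > prev_end
    ]
    return exon_seqs, intron_seqs


if __name__ == "__main__":
    gene = "GGCC" + "ATAT" + "GCGA"  # toy gene: exon, intron, exon
    exon_seqs, intron_seqs = split_exons_introns(gene, [(0, 4), (8, 12)])
    print([gc_percent(s) for s in exon_seqs])    # [100.0, 75.0]
    print([gc_percent(s) for s in intron_seqs])  # [0.0]
```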


  • When the GC contents of individual exons and introns were plotted as a function of gene size (i.e., the sum of exon and intron lengths), it was apparent that most of the variation occurred within genes rather than between them (Fig. 4).


'Fig. 4. GC content for individual exons as a function of their gene size, in A. thaliana, O. sativa, and H. sapiens. All exon and intron sequences were derived from cDNA-to-genomic alignments. Each data point is a single exon. Exons for the same gene are plotted at the same abscissa and connected by a vertical line. The genes are sorted by size, where gene size is defined as the sum of exon and intron lengths. To make the figure legible, we use constant spacing between genes, thus resulting in nonuniform abscissa labels. We show only the 41 largest genes for which the entire cDNA could be aligned to genomic sequence. Given the draft nature of the rice genome, some of the largest rice genes had to be omitted.'
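One way to make "most of the variation was within genes" quantitative is to split the total squared deviation of exon GC into within-gene and between-gene parts. This decomposition is an illustrative choice, not necessarily the analysis behind Fig. 4, and the names below are hypothetical. `gene_size` follows the text's definition (sum of exon and intron lengths), which for sorted, non-overlapping exons equals the genomic span of the gene.

```python
from statistics import fmean


def gene_size(exons: list[tuple[int, int]]) -> int:
    """Sum of exon and intron lengths, i.e. the span from the first exon start
    to the last exon end (0-based, half-open, non-overlapping intervals)."""
    exons = sorted(exons)
    return exons[-1][1] - exons[0][0]


def within_gene_fraction(exon_gc_by_gene: dict[str, list[float]]) -> float:
    """Share of the total squared deviation in exon GC that lies within genes.

    Total sum of squares = within-gene part + between-gene part, so a value
    near 1 means exons differ mostly from other exons of the same gene.
    Every gene must contribute at least one exon.
    """
    values = [v for gcs in exon_gc_by_gene.values() for v in gcs]
    grand_mean = fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # all exons have identical GC: no variation to split
    within = 0.0
    for gcs in exon_gc_by_gene.values():
        gene_mean = fmean(gcs)
        within += sum((v - gene_mean) ** 2 for v in gcs)
    return within / total


if __name__ == "__main__":
    genes = {"g1": [(0, 50), (100, 250)], "g2": [(0, 3000), (5000, 9000)]}
    # Sort genes by size, as the figure's abscissa does.
    print(sorted(genes, key=lambda g: gene_size(genes[g])))  # ['g1', 'g2']
    print(within_gene_fraction({"g1": [40.0, 60.0], "g2": [40.0, 60.0]}))  # 1.0
```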


  • Exon sizes are narrowly constrained, but intron sizes can be highly variable within and between organisms. Intron-size distributions tend to be bimodal, weakly (most organisms) or strongly (human). There is always a sharp “spike” at some organism-specific minimum size, which is about 90 bp for plants and vertebrates (Fig. 6).


'Fig. 6. Exon- and intron-size distributions for A. thaliana, O. sativa, and H. sapiens, with color indicating averaged GC content for exons or introns at that size range. All exon and intron sequences were derived from cDNA-to-genomic alignments.'
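A minimal sketch for locating the minimum-size spike in an intron-length distribution. The 10-bp bin width, the 200-bp cutoff used to isolate short introns, and the toy lengths are assumptions for illustration, not values from the paper; for plant data, the text places the spike at about 90 bp.

```python
from collections import Counter


def size_histogram(lengths, bin_width: int = 10) -> Counter:
    """Count lengths per bin, keyed by each bin's lower edge in bp."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def short_intron_spike(lengths, bin_width: int = 10, max_len: int = 200) -> int:
    """Lower edge (bp) of the most populated bin among introns shorter than max_len.

    `max_len` is an arbitrary cutoff chosen to isolate the short-intron peak
    from the long tail of the distribution.
    """
    hist = size_histogram((n for n in lengths if n < max_len), bin_width)
    if not hist:
        raise ValueError(f"no introns shorter than {max_len} bp")
    return hist.most_common(1)[0][0]


if __name__ == "__main__":
    toy = [88, 91, 92, 95, 93, 150, 400, 1200, 2500]  # made-up intron lengths
    print(short_intron_spike(toy))  # 90
```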

Labs working on this Project

  • Beijing Genomics Institute/Center of Genomics and Bioinformatics, Chinese Academy of Sciences, Beijing 101300, China.
  • Hangzhou Genomics Institute–Institute of Bioinformatics of Zhejiang University–Key Laboratory of Bioinformatics of Zhejiang Province, Hangzhou 310007, China.
  • Institute of Genetics, Chinese Academy of Sciences, Beijing 100101, China.
  • University of Washington Genome Center, Department of Medicine, Seattle, WA 98195, USA.
  • College of Life Sciences, Peking University, Beijing 100871, China.
  • Medical College, Xi’an Jiaotong University, Xi’an 710061, China.
  • Fudan University, Shanghai 200433, China.
  • National Hybrid Rice R&D Center, Changsha 410125, China.
  • Laboratory of Bioinformatics, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
  • Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, China.
  • Digital China Ltd., Beijing 100080, China.
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China.


Corresponding Author

  • Huanming Yang: hyang@genomics.org.cn