IC4R002-Genome-2002-11935018


Project Title

  • A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica)


The Background of This Project

  • The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50,000 genes. Homologs of 98% of the known maize, wheat, and barley proteins are found in rice. Synteny and gene homology between rice and the other cereal genomes are extensive, whereas synteny with Arabidopsis is limited. Assignment of candidate rice orthologs to Arabidopsis genes is possible in many cases. The rice genome sequence provides a foundation for the improvement of cereals, our most important crops.


Plant Culture & Treatment

  • Contigs of nonrice origin were identified by sequence homology to known bacteria, high GC content, lack of homology to rice BAC end sequences, and/or depth of coverage. Sequence analysis identified 6 Mbp as originating from two related bacterial species (Xanthomonadales), likely representing endophytes present in the plant material used for DNA isolation.
  • Syd (Syngenta draft sequence; data access information is available at www.tmri.org) data were compared to almost 1 million bases of the IRGSP's completed rice genome sequence to determine coverage and quality.
  • Timelogic's Decypher FrameSearch algorithm was used to detect and guide the correction of frameshifts caused by indels. For each predicted gene, the fraction of its length with homology to known genes, predicted genes from other species, Prosite motifs (25), or Pfam domains (26) was used as a confidence score.
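The confidence score described above (the fraction of a predicted gene's length that shows homology) can be illustrated with a minimal sketch. The function name, the 0-based end-exclusive interval format, and the merging of overlapping hits are assumptions made for illustration; the source does not say how overlapping hits were handled or whether length was measured in nucleotides or residues.

```python
def homology_fraction(gene_length, hits):
    """Fraction of a gene's length covered by at least one homology hit.

    hits: list of (start, end) pairs, 0-based and end-exclusive
    (a hypothetical input format, not taken from the source).
    Overlapping or adjacent hits are merged so no position is counted twice.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(hits):
        # Clip each hit to the gene boundaries.
        start, end = max(0, start), min(gene_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / gene_length


# Example: three hits on a 100-position gene, two overlapping, one overhanging.
score = homology_fraction(100, [(0, 30), (20, 50), (80, 120)])
```

Here positions 0 to 49 and 80 to 99 are covered, so 70 of 100 positions carry homology support.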


Research Findings

  • Translated HMLgenes300 were classified with the software package INTERPRO (27, 28). INTERPRO output was filtered to create sets of the longest protein domain for each associated protein, and domains were categorized using Gene Ontology (GO) software (29). The results of these classifications are shown in Fig. 1; about 44% of Hgenes, 32% of Mgenes, and 5% of Lgenes were classified. Most of the classified proteins fall into the categories of metabolism and cell communication/signal transduction.


Fig. 1. Rice gene prediction classifications. HMLgenes300 were classified with Interpro and GO software (27–29); the categories generated are shown.
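The filtering step, keeping only the longest domain for each protein, might look like the following sketch. The tuple layout and all identifiers in the example are hypothetical; the actual INTERPRO output format and tie-breaking rule used by the authors are not given in the source (here, the first hit seen wins a tie).

```python
def longest_domain_per_protein(hits):
    """Keep the longest domain hit for each protein.

    hits: iterable of (protein_id, domain_id, start, end) tuples with
    1-based, end-inclusive coordinates (an assumed, illustrative format).
    Returns a dict mapping protein_id -> domain_id.
    """
    best = {}
    for protein_id, domain_id, start, end in hits:
        length = end - start + 1
        # Replace the stored hit only if this one is strictly longer.
        if protein_id not in best or length > best[protein_id][1]:
            best[protein_id] = (domain_id, length)
    return {protein: domain for protein, (domain, _) in best.items()}


# Placeholder identifiers, for illustration only.
example_hits = [
    ("protein1", "domainA", 10, 60),
    ("protein1", "domainB", 100, 300),
    ("protein2", "domainC", 5, 40),
]
selected = longest_domain_per_protein(example_hits)
```

Each selected domain would then be mapped to a GO category, a step not sketched here.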


  • Eighty-five percent of Arabidopsis predicted proteins (21,590 of 25,554) were significantly homologous to HMLgenes300 predicted proteins; of these, 2565 show very strong conservation between Arabidopsis and rice (Fig. 2).


Fig. 2. Similarity of 25,554 Arabidopsis proteins and best rice homologs. Predicted Arabidopsis proteins (October 2001, ftp.tigr.org) were compared (BLASTP E value ≤ 10^-6) with HMLgenes300 translations. The expectation values range from E < 10^-180 (high homology) to E > 10^-6 (low homology) and are depicted in intervals spanning 10 exponents (e.g., < 10^-180, 10^-180 to 10^-171, 10^-170 to 10^-161, etc.).
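The grouping of expectation values into intervals spanning 10 exponents can be sketched as follows. This assumes the limits are 10^-180 at the strong end and 10^-6 at the weak end, that a value is binned by the integer exponent BLAST would report, and that the last interval is truncated at 10^-6; the function name and labels are illustrative only.

```python
import math


def evalue_bin(evalue, floor_exp=-180, ceil_exp=-6, width=10):
    """Label the 10-exponent interval that contains a BLAST E value.

    Returns None for values weaker than 10**ceil_exp (assumed cutoff).
    """
    if evalue > 10.0 ** ceil_exp:
        return None  # does not pass the assumed significance cutoff
    if evalue <= 0.0:
        return f"< 1e{floor_exp}"  # BLAST reports 0.0 for extremely small values
    exponent = math.floor(math.log10(evalue))  # e.g. 3e-175 -> -175
    if exponent < floor_exp:
        return f"< 1e{floor_exp}"
    # Snap the exponent down to the start of its interval.
    low = floor_exp + ((exponent - floor_exp) // width) * width
    high = min(low + width - 1, ceil_exp)
    return f"1e{low} to 1e{high}"


# A hit with E = 3e-175 falls in the second-strongest interval.
label = evalue_bin(3e-175)
```

Counting hits per label would give the kind of histogram the caption describes.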


  • Flowering in Arabidopsis is initiated by flowering-time genes that activate floral meristem identity genes, leading to the patterned expression of floral organ identity genes (Fig. 3). Rice contains single-copy homologs of the Arabidopsis flowering-time genes GI, CO, LD, and FCA.


Fig. 3. Flowering-time and flower development genes in the rice genome. A simplified model shows the predicted genetic network regulating flowering time and flower development in Arabidopsis, with gene names color-coded to indicate clear identification of an ortholog in the rice genome (red) or no clear identification (white). In Arabidopsis there are three genetic pathways that control flowering time (100, 101). The long-day pathway represented by GI and CO and the autonomous pathway represented by LD, FCA, and FLC are likely integrated through FT and AGL20 to promote activation of meristem identity genes LFY, AP1, and CAL. The vernalization pathway, represented by FRI, feeds into the autonomous pathway upstream of FLC. The GA pathway, represented by GA1, leads to the activation of LFY. TFL serves to restrict the expression of the meristem identity genes to floral meristems, where they promote the patterned expression of floral organ identity genes AP2, AP3, PI, and AG, which are also affected by the regulatory genes ANT, UFO, and SUP (102, 103).


  • The level of synteny among cereals was determined by comparing anchored rice genomic sequence to mapped sequence from other cereals. Related regions of the rice and maize genomes were aligned (Fig. 4).


Fig. 4. Rice-maize synteny. Maize markers were mapped to the rice genome in silico. Maize map and sequence information were derived from MaizeDB (610 markers) and GenBank, respectively. Maize chromosomes are indicated along the vertical black lines; positions of specific markers and bins are defined by horizontal lines. Rice chromosomes are represented by numbered, colored rectangles. Significant homology (at least 80% identity, over 100 continuous base pairs, between a maize chromosomal region and a particular rice region) is indicated by a colored rectangle to the right of the maize chromosome. For a more detailed version of this map, see Web site link 24.
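The homology criterion in the caption (at least 80% identity over 100 continuous base pairs) can be illustrated with a sliding-window check. This sketch assumes the two sequences are already aligned without gaps and have equal length; the source does not describe the alignment procedure, and the function name is hypothetical.

```python
def has_significant_window(seq_a, seq_b, window=100, min_identity=0.80):
    """True if any run of `window` aligned positions reaches `min_identity`.

    seq_a, seq_b: pre-aligned, gapless sequences of equal length
    (an assumption made for illustration).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq_a)
    if n < window:
        return False
    # 1 where the bases agree, 0 otherwise; ambiguous N never counts as a match.
    matches = [
        1 if a == b and a != "N" else 0
        for a, b in zip(seq_a.upper(), seq_b.upper())
    ]
    count = sum(matches[:window])
    if count / window >= min_identity:
        return True
    for i in range(window, n):
        # Slide the window one position to the right.
        count += matches[i] - matches[i - window]
        if count / window >= min_identity:
            return True
    return False


# 80 matching positions out of 100 meets the threshold.
meets_threshold = has_significant_window("A" * 100, "C" * 20 + "A" * 80)
```

A running count keeps the check linear in sequence length rather than recomputing identity for every window.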


  • About 2000 cereal quantitative trait loci (QTLs) have been mapped (88–98) and can be placed on the rice genome map en masse. For example, many maize QTLs were associated with the top of rice chromosome 1 by aligning maize chromosomes 1, 2, and 7 with this region (Fig. 5A).


Fig. 5A. Maize QTLs mapped to the rice genome. (A) Rice-maize comparative QTL mapping. Portions of maize chromosomes, represented by numbered, colored rectangles, that show sequence similarity (at least 80% identity over 100 continuous base pairs) with specific regions of the top of rice chromosome 1 are shown. The rice map is from the IRGSP. Genetic distance is indicated by the numbers to the left of the rice chromosome (e.g., 1004.2 means 4.2 cM from the tip of chromosome 1); specific markers that map to this region are indicated to the right. Regions from maize chromosomes 1, 2, and 7 show similarity with the tip of rice chromosome 1 as shown, and maize QTLs in these regions are indicated. The region represented by the thick black line comprises ~650 kbp in rice; each colored block represents varying amounts of maize DNA.


  • As a more specific example, a QTL influencing grain yield (QTL 21) that maps to maize chromosome 1 (99) was localized to the syntenic region of rice chromosome 3, containing ~220 HMLgenes300 and more than 120 rice SSRs (Fig. 5B). With the use of these genes, ~100 unmapped maize cDNAs were identified by homology and are therefore candidate genes influencing yield.


Fig. 5B. Detailed example of rice-maize comparative QTL mapping. Grain yield QTL 21 is mapped to maize map bin 1.03 between cDNA markers csu 710 and csu 392, and is syntenic with rice chromosome 3. Additional markers from the same maize bin confirm microsynteny in this target region, which contains ~220 candidate genes and 120 SSR markers in rice. Dotted lines connect homologous genes with the indicated BLAST expectation values.

Labs working on this Project

  • Torrey Mesa Research Institute, Syngenta, 3115 Merryfield Row, San Diego, CA 92121, USA (www.tmri.org).
  • Bryan College, Dayton, TN 37321, USA.
  • Department of Biological Sciences, Northern Illinois University, DeKalb, IL 60115, USA.
  • Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA.
  • Clemson University Genomics Institute, 100 Jordan Hall, Clemson, SC 29630, USA.
  • Myriad Genetics, 320 Wakara Way, Salt Lake City, UT 84108, USA.


Corresponding Author

  • stephen.goff@syngenta.com