IC4R004-Epigenomic-2012-22778444

From RiceWiki

Project Title

  • Transcriptome and methylome interactions in rice hybrids

The Background of This Project

  • DNA methylation is an epigenetic mark that can often lead to the repression of gene expression (1, 2). It is enriched in heterochromatin and, when present at regulatory sites, usually acts as a repressor of expression, most notably in transposons (3). However, it is also found over coding regions, where it likely does not directly affect transcription and is associated with moderately expressed genes (2, 4–6). In plants, DNA methylation occurs in three different contexts: CG, CHG, and CHH (where H is any nucleotide but G). In Arabidopsis, each context is maintained by different enzymes: MET1 for CG sites, CMT3 for CHG sites and DRM2 for CHH sites. CG and CHG sites are symmetric across the two DNA strands, which is thought to be important for the maintenance of methylation at these sites following DNA replication. In contrast, CHH sites are not symmetric, and their methylation is mediated by RNA-directed DNA methylation pathways (RdDM), which use siRNAs to initiate de novo methylation (3). Cellular methylation states tend to persist during cell division, and recent studies in Arabidopsis have also shown that DNA methylation is faithfully inherited across generations (7, 8). Nonetheless, we are only beginning to understand how different methylation patterns from inbred parents may “interact” during the generation of their hybrid progeny (9, 10).
  • In this project, the researchers generated integrative maps of whole-genome cytosine methylation profiles [bisulfite sequencing (BS-seq)] and transcriptional profiles (RNA-seq) to characterize two rice subspecies, Oryza sativa ssp. japonica [Nipponbare (NPB)] and Oryza sativa ssp. indica (93–11), and their two reciprocal hybrid offspring. Using a combination of BS-seq, RNA-seq, and siRNA-seq, they generated allele-specific patterns of methylation and transcription in the hybrids, and thus directly measured the degree to which these are altered between the corresponding parental and F1 chromosomes.
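
The three methylation contexts described above can be illustrated with a small classifier. This is a minimal sketch, not part of the project's pipeline: the function name `cytosine_context` is hypothetical, and it only looks at cytosines on the forward strand of a plain sequence string.

```python
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the forward-strand cytosine at 0-based `pos` as CG, CHG or CHH.

    Returns None if the base is not a C, or if the downstream bases needed
    to decide the context are missing or ambiguous (for example N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1 : pos + 3]
    # CG: the very next base is G.
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    # CHG and CHH both need two unambiguous downstream bases.
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return None
    # H is any base except G, which the CG test above already guarantees.
    return "CHG" if downstream[1] == "G" else "CHH"
```

For example, `cytosine_context("CAG", 0)` falls in the CHG context, while a cytosine at the very end of a sequence cannot be classified. Cytosines on the reverse strand could be handled by applying the same function to the reverse complement.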

Plant Materials & Treatment

  • Oryza sativa ssp. japonica (NPB), O. sativa ssp. indica (93–11), and their reciprocal cross F1s (NPB × 93–11 and 93–11 × NPB) were used in this study. Rice seeds were surface sterilized with 40% (vol/vol) sodium hypochlorite solution and transferred to 1/2 Murashige and Skoog medium. After germination, rice seedlings were transplanted into soil and grown at 26 °C/20 °C under a 10-h light/14-h dark cycle in a growth chamber. Fully expanded leaves from 6-wk-old plants were collected for library construction.
  • BS-seq libraries were made from genomic DNA isolated from leaf tissues of NPB, 93–11, and their two reciprocal hybrid offspring, 93–11 × NPB and NPB × 93–11, following a previously published method that uses pre-methylated Illumina adapters.

Research Findings

  • Using the Michigan State University rice genome version 6.1 annotation (11) as a reference, the researchers mapped the bisulfite-converted reads to the genome using BS Seeker (12). From these alignments they determined SNPs between the NPB and 93–11 parental plants. Although cytosine conversion in bisulfite-treated DNA introduces ambiguity, conversions can be distinguished from mutations by analyzing the base complementary to each cytosine on the opposite strand, which made it possible to call all types of SNPs. A SNP call required at least three reads on each strand and over 90% agreement between them (Fig. 1A). Using these SNP calls the researchers reconstructed two genomes, one for each parent. They then mapped F1 reads against these inferred genomes to identify the parental chromosome from which each read was derived.
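
The strand-based reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BS Seeker or the authors' code: the names `call_base` and `is_snp` are hypothetical, reads from both strands are assumed to be expressed in forward-strand coordinates, and the "over 90% agreement" rule is interpreted here as applying to the pooled reads from both strands.

```python
from typing import Optional

# Bases a read may show (in forward-strand coordinates) for each true base.
# On forward-strand reads an unmethylated C can appear as T; on
# reverse-strand reads an unmethylated C opposite a G appears as G -> A.
FORWARD_OBS = {"A": {"A"}, "C": {"C", "T"}, "G": {"G"}, "T": {"T"}}
REVERSE_OBS = {"A": {"A"}, "C": {"C"}, "G": {"G", "A"}, "T": {"T"}}


def call_base(fwd_reads, rev_reads, min_reads=3, min_agreement=0.9) -> Optional[str]:
    """Return the single base consistent with both strands, or None."""
    if len(fwd_reads) < min_reads or len(rev_reads) < min_reads:
        return None  # not enough coverage on one of the strands
    total = len(fwd_reads) + len(rev_reads)
    candidates = []
    for base in "ACGT":
        compatible = sum(b in FORWARD_OBS[base] for b in fwd_reads)
        compatible += sum(b in REVERSE_OBS[base] for b in rev_reads)
        if compatible / total > min_agreement:
            candidates.append(base)
    return candidates[0] if len(candidates) == 1 else None


def is_snp(parent_a_call: Optional[str], parent_b_call: Optional[str]) -> bool:
    """A SNP needs a confident call in both parents and different bases."""
    return (
        parent_a_call is not None
        and parent_b_call is not None
        and parent_a_call != parent_b_call
    )
```

A position where forward reads show mostly T but reverse reads show C is called as C (a converted cytosine), whereas T on both strands is called as T, so a genuine C/T SNP between the parents remains detectable despite bisulfite conversion.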


Figure 1. SNPs between NPB and 93–11. (A) Identification of SNPs required a minimum of three reads per strand in each parental strain. Although cytosine conversion in bisulfite-treated DNA introduces ambiguity, it can be resolved by considering the sequence of the reverse strand, so all possible types of SNPs can be called. In the example, X indicates the location of the SNP and reads represent, in order, NPB forward strand, NPB reverse strand, 93–11 forward strand, and 93–11 reverse strand. (B) SNP distribution over the genome is plotted on the bottom in windows of 10,000 bases. Each color change represents a new chromosome. The top line represents the density of genes.


  • The researchers found one SNP every 253 base pairs between the two parental strains (Fig. 1B), a level of divergence similar to previous japonica–indica comparisons (13). Not surprisingly, regions enriched with genes had lower densities of SNPs. The most frequent SNPs were C to T changes, or their complements, which correspond to G to A mutations on the opposite strand. Combined, these types of mutations accounted for 73% of the SNPs found between varieties (Fig. 1C), which represents a threefold enrichment compared to the expected value if all types of mutations were equally likely. Methylated cytosines also mutate more than three times more frequently than nonmethylated cytosines (Fig. 1D), and they most often mutate to thymines. This is consistent with previous observations that the deamination of methylcytosine, which results in thymine, occurs at higher rates than other spontaneous mutations (14–16).
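
A windowed SNP distribution such as the one plotted in Fig. 1B can be computed along the following lines. This is a minimal sketch with hypothetical function names; it assumes 0-based SNP positions on a single chromosome and uses the 10,000-base window size stated in the figure legend.

```python
def snp_counts_per_window(positions, chrom_length, window=10_000):
    """Count SNPs in consecutive fixed-size windows along one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // window] += 1
    return counts


def mean_snp_spacing(n_snps, compared_bases):
    """Average number of compared bases per SNP (e.g. 'one SNP every N bp')."""
    return compared_bases / n_snps
```

Plotting the resulting counts chromosome by chromosome alongside gene density would make the lower SNP density in gene-rich regions visible.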


Figure 1. (C) The fraction of the various types of SNPs found in NPB versus 93–11 is shown with the CT and GA SNPs grouped together, and they represent nearly 74% of all SNPs. (D) The relative frequency of each SNP between NPB and 93–11 is shown per 1,000 bases. Methyl-Cs convert to Ts at a rate that is nearly five times that of unmethylated Cs.


  • Next, the researchers compared mutation rates between the parental strains and the hybrids. As expected, SNPs were nearly absent across generations. If, as is likely, the few SNPs called between generations are false positives, the false discovery rate of the SNP calls is estimated to be less than 0.0003%.
  • The researchers also found a small number of SNPs between the reference Michigan State University O. sativa japonica genome and their assembled version (1 per 90,651 bp; 2,487 total) (Dataset S2), and considerably more SNPs when comparing their assembled version of the 93–11 genome with the reference Beijing Genome Institute O. sativa ssp. indica genome (1 per 51 bp), consistent with the fact that the 93–11 assembly is not as complete as that of the NPB cultivar (17). Because the analysis used only reads that map to a unique position in the genome, allowed at most three mismatches with the reference genome, did not permit insertions or deletions, and required a minimum of three reads on both strands, bases could be called over only about half of the genome in both NPB and 93–11. The reported SNPs are therefore only a subset of all true SNPs.
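
The filtering criteria listed above, and their effect on how much of the genome can be called, can be sketched as follows. This is an illustration only: the `Alignment` record and both function names are hypothetical and do not correspond to BS Seeker output fields, and per-position strand coverage is assumed to be available as two equal-length lists.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    n_hits: int        # number of genomic positions the read maps to
    mismatches: int    # mismatches against the reference
    has_indel: bool    # True if the alignment contains an insertion or deletion


def passes_filter(aln: Alignment, max_mismatches: int = 3) -> bool:
    """Keep uniquely mapped, indel-free reads with at most three mismatches."""
    return aln.n_hits == 1 and aln.mismatches <= max_mismatches and not aln.has_indel


def callable_fraction(fwd_coverage, rev_coverage, min_reads=3):
    """Fraction of positions covered by at least `min_reads` reads on both strands."""
    if len(fwd_coverage) != len(rev_coverage):
        raise ValueError("coverage lists must have the same length")
    if not fwd_coverage:
        return 0.0
    callable_sites = sum(
        f >= min_reads and r >= min_reads
        for f, r in zip(fwd_coverage, rev_coverage)
    )
    return callable_sites / len(fwd_coverage)
```

Applying `callable_fraction` to real coverage tracks would show how strict filters trade sensitivity (fewer callable positions) for confidence in each call.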

Labs working on this Project

  • Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095
  • Howard Hughes Medical Institute, University of California, Los Angeles, CA 90095
  • Molecular Biology Institute, University of California, Los Angeles, CA 90095
  • Department of Plant Pathology, Ohio State University, Columbus, OH 43210
  • Department of Plant and Soil Sciences, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711
  • US Department of Agriculture, Agricultural Research Service, Dale Bumpers National Rice Research Center, Stuttgart, AR 72160
  • Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, CA 90095

Corresponding Author

  • Steven E. Jacobsen (E-mail: jacobsen@ucla.edu) and Matteo Pellegrini (E-mail: matteop@mcdb.ucla.edu)