IC4R002-lncRNA-2014-25517485
Project Title
Genome-wide screening and functional analysis identify a large number of long noncoding RNAs involved in the sexual reproduction of rice
The Background of This Project
- Long noncoding RNAs (lncRNAs) play important roles in a wide range of biological processes in mammals and plants. However, the systematic examination of lncRNAs in plants lags behind that in mammals. Recently, lncRNAs have been identified in Arabidopsis and wheat; however, no systematic screening of potential lncRNAs has been reported for the rice genome.
- Non-protein-coding RNAs (ncRNAs) constitute a substantial portion of transcribed sequences with structural, regulatory or unknown functions. Because of these important biological roles, ncRNAs have been of great research interest in recent years. Attention was previously given to small regulatory RNAs (sRNAs), such as microRNAs (miRNAs), which are less than 200 nucleotides in length. ncRNAs longer than 200 nucleotides, termed long non-coding RNAs (lncRNAs), were found to have functions associated with virtually every biological process in mammals, and these early reports prompted a wave of lncRNA research that followed the path of sRNA research. Recently, lncRNAs have emerged as potent regulators, particularly in mammals. However, studies on lncRNAs in plants remain at an early stage; only a few lncRNAs have been shown to regulate plant development, especially during reproduction.
- Sexual reproduction is one of the most essential biological processes and occurs in a vast number of species. Numerous studies have been devoted to the identification of reproduction-related genes, making great progress in understanding the reproductive processes of both animals and plants. However, the complex regulatory networks involving these genes remain largely unknown. Intriguingly, many lncRNAs have recently been shown to play important roles in reproductive processes through the regulation of related genes in various species. In mammals, lncRNAs such as Xist, H19, Kcnq1ot1, bxd and HOTAIR have been found to be crucial for the precise control of embryogenesis. Notably, several plant lncRNAs have also been demonstrated to participate in reproductive regulation, including COLDAIR, COOLAIR, LDMAR, CsM10 and Zm401, indicating that one of the principal functions of plant lncRNAs might be to regulate plant reproduction. More interestingly, Komiya et al. found that a number of large intergenic non-coding RNAs (lincRNAs) could generate 21-nucleotide phasiRNAs, which associate with the germline-specific Argonaute (AGO) protein MEL1 in rice, indicating that rice lncRNAs might play a role in the development of pre-meiotic germ cells.
- Genome-wide analysis is necessary to discover new lncRNAs and is important for the further functional analysis of these RNAs. More than 8,000 lncRNAs have been identified in humans using bioinformatic methods, and approximately 4,000 lncRNAs have been identified in mice. In plants, 6,480 transcripts have been classified as lncRNAs in Arabidopsis, and 125 putative stress-responsive lncRNAs have been identified in wheat. Although rice is a model species for plant development studies and represents a staple food for nearly half of the global population, rice lncRNAs remain poorly characterized, and no systematic screening of potential lncRNAs in the rice genome has been reported.
Plant Culture & Treatment
- Total RNA was obtained from rice anthers before flowering, pistils before flowering, spikelets 5 days after pollination (DAP) and shoots 14 days after germination (DAG); these samples were used for sequencing. The preparation of whole transcriptome libraries and deep sequencing were performed by the Annoroad Gene Technology Corporation (Beijing, PR China). Whole transcriptome libraries were constructed using TruSeq Stranded Total RNA with Ribo-Zero Gold (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions. Library quality was checked and libraries were quantified using the BioAnalyzer 2100 system and qPCR (Kapa Biosystems, Woburn, MA, USA). The resulting libraries were sequenced on a HiSeq 2000 instrument that generated paired-end reads of 100 nucleotides. The sequencing data have been submitted to the NCBI Sequence Read Archive (SRA accession number SRP047482).
Research Findings
- To systematically identify lncRNAs related to rice reproduction, we performed whole transcriptome ssRNA-seq of rice anthers, pistils, seeds that were harvested 5 DAP and shoots that were harvested 14 DAG (the sequencing results included 3.89×10^8 reads; Additional file 1; Sequence Read Archive (SRA) accession number SRP047482). We then developed a rice lncRNA computational identification pipeline based on RNA-seq data (Figure 1) using 4 whole transcriptome ssRNA-seq data sets and 40 available poly(A) RNA-seq data sets (1.23×10^9 reads). These datasets covered most of the organs and stages involved in rice reproduction (Additional file 1) and were suitable for the identification of reproduction-related lncRNAs. Our lncRNA identification strategy comprised three key procedures (Figure 1). First, the rice transcriptome was reconstructed from all of the RNA-seq datasets using Cufflinks 2.0. After filtering out infrequently expressed transcripts (those showing FPKM (fragments per kilobase of transcript per million mapped reads) scores <0.5 in all samples) and transcripts without strand information, we recovered 77.4% (30,219/39,045) of the non-transposable element (non-TE)-related mRNAs in the datasets (the mRNAs discussed in the following sections are non-TE-related mRNAs unless otherwise specified). The efficient recovery of known protein-coding genes indicated that the dataset employed here was suitable for the recovery of novel transcribed regions of the rice genome.
- We characterized the basic genomic features of the obtained lncRNAs and compared these features with the available features of Arabidopsis or human lncRNAs or with those of rice protein-coding genes where appropriate. We found that only a small fraction (median percentage, 6.5%) of the sequence of most of the lncNATs was antisense overlapped by protein-coding mRNA (Figure 2A) and that lincRNAs and lncNATs are similar in many aspects (Figure 2). To display the characteristics of lincRNAs and lncNATs more clearly, we analyzed them separately in the following comparisons. Similar to findings for Arabidopsis, only around half of lncRNAs were spliced (46.5% for lincRNAs, 65.9% for lncNATs). In contrast, more than 98% of human lncRNAs are spliced (Figure 2B). Rice lncRNAs have fewer exons than mRNAs (2.21 versus 4.67 on average, respectively; 2.10 exons for lincRNAs and 2.42 exons for lncNATs), but their exons (median length of 323 nucleotides; 322 nucleotides for lincRNAs, 298 nucleotides for lncNATs) are longer than those of mRNAs (median length of 159 nucleotides) (Figure 2C). Full-length rice lncRNA transcripts (median length of 852 nucleotides; 800 nucleotides for lincRNAs, 950 nucleotides for lncNATs) are longer than Arabidopsis lncRNA transcripts (median length of 285 nucleotides) and human lncRNA transcripts (median length of 592 nucleotides), and are generally shorter than protein-coding transcripts (median length of 1,411 nucleotides) (Figure 2D). Rice lncRNAs generally do not overlap with repeat sequences (Figure 2E); fewer rice lncRNAs overlap repeats than do rice mRNAs or human lncRNAs. As with Arabidopsis lncRNAs, only a small proportion of rice lncRNAs (122 of 1,624 lincRNAs, 7.5%; 44 of 600 lncNATs, 7.3%) generate sRNAs (Additional file 3), implying that these lncRNAs might function through generating sRNAs.
Interestingly, rice lncRNAs were much more A/U-rich than the coding sequences and the 5′UTRs of protein-coding genes but were less A/U-rich than 3′UTRs that use A/U-rich elements to regulate mRNA degradation (Figure 2F). This characteristic is conserved in Arabidopsis and animal lncRNAs, implying that this feature might be related to the functions of lncRNAs. Rice lincRNAs are most likely to appear in divergent orientations with respect to the closest neighboring protein-coding genes. However, we did not observe a stronger correlation between the expression of rice lincRNAs and their nearest neighbors than that between adjacent protein-coding genes (Figure 2G), although the expression of lncNATs is more highly correlated with convergent and divergent overlapped mRNA than with tandem overlapped mRNAs.
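The expression filter described in the first finding above (discard transcripts with FPKM <0.5 in all samples and transcripts without strand information) can be illustrated with a minimal Python sketch. This is not the authors' pipeline code: the `Transcript` class, its field names, the convention that an unknown strand is written as ".", and the example identifiers are all assumptions made for illustration. Only the 0.5 cutoff and the 30,219/39,045 counts come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

FPKM_CUTOFF = 0.5  # threshold stated in the text


@dataclass
class Transcript:
    # Hypothetical record layout, not taken from the original pipeline.
    transcript_id: str
    strand: str                       # "+", "-", or "." when unknown (assumed convention)
    fpkm_by_sample: Dict[str, float]  # sample name -> FPKM


def passes_filter(t: Transcript, cutoff: float = FPKM_CUTOFF) -> bool:
    """Keep a transcript only if it has strand information and reaches
    the cutoff in at least one sample (i.e. it is not < cutoff in all)."""
    has_strand = t.strand in ("+", "-")
    expressed = any(v >= cutoff for v in t.fpkm_by_sample.values())
    return has_strand and expressed


def filter_transcripts(transcripts: List[Transcript]) -> List[Transcript]:
    """Apply the expression and strand filter to a list of transcripts."""
    return [t for t in transcripts if passes_filter(t)]


def recovery_rate(recovered: int, annotated: int) -> float:
    """Percentage of annotated mRNAs recovered after filtering."""
    return 100.0 * recovered / annotated


if __name__ == "__main__":
    # Made-up example transcripts, for illustration only.
    examples = [
        Transcript("tx1", "+", {"anther": 0.1, "pistil": 2.3}),  # kept
        Transcript("tx2", "-", {"anther": 0.2, "pistil": 0.4}),  # low in all samples
        Transcript("tx3", ".", {"anther": 5.0, "pistil": 6.0}),  # no strand information
    ]
    print([t.transcript_id for t in filter_transcripts(examples)])
    # Counts reported in the text: 30,219 of 39,045 non-TE-related mRNAs.
    print(round(recovery_rate(30219, 39045), 1))
```

The sketch treats a transcript as expressed when at least one sample reaches the cutoff, which is the complement of "FPKM <0.5 in all samples"; whether the original pipeline treated a value of exactly 0.5 as passing is an assumption here.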
Labs working on this Project
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, School of Life Science, Sun Yat-Sen University, Guangzhou 510275, PR China
Corresponding Author
- Yue-Qin Chen (lsscyq@mail.sysu.edu.cn)