Summary: To understand gene expression networks leading to functional properties and compositional traits of the soybean seed, we have undertaken a detailed examination of soybean seed development from a few days post-fertilization to the mature seed using Illumina high-throughput transcriptome sequencing (RNA-Seq). RNA was sequenced from seven different stages of seed development, yielding between 12 million and 78 million sequenced transcripts. These have been aligned to the 79,000 gene models predicted from the soybean genome recently sequenced by the Department of Energy Joint Genome Institute. Over one hundred gene models were identified with high expression exclusively in young seed stages, starting at just four days after fertilization. These were annotated as being related to many basic components and processes such as histones and proline-rich proteins. Genes involved with some storage proteins such as glycinin and beta-conglycinin had their highest expression levels at the stages of largest fresh weight, confirming previous knowledge that these storage products are being rapidly accumulated before the seed begins the desiccation process. Other gene models showed high expression in the dry, mature seeds, perhaps indicating the preparation of pathways needed later, in the early stages of imbibition. Many highly-expressed gene models at the dry seed stage are, as expected, annotated as hydrophilic proteins associated with low water conditions, such as late embryogenesis abundant (LEA) proteins and dehydrins, which help preserve the cellular structures and nutrients within the seed during desiccation. Hundreds of transcription factors with notable expression in at least one stage of seed development were also identified and examined. Results from a second biological replicate demonstrate high reproducibility of these data.
Overall Design: High-throughput transcriptome sequencing (RNA-Seq) was performed on soybean seeds at seven developmental stages, with two biological replicates per stage, using the Illumina Genome Analyzer II and Illumina HiSeq 2000.
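The summary describes gene models with high expression exclusively in young seed stages. A minimal sketch of how such stage-exclusive gene models could be flagged from a per-stage expression table is given below. It is not the authors' pipeline: the stage labels, gene identifiers, expression values, and both thresholds (`min_high`, `max_low`) are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical stage labels; the record states seven stages but does not name them here.
STAGES: List[str] = [f"stage_{i}" for i in range(1, 8)]

# Toy normalized expression values (illustrative only, NOT data from this study).
TOY_EXPR: Dict[str, List[float]] = {
    "gene_A": [120.0, 80.0, 3.0, 1.0, 0.5, 0.0, 0.0],      # high only in early stages
    "gene_B": [2.0, 4.0, 60.0, 300.0, 450.0, 90.0, 10.0],  # peaks mid-development
    "gene_C": [0.0, 0.0, 1.0, 2.0, 5.0, 150.0, 400.0],     # high in late stages
}


def stage_exclusive_genes(
    expr: Dict[str, Sequence[float]],
    stages: Sequence[str],
    target_stages: Sequence[str],
    min_high: float = 50.0,  # assumed cutoff for "high" expression
    max_low: float = 5.0,    # assumed cutoff for "not expressed elsewhere"
) -> List[str]:
    """Return genes that are high in every target stage and low in all other stages."""
    targets = set(target_stages)
    hits = []
    for gene, values in expr.items():
        by_stage = dict(zip(stages, values))
        high_in_targets = all(by_stage[s] >= min_high for s in targets)
        low_elsewhere = all(v <= max_low for s, v in by_stage.items() if s not in targets)
        if high_in_targets and low_elsewhere:
            hits.append(gene)
    return sorted(hits)


result = stage_exclusive_genes(TOY_EXPR, STAGES, ["stage_1", "stage_2"])
print(result)
```

With real data, the same filter would be applied to normalized counts per gene model, ideally after checking that the two biological replicates agree.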
Strategy: | |
Species: | |
Tissue: | |
Height/Length/Weight: | |
Isolation_source: | |
Development Stage: | |
Growth Protocol: | Plants were grown in optimal greenhouse conditions. Tissues were harvested at stated days after flowering or mg fresh weight. Whole seeds were divided by hand into cotyledons and seed coats for some samples. |
Treatment Protocol: | - |
Extract Protocol: | Total RNA was extracted from tissues using phenol:chloroform and a lithium chloride precipitation (Gonzalez and Vodkin 2007). Sequencing was performed by the Keck Center (University of Illinois) using an Illumina Genome Analyzer II or an Illumina HiSeq 2000. |
Library Construction Protocol: | RNA libraries were prepared for sequencing using standard Illumina protocols. |
Molecule Type: | poly(A)+ RNA |
Library Source: | |
Library Layout: | PAIRED; SINGLE |
Library Strand: | - |
Platform: | ILLUMINA |
Instrument Model: | Illumina Genome Analyzer II; Illumina HiSeq 2000 |
Strand-Specific: | Unspecific |