Summary: Genetic/genome diversity underlying variation in seed oil composition and content among soybean varieties is largely attributed to differences in transcript sequences and/or transcript accumulation of oil-production-related genes in seeds. Discovering and analyzing sequence and expression variations in these genes will accelerate soybean oil quality improvement. To identify these variations, we sequenced the seed transcriptomes of nine soybean lines varying in oil composition and/or total oil content. Our results showed that 69,338 distinct transcripts from 32,885 annotated genes were expressed in seeds. A total of 8,037 transcript expression polymorphisms and 50,485 transcript sequence polymorphisms (48,792 SNPs and 1,693 small Indels) were identified among the lines. The effects of the transcript polymorphisms on their encoded protein sequences and functions were predicted. The study also provided independent evidence that the lack of FAD2-1A gene activity and a non-synonymous SNP in the coding sequence of FAB2C caused elevated oleic acid and stearic acid levels in soybean lines M23 and FAM94-41, respectively. As a proof of concept, we developed an integrated RNA-seq and bioinformatics approach to identify and functionally annotate transcript polymorphisms, and demonstrated that it is highly effective for discovering genetic and transcript variations that result in altered oil quality traits. The collection of transcript polymorphisms, coupled with their predicted functional effects, will be a valuable asset for further discovery of genes, gene variants, and functional markers to improve soybean oil quality.
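The summary distinguishes SNPs by their predicted effect on the encoded protein (e.g. the non-synonymous SNP in FAB2C). The sketch below shows one common way such a call can be made for a single-base substitution in an in-frame coding sequence, using the standard genetic code. It is an illustration only, not the authors' annotation pipeline; the function name `classify_snp`, its category labels, and the example sequences are hypothetical and do not come from the soybean data.

```python
# Standard genetic code: codons ordered T, C, A, G at each position; '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based `pos` of an in-frame CDS (hypothetical helper)."""
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds):
        raise ValueError("position outside coding sequence")
    if cds[pos] == alt:
        raise ValueError("alternate allele equals reference")
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("substitution falls in an incomplete codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # non-synonymous: premature stop codon
    if ref_aa == "*":
        return "stop_lost"  # non-synonymous: stop codon lost
    return "missense"       # non-synonymous: amino acid change


# Toy sequences (not soybean data): ATG GCT TAA -> M A *
print(classify_snp("ATGGCTTAA", 5, "C"))  # GCT -> GCC, Ala -> Ala
print(classify_snp("ATGGCTTAA", 3, "A"))  # GCT -> ACT, Ala -> Thr
```

A real annotation would also need transcript coordinates, strand, and Indel handling (frameshifts), which this sketch deliberately leaves out.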
Overall Design: Transcriptome comparison of nine different soybean varieties
Strategy: |
|
Species: |
|
Tissue: |
|
Development Stage: |
|
Growth Protocol: | All soybean (Glycine max (L.) Merrill) lines were grown in growth chambers with the temperature set at 25°C day/23°C night, humidity at 50%, and 16-hour days with up to 1,000 µmol of supplemental lighting. The plants were watered, fertilized, and managed for pests and disease as needed. Seeds at the S6 stage of seed maturation were carefully selected based on seed weight and color and harvested for RNA preparation. |
Treatment Protocol: | The oil composition of soybean seeds was determined on an Agilent 7890A GC with split/splitless (S/SL) injection and flame ionization detection (FID) at the USDA-ARS Plant Genetic Resources Conservation Unit (PGRCU) in Griffin, Georgia. Three individual seeds per soybean line were measured, with two replications each. The oil content of soybean seeds was determined with an MQC Benchtop NMR Analyser (Oxford Instruments). Six grams of seeds per line were measured, with two biological and three technical replicates. |
Extract Protocol: | Samples were ground in liquid nitrogen. RNA was then separated with phenol-chloroform, and purified and precipitated using a Qiagen RNeasy Kit. |
Library Construction Protocol: | RNA-seq libraries were constructed and sequenced at Expression Analysis, Inc., Durham, NC (www.ExpressionAnalysis.com). Libraries were prepared with the TruSeq™ RNA Sample Preparation Kit v2 (Illumina, Inc., San Diego, CA), and 100 bp paired-end reads were generated on the Illumina HiSeq 2000 platform. |
Molecule Type: | poly(A)+ RNA |
Library Source: | |
Library Layout: | PAIRED |
Library Strand: | - |
Platform: | ILLUMINA |
Instrument Model: | Illumina HiSeq 2000 |
Strand-Specific: | Unspecific |
Data Resource | GEN Sample ID | GEN Dataset ID | Project ID | BioProject ID | Sample ID | Sample Name | BioSample ID | Sample Accession | Experiment Accession | Release Date | Submission Date | Update Date | Species | Race | Ethnicity | Age | Age Unit | Gender | Source Name | Tissue | Cell Type | Cell Subtype | Cell Line | Disease | Disease State | Development Stage | Mutation | Phenotype | Case Detail | Control Detail | Growth Protocol | Treatment Protocol | Extract Protocol | Library Construction Protocol | Molecule Type | Library Layout | Strand-Specific | Library Strand | Spike-In | Strategy | Platform | Instrument Model | Cell Number | Reads Number | Gbases | AvgSpotLen1 | AvgSpotLen2 | Uniq Mapping Rate | Multiple Mapping Rate | Coverage Rate |
---|