Summary: Single cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
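The conversion of reads to molecules mentioned in the summary can be illustrated with a minimal sketch. It assumes the simplest possible rule: within one cell, reads that share the same gene and the same UMI sequence are treated as copies of a single original molecule. The function name `reads_to_molecules`, the gene labels, and the UMI sequences are hypothetical, and the sketch does not correct for UMI sequencing errors; it is not the pipeline used in this study.

```python
from collections import defaultdict


def reads_to_molecules(reads):
    """Collapse reads to molecules for a single cell.

    `reads` is an iterable of (gene, umi) pairs. Reads with the same gene
    and the same UMI are assumed to be amplification copies of one molecule.
    Returns (read_counts, molecule_counts), both keyed by gene.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1     # every read counts towards the read total
        umis_seen[gene].add(umi)   # a UMI counts only once per gene
    molecule_counts = {gene: len(umis) for gene, umis in umis_seen.items()}
    return dict(read_counts), molecule_counts


# Toy input (hypothetical): GENE_A has four reads but only two distinct UMIs.
toy_reads = [
    ("GENE_A", "AACGT"),
    ("GENE_A", "AACGT"),
    ("GENE_A", "AACGT"),
    ("GENE_A", "TTGCA"),
    ("GENE_B", "GGATC"),
]

read_counts, molecule_counts = reads_to_molecules(toy_reads)
print(read_counts)
print(molecule_counts)
```

In this toy input the reads-to-molecules ratio differs between the two genes, which is the kind of quantity the summary reports as being affected by both biological and technical variation.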
Overall Design: We combined the 96 single cell samples from each C1 chip into their own master mix and sequenced across three lanes of a HiSeq 2500 (3 individuals x 3 replicates x 96 wells x 3 lanes = 2592 files). We prepared two separate library preparations for each bulk sample, combined them all into one master mix, and sequenced across four lanes (3 individuals x 3 replicates x 2 library preparations x 4 lanes = 72 files).
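The file counts in the design can be checked by enumerating every combination of factors. The sketch below is illustrative only: the individual labels and the numbering of replicates, wells, library preparations, and lanes are placeholders, not the identifiers used in the deposited files.

```python
from itertools import product

# Placeholder labels (assumptions); only the number of levels per factor
# is taken from the design description.
individuals = ["individual_1", "individual_2", "individual_3"]
replicates = [1, 2, 3]
wells = range(1, 97)          # 96 wells per C1 chip
single_cell_lanes = [1, 2, 3]

# One file per individual x replicate x well x lane.
single_cell_files = list(
    product(individuals, replicates, wells, single_cell_lanes)
)

library_preps = [1, 2]
bulk_lanes = [1, 2, 3, 4]

# One file per individual x replicate x library preparation x lane.
bulk_files = list(product(individuals, replicates, library_preps, bulk_lanes))

print(len(single_cell_files))  # 2592
print(len(bulk_files))         # 72
```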
Strategy: |
|
Species: |
|
Healthy Condition: |
|
Cell Type: |
|
Growth Protocol: | Undifferentiated feeder-free iPSCs generated from Yoruba LCLs were grown in E8 medium (Life Tech) (G. Chen et al. 2011) on Matrigel-coated tissue culture plates with daily media feeding at 37 °C with 5% (vol/vol) CO2. For standard maintenance, cells were split every 3-4 days using cell release solution (0.5 mM EDTA and NaCl in PBS) at roughly 80% confluence. For the single cell suspension, iPSCs were individualized with Accutase Cell Detachment Solution (BD) for 5-7 minutes at 37 °C and washed twice with E8 medium immediately before each experiment. Cell viability and cell counts were then measured with the Automated Cell Counter (Bio-Rad) to generate resuspension densities of 2.5 × 10^5 cells/mL in E8 medium for C1 cell capture. |
Treatment Protocol: | - |
Extract Protocol: | Single cell loading and capture were performed following the Fluidigm manual. A bulk sample, a 40 µl aliquot of ~10,000 cells, was collected in parallel with each C1 chip using the same reaction mixes following the C1 protocol of "Tube Controls with Purified RNA |
Library Construction Protocol: | For sequencing library preparation, fragmentation and isolation of 5′ fragments were performed according to the UMI protocol (Islam et al. 2014). Instead of using commercially available Tn5 transposase, Tn5 protein stock was freshly purified in-house using the IMPACT system (pTXB1, NEB) following the protocol previously described (Picelli et al. 2014). The activity of the Tn5 was tested and shown to be comparable to that of EZ-Tn5 Transposase (Epicentre). Importantly, all the libraries in this study were generated using the same batch of purified Tn5 protein. For each of the bulk samples, two libraries were generated using two different indices in order to obtain sufficient material. |
Molecule Type: | poly(A)+ RNA |
Library Source: | |
Library Layout: | SINGLE |
Library Strand: | Reverse |
Platform: | ILLUMINA |
Instrument Model: | Illumina HiSeq 2500 |
Strand-Specific: | Specific |
Data Resource | GEN Sample ID | GEN Dataset ID | Project ID | BioProject ID | Sample ID | Sample Name | BioSample ID | Sample Accession | Experiment Accession | Release Date | Submission Date | Update Date | Species | Race | Ethnicity | Age | Age Unit | Gender | Source Name | Tissue | Cell Type | Cell Subtype | Cell Line | Disease | Disease State | Development Stage | Mutation | Phenotype | Case Detail | Control Detail | Growth Protocol | Treatment Protocol | Extract Protocol | Library Construction Protocol | Molecule Type | Library Layout | Strand-Specific | Library Strand | Spike-In | Strategy | Platform | Instrument Model | Cell Number | Reads Number | Gbases | AvgSpotLen1 | AvgSpotLen2 | Uniq Mapping Rate | Multiple Mapping Rate | Coverage Rate |
---|