Batch effects and the effective design of single-cell gene expression studies.
Po-Yuan Tung, John D Blischak, Chiaowen Joyce Hsiao, David A Knowles, Jonathan E Burnett, Jonathan K Pritchard, Yoav Gilad
Author Information
- Po-Yuan Tung: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
- John D Blischak: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
- Chiaowen Joyce Hsiao: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
- David A Knowles: Department of Genetics, Stanford University, Stanford, CA, USA.
- Jonathan E Burnett: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
- Jonathan K Pritchard: Department of Genetics, Stanford University, Stanford, CA, USA.
- Yoav Gilad: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from each of three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples to account for amplification bias. We found that genotype was the major source of variation in the gene expression data, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was affected by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
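The conversion of reads to molecules mentioned above can be illustrated with a minimal sketch. It assumes reads have already been aligned and tagged as (cell, gene, UMI) tuples, and that reads sharing an exact UMI within a cell and gene come from one molecule. Real pipelines usually also correct UMI sequencing errors, which this sketch ignores. The function names and toy data are hypothetical and do not represent the authors' pipeline. The per-cell molecules-per-read ratio is one simple way to see how conversion can differ between cells.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecules by counting distinct UMIs per (cell, gene).

    Assumption: exact UMI matches identify a single molecule (no UMI error correction).
    `reads` is an iterable of (cell, gene, umi) tuples, one per aligned read.
    Returns (molecules, read_counts), both dicts keyed by (cell, gene).
    """
    umis = defaultdict(set)        # distinct UMIs observed per (cell, gene)
    read_counts = defaultdict(int)  # raw reads per (cell, gene)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
        read_counts[(cell, gene)] += 1
    molecules = {key: len(seen) for key, seen in umis.items()}
    return molecules, dict(read_counts)


def molecules_per_read(molecules, read_counts):
    """Per-cell ratio of total molecules to total reads (hypothetical summary)."""
    mol_total = defaultdict(int)
    read_total = defaultdict(int)
    for (cell, _gene), n in molecules.items():
        mol_total[cell] += n
    for (cell, _gene), n in read_counts.items():
        read_total[cell] += n
    return {cell: mol_total[cell] / read_total[cell] for cell in read_total}


# Hypothetical toy reads: cell c1 has two reads of the same GENE_A molecule.
reads = [
    ("c1", "GENE_A", "AACG"),
    ("c1", "GENE_A", "AACG"),
    ("c1", "GENE_A", "TTGC"),
    ("c1", "GENE_B", "AACG"),
    ("c2", "GENE_A", "AACG"),
]
molecules, read_counts = count_molecules(reads)
rates = molecules_per_read(molecules, read_counts)
print(molecules)
print(rates)
```

In this toy input, cell c1 yields 3 molecules from 4 reads (ratio 0.75) while c2 yields 1 molecule from 1 read; in real data such ratios depend on sequencing depth and library complexity, which is one way technical factors enter UMI counts.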
Gene Expression
High-Throughput Nucleotide Sequencing
Humans
Induced Pluripotent Stem Cells
Principal Component Analysis
RNA
Sequence Analysis, RNA
Single-Cell Analysis