Batch effects and the effective design of single-cell gene expression studies.
Po-Yuan Tung, John D Blischak, Chiaowen Joyce Hsiao, David A Knowles, Jonathan E Burnett, Jonathan K Pritchard, Yoav Gilad
Author Information
Po-Yuan Tung: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
John D Blischak: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
Chiaowen Joyce Hsiao: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
David A Knowles: Department of Genetics, Stanford University, Stanford, CA, USA.
Jonathan E Burnett: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
Jonathan K Pritchard: Department of Genetics, Stanford University, Stanford, CA, USA.
Yoav Gilad: Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples to account for amplification bias. We found that genotype was the major source of variation in the gene expression data, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was affected by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we propose a framework for effective scRNA-seq studies.
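To make the read-to-molecule conversion more concrete, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline. It assumes reads have already been aligned and tagged with a cell identifier, a gene, and a UMI sequence; the tuple layout and the function names `umi_counts`, `reads_per_molecule`, and `molecules_per_read_by_cell` are hypothetical. Reads that share an exact (cell, gene, UMI) triple are collapsed into one molecule. The sketch ignores sequencing errors within UMIs, which dedicated tools such as UMI-tools handle by merging near-identical UMIs.

```python
from collections import defaultdict


def umi_counts(reads):
    """Collapse reads into molecule counts by exact UMI matching.

    reads: iterable of (cell, gene, umi) tuples, one per sequenced read
    (hypothetical input layout). Returns (read_counts, molecule_counts),
    both dicts keyed by (cell, gene).
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1       # every read counts toward reads
        umis_seen[(cell, gene)].add(umi)     # duplicate UMIs collapse here
    molecule_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), molecule_counts


def reads_per_molecule(read_counts, molecule_counts):
    """Average number of reads per detected molecule, per (cell, gene)."""
    return {key: read_counts[key] / molecule_counts[key] for key in molecule_counts}


def molecules_per_read_by_cell(read_counts, molecule_counts):
    """Per-cell fraction of reads that were converted into distinct molecules."""
    reads_by_cell = defaultdict(int)
    molecules_by_cell = defaultdict(int)
    for (cell, _gene), n in read_counts.items():
        reads_by_cell[cell] += n
    for (cell, _gene), m in molecule_counts.items():
        molecules_by_cell[cell] += m
    return {cell: molecules_by_cell[cell] / reads_by_cell[cell] for cell in reads_by_cell}


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    toy_reads = [
        ("A", "G1", "AAAA"), ("A", "G1", "AAAA"), ("A", "G1", "AAAA"),
        ("A", "G1", "CCCC"),
        ("A", "G2", "GGGG"), ("A", "G2", "GGGG"),
        ("B", "G1", "TTTT"),
    ]
    rc, mc = umi_counts(toy_reads)
    print(rc[("A", "G1")], mc[("A", "G1")])           # 4 reads, 2 molecules
    print(molecules_per_read_by_cell(rc, mc))         # {'A': 0.5, 'B': 1.0}
```

Because the per-cell ratio depends on factors such as sequencing depth and library complexity, it can differ between cells and between batches, which is one way the read-to-molecule conversion can carry technical as well as biological variation.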
Grant Support
R01 HL092206/NHLBI NIH HHS
T32 GM007197/NIGMS NIH HHS
T32 HL007381/NHLBI NIH HHS
MeSH Terms
Gene Expression
High-Throughput Nucleotide Sequencing
Humans
Induced Pluripotent Stem Cells
Principal Component Analysis
RNA
Sequence Analysis, RNA
Single-Cell Analysis