Principal component analysis: a review and recent developments.

Ian T Jolliffe, Jorge Cadima

Author Information

Ian T Jolliffe: College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK.
Jorge Cadima: Secção de Matemática (DCEB), Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa 1340-017, Portugal; Centro de Estatística e Aplicações da Universidade de Lisboa (CEAUL), Lisboa, Portugal. Email: jcadima@isa.ulisboa.pt

PMID: 26953178 DOI: 10.1098/rsta.2015.0202

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
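As an illustration of the eigenvalue/eigenvector formulation mentioned in the abstract, the following is a minimal sketch, not the authors' code. It assumes NumPy, a data matrix with observations in rows and variables in columns, and PCA on the sample covariance matrix; the function name `pca`, its parameters, and the synthetic data are hypothetical choices made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Sketch of covariance-based PCA (hypothetical helper, not from the article).

    X is an (n_samples, n_variables) array. Returns the component scores,
    the loadings (eigenvectors as columns), and the variance of each component.
    """
    # Centre each variable so the covariance matrix describes spread about the mean.
    Xc = X - X.mean(axis=0)

    # Sample covariance matrix of the variables (columns).
    S = np.cov(Xc, rowvar=False)

    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)

    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1][:n_components]
    variances = eigvals[order]
    loadings = eigvecs[:, order]

    # The principal components are linear combinations of the centred variables.
    scores = Xc @ loadings
    return scores, loadings, variances


# Synthetic correlated data, used only to exercise the function.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

scores, loadings, variances = pca(X, n_components=2)
print("share of total variance retained:",
      variances.sum() / np.trace(np.cov(X, rowvar=False)))
```

In this sketch the columns of `scores` are uncorrelated and their variances equal the retained eigenvalues, which is the sense in which the new variables successively maximize variance. Using the correlation matrix instead of the covariance matrix is a common alternative when the variables are measured in different units.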

Keywords: dimension reduction; eigenvectors; multivariate analysis; palaeontology

Publication types: Journal Article; Research Support, Non-U.S. Gov't; Review
