Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations

https://doi.org/10.1016/j.jmva.2011.09.002
Open archive, under an Elsevier user license.

Abstract

In this article, we propose a new estimation methodology for PCA with high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether or not a ρ-mixing-type dependency appears among the variables. When such a dependency is present, the HDLSS data concentrate on an n-dimensional surface of the unit sphere as the dimension increases. We pay special attention to this phenomenon. We propose a method called the noise-reduction methodology for estimating the eigenvalues of an HDLSS dataset. We show that the eigenvalue estimator is consistent and derive its limiting distribution in the HDLSS context. We consider consistency properties of the PC directions. We apply the noise-reduction methodology to the estimation of PC scores. We also give an application to discriminant analysis for HDLSS datasets, using the inverse covariance matrix estimator induced by the noise-reduction methodology.
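To make the abstract's idea concrete, the following is a minimal sketch of eigenvalue estimation in the HDLSS setting via the n × n dual (Gram) covariance matrix, together with a noise-reduction-style correction that subtracts the average of the remaining, noise-dominated eigenvalues from each leading eigenvalue. The spiked-covariance simulation, the variable names, and the exact form of the correction are illustrative assumptions based on the abstract's description, not a verbatim transcription of the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# HDLSS setting: dimension d far exceeds sample size n.
d, n = 1000, 20

# Hypothetical spiked-covariance model: one strong signal direction
# (variance 50 along the first coordinate) plus isotropic unit noise.
signal = np.sqrt(50.0) * rng.standard_normal(n)[:, None]
u = np.zeros(d)
u[0] = 1.0
X = signal * u[None, :] + rng.standard_normal((n, d))

# Center the data and form the n x n dual sample covariance
# S_D = X_c X_c' / (n - 1); its nonzero eigenvalues coincide with
# those of the d x d sample covariance, but it is cheap to handle.
Xc = X - X.mean(axis=0)
S_D = Xc @ Xc.T / (n - 1)

# Sample eigenvalues in descending order (the smallest is ~0 after
# centering, since the centered data matrix has rank n - 1).
lam = np.sort(np.linalg.eigvalsh(S_D))[::-1]


def noise_reduced(lam, n):
    """Illustrative noise-reduction correction (an assumption here):
    from each leading eigenvalue, subtract the average of the
    remaining nonzero eigenvalues, which are dominated by noise."""
    lam_tilde = lam.copy()
    for j in range(n - 2):
        remaining = lam.sum() - lam[: j + 1].sum()  # tail eigenvalue mass
        lam_tilde[j] = lam[j] - remaining / (n - 2 - j)
    return lam_tilde


lam_tilde = noise_reduced(lam, n)
print("naive leading eigenvalue:    ", lam[0])
print("noise-reduced leading value: ", lam_tilde[0])
```

In this simulation the naive leading eigenvalue is inflated by an amount of order d/n coming from the accumulated noise coordinates, while the corrected value sits much closer to the true spike variance; this upward bias of sample eigenvalues in HDLSS data is exactly the phenomenon the noise-reduction methodology targets.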

AMS subject classifications

primary
62H25
62H30
secondary
34L20

Keywords

Consistency
Discriminant analysis
Eigenvalue distribution
Geometric representation
HDLSS
Inverse matrix
Noise reduction
Principal component analysis
