Cauchy Robust Principal Component Analysis with Applications to High-Dimensional Data Sets

Cauchy Robust Principal Component Analysis with Applications to High-Dimensional Data Sets

In this paper, we propose a modified formulation of the principal components analysis, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties.

Views: 218

In the analysis of multivariate data, it is frequently desirable to employ statistical methods which are insensitive to the presence of outliers in the sample. To address the problem of outliers, it is important to develop robust statistical procedures. Most statistical procedures include explicit or implicit prior assumptions about the distribution of the observations, but often without taking into account the effect of outliers. The purpose of this paper is to present a novel robust version of PCA which has some attractive features. 

Principal components analysis (PCA) is considered to be one of the most important techniques in statistics. However, the classical version of PCA depends on either a covariance or a correlation matrix, both of which are very sensitive to outliers. We develop an alternative method to classical PCA, which is far more robust, by using a multivariate Cauchy likelihood to construct a robust principal components (PC) procedure. It is an adaptation of the classic method of PCA obtained by replacing the Gaussian log-likelihood function by the Cauchy log-likelihood function, in a sense that will be explained in section 2.2. Although we do not claim that the interpretation of standard PCA in terms of operations on a Gaussian likelihood is new, see Bolton and Krzanowski, this fact does not appear to have been exploited in the development of a robust PCA procedure, as we do in this paper. An important reason for using the multivariate Cauchy likelihood is that this likelihood has only one maximum point, but the single most important motivation is that it leads to a robust procedure.

See also

Department Of Economics Website

myEcon Newsletter

Join the notification list of the Department of Economics.