Two methods for performing a constrained principal
component analysis (PCA), where non-negativity and/or sparsity
constraints are enforced on the principal axes (PAs). The
function 'nsprcomp' computes one principal component (PC) after
the other. Each PA is optimized such that the corresponding PC
has maximum additional variance not explained by the previous
components. In contrast, the function 'nscumcomp' jointly
computes all PCs such that the cumulative variance is maximal.
Both functions have the same interface as the 'prcomp' function
from the 'stats' package (plus some extra parameters), and both
return the result of the analysis as an object of class
'nsprcomp', which inherits from 'prcomp'. See
and Sigg et al. (2008)
An R package for non-negative and sparse principal component analysis.
Principal component analysis (PCA) provides a lower dimensional approximation of high-dimensional data, where the reconstruction error (measured using Euclidean distance) is minimal. PCA is therefore useful to succinctly describe a data set which has an "intrinsic" dimensionality that is low, even though the number of features might be high. A classical example is the set of face images: even low-resolution images quickly exceed 1e4 pixels, yet about 50 principal components (PCs) are sufficient to achieve a good approximation of a face image (Kirby and Sirovich, 1990).
Although PCA often provides a good approximation with few PCs, each component is usually a linear combination of all features of the data. Enforcing sparsity of the principal axes (PA) facilitates identification of the relevant features and is therefore an unsupervised feature selection method. In applications where a fixed penalty is associated with each included feature (e.g. transaction costs in finance), a small loss in variance of the PC for a large reduction in cardinality of the PA can lead to an overall more desirable solution. Enforcing non-negativity of the PAs renders PCA applicable to domains where only positive influence of features is deemed appropriate, i.e. the total variance is explained additively. Non-negative solutions often show some degree of sparsity already, but a combination of both constraints enables precise control over the cardinality of the PAs.
This package implements two non-negative and/or sparse PCA algorithms
which are rooted in expectation-maximization (EM) for a
probabilistic generative model of PCA (Sigg and Buhmann, 2008). The
algorithm can also be described as applying a soft-thresholding operator to the well-known
power iteration method for computing eigenvalues.
In each EM (or power) iteration of the
nsprcomp algorithm, a soft thresholding
operator is applied to enforce sparsity of the PA, and projection to
the non-negative orthant is applied to enforce non-negativity. Once
the EM procedure has converged and the support of the PA has been
identified, the non-zero coefficients are recomputed to maximize the
variance of the PC. Finally, because EM is a local optimizer, random
restarts are employed to (hopefully) avoid bad local minima. The same
iterative procedure (without the non-negativity constraint) was later also
proposed by Journee et al. (2010).
nsprcomp algorithm computes one PC after the other. Because
constrained PAs no longer correspond to true eigenvectors of the
covariance matrix and are usually not pairwise orthogonal, special
attention needs to be paid when computing more than a single PC. The
algorithm implements the generalized deflation method proposed by
Mackey (2009) to maximize the additional variance of each
component. The variance of the PC is maximized after projecting the PA
to the ortho-complement of the space spanned by the previous PAs. This procedure maximizes the
additional variance not explained by previous components, and is
identical to standard PCA if no sparsity or non-negativity constraints
are enforced on the PAs.
nsprcomp algorithm is suitable for large and high-dimensional
data sets, because it entirely avoids computing the covariance
matrix. It is therefore especially suited to the case where the number of
features exceeds the number of observations.
nscumcomp algorithm jointly computes all PCs, such that the
cumulative variance of the components is maximized. The computation is
again based on EM iterations, but here the maximization step has to be
carried out numerically. Non-negativity of the PAs is achieved by
enforcing a zero lower bound in the L-BFGS-B algorithm used for
performing the maximization step. Sparsity of the PAs is achieved by a
subsequent soft-thresholding of the pseudo-rotation matrix (the matrix
which has the PAs as its columns). In contrast to sequential PCA
algorithms such as
nsprcomp, only the total number
k of non-zero
loadings is specified by the user, and the non-zero loadings are
distributed over all PAs by the algorithm.
The algorithm penalizes the divergence of the pseudo-rotation matrix
from ortho-normality using a Lagrange parameter
gamma. A positive
gamma enforces a minimum angle between the PAs, and is sometimes
necessary to avoid PAs collapsing onto each other during the EM
nscumcomp algorithm also scales to large and high-dimensional
data sets. But due to the numerical maximization of the
M-step, it is computationally more involved than the
nscumcomp algorithm is currently unpublished.
Kirby, M., & Sirovich, L. (1990). Application of the Karhunen-Loeve procedure for the characterization of human faces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(1), 103-108.
Sigg, C. D., & Buhmann, J. M. (2008). Expectation-maximization for sparse and non-negative PCA. In Proceedings of the 25th International Conference on Machine Learning (pp. 960-967).
Journee, M., Nesterov, Y., Richtarik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11, 517-553.
Mackey, L. (2009). Deflation methods for sparse pca. Advances in Neural Information Processing Systems, 21, 1017-1024.