Binary Dimensionality Reduction

Dimensionality reduction techniques for binary data including logistic PCA.

logisticPCA is an R package for dimensionality reduction of binary data. It is still in the early stages of development, and its conventions may change. A manuscript describing logistic PCA can be found here.

[Figure: logisticPCA projection]


To install R, visit

The package can be installed by downloading from CRAN.
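For the CRAN release, the standard R installation call should be enough; this is the generic install.packages() usage rather than anything specific to this package.

```r
# Install the released version from CRAN (run once), then load it.
install.packages("logisticPCA")
library(logisticPCA)
```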


To install the development version, first install devtools from CRAN. Then run the following commands.

# install.packages("devtools")


Three types of dimensionality reduction are provided. For all the functions, the user must supply the desired dimension k. The data must be an n x d matrix composed of binary variables (i.e., all 0s and 1s).
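As a minimal sketch of the expected input, the snippet below simulates an n x d binary matrix and checks that it contains only 0s and 1s. The helper is_binary_matrix() and the object binary_data are hypothetical names used for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of logisticPCA): TRUE if x is a numeric
# matrix with no missing values and only 0/1 entries.
is_binary_matrix <- function(x) {
  is.matrix(x) && is.numeric(x) && !anyNA(x) && all(x %in% c(0, 1))
}

set.seed(42)
n <- 10  # rows (observations)
d <- 4   # columns (binary variables)
binary_data <- matrix(rbinom(n * d, size = 1, prob = 0.5), nrow = n, ncol = d)

is_binary_matrix(binary_data)        # TRUE
is_binary_matrix(binary_data + 0.5)  # FALSE: entries are 0.5 and 1.5
```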

Logistic PCA

logisticPCA() estimates the natural parameters of a Bernoulli distribution in a lower dimensional space by projecting the natural parameters from the saturated model. It solves for a rank-k projection matrix, or equivalently a d x k matrix U with orthonormal columns, that minimizes the Bernoulli deviance. Since the natural parameters from the saturated model are either negative or positive infinity, an additional tuning parameter m is needed to approximate them. You can use cv.lpca() to select m by cross validation. Typical values are in the range of 3 to 10.

mu is a main effects vector of length d and U is the d x k loadings matrix.
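To make the projection idea concrete, here is a base R sketch that does not call the package. It assumes the saturated natural parameters are approximated by m * (2x - 1) and that the low-dimensional estimate has the form mu + (theta_sat - mu) U U'. The mu and U used here are simple placeholders (column means and leading eigenvectors), not the deviance-minimizing values that logisticPCA() would find.

```r
# Toy binary data: n = 6 rows, d = 4 columns (values are arbitrary).
x <- matrix(c(1, 0, 1, 1,
              0, 0, 1, 0,
              1, 1, 0, 1,
              0, 1, 0, 0,
              1, 0, 1, 1,
              0, 1, 0, 0), nrow = 6, byrow = TRUE)
m <- 4
k <- 2

# Saturated natural parameters are +/- Inf; approximate them by +/- m.
theta_sat <- m * (2 * x - 1)

# Placeholder choices (assumptions): column means as main effects and the
# top-k eigenvectors of the centered cross-product as an orthonormal U.
mu <- colMeans(theta_sat)
centered <- sweep(theta_sat, 2, mu)
U <- eigen(crossprod(centered), symmetric = TRUE)$vectors[, 1:k, drop = FALSE]

# Project the centered parameters onto the span of U, then add mu back.
theta_hat <- sweep(centered %*% U %*% t(U), 2, mu, "+")
p_hat <- plogis(theta_hat)

# Bernoulli deviance: -2 times the log-likelihood of the data under p_hat.
bern_deviance <- -2 * sum(x * log(p_hat) + (1 - x) * log(1 - p_hat))
```

The package optimizes mu and U to make this deviance as small as possible; the sketch only evaluates it for one fixed choice.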

Logistic SVD

logisticSVD() estimates the natural parameters by a matrix factorization. mu is a main effects vector of length d, B is the d x k loadings matrix, and A is the n x k principal component score matrix.
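The factorization can be sketched in base R as follows. This assumes the natural parameter matrix is the main effects plus A times the transpose of B; the values of mu, A, and B are random placeholders rather than fitted estimates.

```r
set.seed(1)
n <- 5; d <- 4; k <- 2

# Random placeholders standing in for fitted quantities.
mu <- rnorm(d)                   # main effects, length d
A <- matrix(rnorm(n * k), n, k)  # n x k principal component scores
B <- matrix(rnorm(d * k), d, k)  # d x k loadings

# n x d matrix of natural parameters: mu added to every row of A %*% t(B).
theta <- outer(rep(1, n), mu) + A %*% t(B)

# Implied Bernoulli probabilities.
prob <- plogis(theta)
```

After removing the main effects, theta has rank at most k, which is what makes this a low-rank model.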

Convex Logistic PCA

convexLogisticPCA() relaxes the problem of solving for a projection matrix to solving for a matrix in the k-dimensional Fantope, which is the convex hull of rank-k projection matrices. This has the advantage that the global minimum can be obtained efficiently. The disadvantage is that the k-dimensional Fantope solution may have a rank much larger than k, which reduces interpretability. It is also necessary to specify m in this function.

mu is a main effects vector of length d, H is the d x d Fantope matrix, and U is the d x k loadings matrix, whose columns are the first k eigenvectors of H.
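The rank issue can be seen in a small base R example. It assumes the usual characterization of the k-dimensional Fantope as the symmetric matrices with eigenvalues between 0 and 1 and trace k; the matrices below are illustrative and are not produced by convexLogisticPCA().

```r
k <- 1

# Two rank-1 projection matrices in d = 3 dimensions.
P1 <- tcrossprod(c(1, 0, 0))
P2 <- tcrossprod(c(0, 1, 0))

# A convex combination lies in the Fantope but is not itself a projection.
H <- 0.5 * P1 + 0.5 * P2
eig <- eigen(H, symmetric = TRUE)

trace_H <- sum(diag(H))           # 1, equal to k
rank_H <- sum(eig$values > 1e-8)  # 2, larger than k

# Loadings taken as the first k eigenvectors of H.
U <- eig$vectors[, 1:k, drop = FALSE]
```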


Each of the classes has associated methods to make data analysis easier.

  • print(): Prints a summary of the fitted model.
  • fitted(): Returns the fitted low dimensional matrix of either natural parameters or probabilities.
  • predict(): Predicts the PCs on new data. Can also predict the low dimensional matrix of natural parameters or probabilities on new data.
  • plot(): Plots either the deviance trace, the first two PC loadings, or the first two PC scores using the package ggplot2.
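As a rough illustration of these methods, the sketch below fits a model on simulated data and then calls print(), fitted(), and predict(). The simulated data, the object names, and the argument values (k = 2, m = 4, and the type strings) are assumptions for illustration; check the reference manual for the exact signatures. The package calls only run if logisticPCA is installed.

```r
set.seed(1)
train <- matrix(rbinom(20 * 5, size = 1, prob = 0.5), nrow = 20)
new_rows <- matrix(rbinom(4 * 5, size = 1, prob = 0.5), nrow = 4)

fit <- NULL
if (requireNamespace("logisticPCA", quietly = TRUE)) {
  # Fit logistic PCA with k = 2 components and m = 4.
  fit <- logisticPCA::logisticPCA(train, k = 2, m = 4)
  print(fit)

  # Fitted probabilities on the training data (n x d).
  probs <- fitted(fit, type = "response")

  # Principal component scores for rows not used in fitting.
  new_scores <- predict(fit, new_rows, type = "PCs")
}
```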

In addition, there are functions for performing cross validation.

  • cv.lpca(), cv.lsvd(), cv.clpca(): Run cross validation over the rows of the matrix to assess the fit of m and/or k.
  • plot(): Plots the results of the cross validation functions above.
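A cross validation run might look like the sketch below. The simulated data and the argument names ks, ms, and folds, along with their values, are assumptions for illustration; consult the reference manual for the exact interface. The call only runs if logisticPCA is installed.

```r
set.seed(2)
x <- matrix(rbinom(40 * 6, size = 1, prob = 0.5), nrow = 40)

cv_res <- NULL
if (requireNamespace("logisticPCA", quietly = TRUE)) {
  # Compare three candidate values of m at a fixed k, using 5 folds of rows.
  cv_res <- logisticPCA::cv.lpca(x, ks = 2, ms = c(2, 4, 6), folds = 5)
  print(cv_res)
}
```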


logisticPCA 0.2

  • Changed M to m in the functions, since that is what it is called in the references
  • Switched to the rARPACK package from irlba for partial eigen and singular value decomposition
  • Fixed incompatibility with updated version of testthat


0.2 by Andrew J. Landgraf, 5 years ago

Authors: Andrew J. Landgraf

MIT + file LICENSE license

Imports ggplot2

Suggests rARPACK, testthat, knitr, rmarkdown

Suggested by glmpca.