Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices

Fast and memory efficient methods for truncated singular value decomposition and principal components analysis of large sparse and dense matrices.


The package implements implicitly restarted Lanczos methods for fast truncated singular value decomposition (also called partial SVD) of sparse and dense matrices. IRLBA stands for Augmented, Implicitly Restarted Lanczos Bidiagonalization Algorithm. The package provides the following functions (see the help page of each for details and examples).

  • irlba() partial SVD function
  • ssvd() l1-penalized matrix decomposition for sparse PCA (based on Shen and Huang's algorithm)
  • prcomp_irlba() principal components function, similar to the prcomp function in the stats package, for computing the first few principal components of large matrices
  • svdr() alternate partial SVD function based on randomized SVD (see also the rsvd package by N. Benjamin Erichson for an alternative implementation)
  • partial_eigen() a very limited partial eigenvalue decomposition for symmetric matrices (see the RSpectra package for more comprehensive truncated eigenvalue decomposition)

The help page for each function includes extensive documentation and examples. See also the package vignette: vignette("irlba", package="irlba").
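As a rough orientation, the sketch below calls irlba() and prcomp_irlba() on a small random dense matrix and compares the leading singular values with those from base R's svd(). The matrix size, the seed, and the variable names are illustrative assumptions, not taken from the package documentation.

```r
library(irlba)
set.seed(1)

# Illustrative data: a 200 x 50 dense random matrix (sizes are arbitrary)
A <- matrix(rnorm(200 * 50), 200, 50)

# Partial SVD: the 3 largest singular values and their vectors
s <- irlba(A, nv = 3)

# Leading singular values from a full SVD, for comparison
full <- svd(A)$d[1:3]
max_diff <- max(abs(s$d - full))

# First 3 principal components; columns are centered by default, as in prcomp
p <- prcomp_irlba(A, n = 3)
score_dim <- dim(p$x)
```

Here max_diff should be small relative to the singular values, and score_dim gives the dimensions of the returned score matrix (one row per row of A, one column per component).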

An overview web page is here: https://bwlewis.github.io/irlba/.

New in 2.3.2

What's new in Version 2.3.1?

Deprecated features

I will remove partial_eigen() in a future version. As its documentation states, users are better off using the RSpectra package for eigenvalue computations (although not generally for singular value computations).

The mult argument is deprecated and will be removed in a future version. We now recommend simply defining a custom class with a custom multiplication operator. The example below illustrates the old and new approaches.

library(irlba)
set.seed(1)
A <- matrix(rnorm(100), 10)
 
# ------------------ old way ----------------------------------------------
# A custom matrix multiplication function that scales the columns of A
# to unit norm (cf. the scale option).
col_scale <- sqrt(apply(A, 2, crossprod))
mult <- function(x, y)
{
  # x is a vector: compute x %*% A, then scale the resulting columns
  if (is.vector(x))
  {
    return((x %*% y) / col_scale)
  }
  # otherwise x is the matrix: scale y first, then multiply
  x %*% (y / col_scale)
}
irlba(A, 3, mult=mult)$d
## [1] 1.820227 1.622988 1.067185
 
# Compare with:
irlba(A, 3, scale=col_scale)$d
## [1] 1.820227 1.622988 1.067185
 
# Compare with:
svd(sweep(A, 2, col_scale, FUN=`/`))$d[1:3]
## [1] 1.820227 1.622988 1.067185
 
# ------------------ new way ----------------------------------------------
setClass("scaled_matrix", contains="matrix", slots=c(scale="numeric"))
setMethod("%*%", signature(x="scaled_matrix", y="numeric"), function(x, y) x@.Data %*% (y / x@scale))
setMethod("%*%", signature(x="numeric", y="scaled_matrix"), function(x, y) (x %*% y@.Data) / y@scale)
a <- new("scaled_matrix", A, scale=col_scale)
 
irlba(a, 3)$d
## [1] 1.820227 1.622988 1.067185

We have learned that using R's existing S4 system is simpler, easier, and more flexible than using custom arguments with idiosyncratic syntax and behavior. We've even used the new approach to implement distributed parallel matrix products for very large problems with amazingly little code.
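The same pattern extends to other implicit operators. As a minimal sketch (the class name centered_matrix, its center slot, and the test matrix are hypothetical and chosen only for illustration), the code below subtracts column means inside the multiplication methods, so the centered matrix is never formed explicitly, and then checks the result against explicit centering.

```r
library(irlba)
library(methods)
set.seed(1)

# Illustrative data: a 100 x 20 dense random matrix
A <- matrix(rnorm(2000), 100, 20)

# Hypothetical class: a matrix whose column means are subtracted implicitly
setClass("centered_matrix", contains="matrix", slots=c(center="numeric"))

# (A - 1 c') y = A y - (c' y) 1
setMethod("%*%", signature(x="centered_matrix", y="numeric"),
          function(x, y) x@.Data %*% y - sum(x@center * y))

# x' (A - 1 c') = x' A - sum(x) c'
setMethod("%*%", signature(x="numeric", y="centered_matrix"),
          function(x, y) (x %*% y@.Data) - sum(x) * y@center)

a <- new("centered_matrix", A, center=colMeans(A))

# Singular values via the implicit operator
d_custom <- irlba(a, 3)$d

# Reference: explicit column centering followed by a full SVD
d_ref <- svd(sweep(A, 2, colMeans(A)))$d[1:3]
```

If d_custom and d_ref agree to within the solver tolerance, the implicit operator matches explicit centering. For this particular case irlba() also provides a center argument; the class is shown only to illustrate the mechanism.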

Wishlist / help wanted...

  • More Matrix classes supported in the fast code path
  • Help improving the solver for singular values in tricky cases (basically, for ill-conditioned problems and especially for the smallest singular values); in general this may require a combination of more careful convergence criteria and use of harmonic Ritz values; Dmitriy Selivanov has proposed alternative convergence criteria in https://github.com/bwlewis/irlba/issues/29 for example.

References

  • Baglama, James, and Lothar Reichel. "Augmented implicitly restarted Lanczos bidiagonalization methods." SIAM Journal on Scientific Computing 27.1 (2005): 19-42.
  • Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions." (2009).
  • Shen, Haipeng, and Jianhua Z. Huang. "Sparse principal component analysis via regularized low rank matrix approximation." Journal of Multivariate Analysis 99.6 (2008): 1015-1034.
  • Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis." Biostatistics 10.3 (2009): 515-534.

News

  • Version 1.0.2 (2012/7/7): Corrected a minor documentation bug; the nu and nv options are now honored when nu != nv.
  • Version 1.0.1 (2011/11/11): Added a NAMESPACE that imports correctly from Matrix.

install.packages("irlba")

2.3.3 by B. W. Lewis, 4 months ago


Report a bug at https://github.com/bwlewis/irlba/issues


Browse source code at https://github.com/cran/irlba


Authors: Jim Baglama [aut, cph], Lothar Reichel [aut, cph], B. W. Lewis [aut, cre, cph]



Task views: Numerical Mathematics


GPL-3 license


Imports stats, methods

Depends on Matrix

Linking to Matrix


Imported by ERP, MFPCA, OmicKriging, RaceID, Seurat, denoiseR, dyndimred, fuser, gyriq, jackstraw, randnet, recommenderlab, text2vec, uwot.

Depended on by DDRTree, s4vd.

Suggested by ChemoSpec, ChemoSpec2D, ChemoSpecUtils, DrImpute, Rtsne, broom, metR, sctransform, steadyICA, widyr.

