An implementation of the Frequent Directions algorithm for efficient matrix sketching [E. Liberty, SIGKDD 2013].
    # Not yet on CRAN
    install.packages("frequentdirections")

    # Or the development version from GitHub:
    install.packages("devtools")
    devtools::install_github("shinichi-takayanagi/frequentdirections")
Here, we use the USPS handwritten digits dataset as sample data. The following example assumes that you have saved the dataset in the /tmp directory.
The dataset has 7291 training and 2007 test images in h5 format. Each image is 16 x 16 grayscale pixels, stored as a row of 256 values.
    library("h5")
    file <- h5file("/tmp/usps.h5")
    x <- file["train/data"][]
    y <- file["train/target"][]
    str(x)
    #> num [1:7291, 1:256] 0 0 0 0 0 0 0 0 0 0 ...
An example image of the digit 8:

    image(matrix(x[338,], nrow=16, byrow = FALSE))
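The `byrow` argument controls how a flat vector of pixels is folded into a matrix. The toy example below is only an illustration with a made-up 4-element vector (not USPS data): it shows that `byrow = FALSE` fills column by column, and that for a square matrix the two settings are transposes of each other.

```r
# Hypothetical 4-"pixel" image stored as a flat vector
v <- c(1, 2, 3, 4)

# byrow = FALSE (the default) fills the first column, then the second
m_col <- matrix(v, nrow = 2, byrow = FALSE)

# byrow = TRUE fills the first row, then the second
m_row <- matrix(v, nrow = 2, byrow = TRUE)

# For a square matrix the two layouts are transposes of each other
identical(m_col, t(m_row))
```

Note that `image()` draws the rows of the matrix along the x axis, so the picture appears rotated relative to the printed matrix layout.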
Plot the original data on the plane spanned by the first and second singular vectors.
    x <- scale(x)
    frequentdirections::plot_svd(x, y)
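For readers who want to see what such a projection involves, here is a minimal base-R sketch. It is an assumption about the kind of computation behind a plot like this, not the source of `plot_svd`; the names `x_demo`, `y_demo`, and `proj` are hypothetical, and the data is synthetic rather than USPS.

```r
# Synthetic stand-in for a scaled data matrix and its labels (not USPS data)
set.seed(1)
x_demo <- scale(matrix(rnorm(200 * 10), nrow = 200, ncol = 10))
y_demo <- sample(0:9, 200, replace = TRUE)

# Thin SVD: x_demo = U D V^T; the columns of V are the right singular vectors
s <- svd(x_demo)

# Coordinates of every row on the first two right singular vectors
proj <- x_demo %*% s$v[, 1:2]

# Equivalent form: the first two columns of U scaled by their singular values
proj_alt <- s$u[, 1:2] %*% diag(s$d[1:2])

# Scatter plot coloured by label
plot(proj, col = y_demo + 1, pch = 19,
     xlab = "1st singular vector", ylab = "2nd singular vector")
```

Both forms give the same coordinates because `x_demo %*% V` equals `U %*% D` when the columns of `V` are orthonormal.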
    eps <- 10^(-8)
    # 7291 x 256 -> 8 x 256 matrix
    b <- frequentdirections::sketching(x, 8, eps)
    frequentdirections::plot_svd(x, y, b)

    # 7291 x 256 -> 32 x 256 matrix
    b <- frequentdirections::sketching(x, 32, eps)
    frequentdirections::plot_svd(x, y, b)

    # 7291 x 256 -> 128 x 256 matrix
    b <- frequentdirections::sketching(x, 128, eps)
    frequentdirections::plot_svd(x, y, b)
With 128 rows, the result is almost the same as the SVD projection of the original data. In other words, for this purpose the original 7291-row data can be represented by a sketch of only 128 rows.
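To make the sketching step more concrete, here is a minimal base-R sketch of the Frequent Directions update as described in Liberty (2013). It is not the package's implementation: `fd_sketch` is a hypothetical name, the input is a synthetic matrix, and the `eps` argument of `sketching()` is not modelled. The last lines check the covariance error bound from the paper, ||A'A - B'B||_2 <= 2 ||A||_F^2 / ell.

```r
# Illustrative Frequent Directions sketch (hypothetical helper, not the package code)
fd_sketch <- function(a, ell) {
  d <- ncol(a)
  stopifnot(ell >= 2, ell <= d)
  b <- matrix(0, nrow = ell, ncol = d)  # the sketch starts as all zeros
  next_free <- 1                        # index of the next zero row in b
  for (i in seq_len(nrow(a))) {
    if (next_free > ell) {
      # No zero rows left: shrink all directions by the squared
      # singular value at position floor(ell / 2)
      s <- svd(b)
      delta <- s$d[floor(ell / 2)]^2
      shrunk <- sqrt(pmax(s$d^2 - delta, 0))
      b <- diag(shrunk, nrow = ell) %*% t(s$v)
      # Singular values are sorted, so the nonzero rows come first
      next_free <- sum(shrunk > 0) + 1
    }
    b[next_free, ] <- a[i, ]            # insert the incoming row
    next_free <- next_free + 1
  }
  b
}

# Synthetic data (not USPS): 500 rows, 20 columns, sketched down to 8 rows
set.seed(42)
a <- matrix(rnorm(500 * 20), nrow = 500)
ell <- 8
b <- fd_sketch(a, ell)

# Spectral-norm error of the covariance approximation and its theoretical bound
cov_err <- norm(crossprod(a) - crossprod(b), type = "2")
bound <- 2 * sum(a^2) / ell
cov_err <= bound
```

Each shrink step zeroes at least half of the rows, so the SVD runs only once every few insertions, and the whole pass needs memory for just the `ell` x `d` sketch.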