Implements many algorithms for statistical learning on
sparse matrices - matrix factorizations, matrix completion,
elastic net regressions, factorization machines.
Also 'rsparse' enhances 'Matrix' package by providing methods for
multithreaded

`rsparse` is an R package for statistical learning, primarily on **sparse matrices** - **matrix factorizations, factorization machines, out-of-core regression**. Many of the implemented algorithms are particularly useful for **recommender systems** and **NLP**.

On top of that, we provide some optimized routines for working with sparse matrices - multithreaded <dense, sparse> matrix multiplications and improved support for sparse matrices in CSR format (`Matrix::RsparseMatrix`).

We've paid some attention to the implementation details - we try to avoid data copies, utilize multiple threads via OpenMP and use SIMD where appropriate. The package **makes it possible to work with datasets with millions of rows and millions of columns**.

Please reach out to us if you need **commercial support** - [email protected].

- Follow the proximally-regularized leader (FTRL), which makes it possible to solve **very large linear/logistic regression** problems with an elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning (it is not necessary to load all data into RAM). See Ad Click Prediction: a View from the Trenches for more examples.
  - Only logistic regression is implemented at the moment.
  - The native format for matrices is CSR - `Matrix::RsparseMatrix`. However, the common R `Matrix::CsparseMatrix` (`dgCMatrix`) will be converted automatically.

- Factorization Machines - a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
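As a rough illustration of the online-learning workflow described above, the following sketch fits a logistic regression with the `FTRL` class on synthetic data, feeding it one chunk of rows at a time. The data, the chunking scheme and the hyperparameter values are assumptions made up for this example; argument names follow the package documentation as we understand it and may differ between versions.

```r
library(Matrix)
library(rsparse)

set.seed(1)
n_obs = 1000L
n_feat = 200L

# Synthetic sparse design matrix (about 5% non-zero values), created as dgCMatrix
x_csc = rsparsematrix(n_obs, n_feat, density = 0.05)
# Explicit conversion to CSR, the native format mentioned above
x = as(x_csc, "RsparseMatrix")

# Hypothetical binary target driven by the first feature plus noise
y = as.numeric(x_csc[, 1] + rnorm(n_obs, sd = 0.1) > 0)

# Elastic-net penalty: l1_ratio = 0.5 mixes L1 and L2 (values are illustrative)
model = FTRL$new(learning_rate = 0.1, lambda = 0.01, l1_ratio = 0.5)

# Feed the data in four chunks to mimic data that does not fit into RAM
chunks = split(seq_len(n_obs), rep(1:4, each = n_obs / 4))
for (idx in chunks) {
  model$partial_fit(x[idx, , drop = FALSE], y[idx])
}

# Predicted probabilities for every row
p = model$predict(x)
summary(p)
```

In a real out-of-core setting each chunk would be read from disk inside the loop instead of being sliced from an in-memory matrix.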

- Vanilla **Maximum Margin Matrix Factorization** - the classic approach for "rating" prediction. See the `WRMF` class and the constructor option `feedback = "explicit"`. The original paper which introduced MMMF can be found here.
- **Weighted Regularized Matrix Factorization (WRMF)** from Collaborative Filtering for Implicit Feedback Datasets. See the `WRMF` class and the constructor option `feedback = "implicit"`. We provide 2 solvers:
  - Exact, based on Cholesky factorization
  - Approximate, based on a fixed number of steps of **Conjugate Gradient**. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
- **Linear-Flow** from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
- Fast **Truncated SVD** and **Truncated Soft-SVD** via Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works for both sparse and dense matrices. Works on float matrices as well! For certain problems it may be even faster than the irlba package.
- **Soft-Impute** via fast Alternating Least Squares as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
  - with a solution in SVD form
- **GloVe** as described in GloVe: Global Vectors for Word Representation.
  - This is usually used to train word embeddings, but it is also very useful for recommender systems.
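To make the `WRMF` options above more concrete, here is a minimal sketch of implicit-feedback factorization on the `movielens100k` dataset that ships with the package. The train/validation split, the rank and the number of iterations are arbitrary choices for illustration, and the argument names (`solver`, `n_iter`, `not_recommend`) are taken from the package documentation as we recall it, so they may need adjusting for your installed version.

```r
library(Matrix)
library(rsparse)

# User-item ratings matrix bundled with rsparse
data("movielens100k")

# Simple split by users: the first 900 users for training, the rest held out
train = movielens100k[1:900, ]
cv = movielens100k[901:nrow(movielens100k), ]

# Implicit-feedback model with the approximate Conjugate Gradient solver
model = WRMF$new(rank = 8, lambda = 0.1, feedback = "implicit",
                 solver = "conjugate_gradient")

# Alternating least squares; returns the user embeddings (users x rank)
user_emb = model$fit_transform(train, n_iter = 5)
# Item embeddings are stored on the model (rank x items)
item_emb = model$components

# Top-10 item indices for the held-out users, excluding items they already rated
preds = model$predict(cv, k = 10, not_recommend = cv)
dim(preds)
```

Passing `solver = "cholesky"` instead should select the exact solver, and `feedback = "explicit"` switches to the rating-prediction variant described in the first item of the list.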

- multithreaded `%*%` and `tcrossprod()` for `<dgRMatrix, matrix>`
- multithreaded `%*%` and `crossprod()` for `<matrix, dgCMatrix>`
- natively slice `CSR` matrices (`Matrix::RsparseMatrix`) without converting them to triplet / CSC
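The operations listed above can be tried out with a small sketch like the one below. The matrix sizes and density are arbitrary assumptions for this example; the same expressions also work with only `Matrix` loaded, and loading `rsparse` is what should provide the multithreaded and CSR-native methods.

```r
library(Matrix)
library(rsparse)

set.seed(42)
# Random sparse matrix converted to CSR (dgRMatrix)
x_csr = as(rsparsematrix(1000, 500, density = 0.01), "RsparseMatrix")
dense = matrix(rnorm(500 * 8), nrow = 500)

# <dgRMatrix, matrix> product
prod_csr = x_csr %*% dense
# tcrossprod(x, y) computes x %*% t(y), so this is the same product
prod_tcp = tcrossprod(x_csr, t(dense))

# Reference result computed with a plain dense multiplication
prod_ref = as.matrix(x_csr) %*% dense
all.equal(as.matrix(prod_csr), prod_ref)

# <matrix, dgCMatrix> product
x_csc = as(x_csr, "CsparseMatrix")
prod_csc = t(dense) %*% t(x_csc)

# Row slicing of the CSR matrix
rows = x_csr[1:10, ]
dim(rows)
```

The comparison against the dense reference is only a sanity check; for realistic sizes you would not materialize the dense copy.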

Most of the algorithms benefit from OpenMP, and many of them can utilize a high-performance implementation of BLAS. If you want to get the most out of the package, please read the section below carefully.

It is recommended to:

- Use high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).
- Add proper compiler optimizations in your `~/.R/Makevars`. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:

  `CXX11FLAGS += -O3 -march=native -mavx -fopenmp -ffast-math`

  `CXXFLAGS += -O3 -march=native -mavx -fopenmp -ffast-math`

If you are on **Mac**, follow the instructions here. After installing `clang4`, additionally put the line `PKG_CXXFLAGS += -DARMA_USE_OPENMP` in your `~/.R/Makevars`. After that, install `rsparse` in the usual way.

**Note that the syntax in these posts/slides is not up to date, since the package was under active development**

- Slides from DataFest Tbilisi (2017-11-16)
- Introduction to matrix factorization with Weighted-ALS algorithm - collaborative filtering for implicit feedback datasets.
- Music recommendations using LastFM-360K dataset
  - evaluation metrics for ranking
  - setting up proper cross-validation
  - possible issues with nested parallelism and thread contention
  - making recommendations for new users
  - complementary item-to-item recommendations
- Benchmark against other good implementations

Here is an example of `rsparse::WRMF` on the lastfm360k dataset in comparison with other good implementations:

We follow mlapi conventions.

Generate configure:

`autoconf configure.ac > configure && chmod +x configure`

- 2019-04-14
  - fixed out of bound memory access as reported by CRAN UBSAN
  - added ability to init GloVe embeddings with user provided values
- 2019-03-16
  - added methods to natively slice CSR matrices without converting them to triplet/CSC
- 2018-10-25
  - add GloVe matrix factorization (adapted from `text2vec`)
  - link to `float` package - credits to @snoweye and @wrathematics

- add GloVe matrix factorization (adapted from