Implementation of sparse linear discriminant analysis, a supervised classification method for multiple classes. Several novel optimization approaches to this problem are implemented, including the alternating direction method of multipliers (ADMM), proximal gradient (PG), and accelerated proximal gradient (APG); see Atkins et al. This is the R package accompanying the paper Proximal Methods for Sparse Optimal Scoring and Discriminant Analysis.
This package is currently under development, although most of the functionality is already in place! You can now do sparse discriminant analysis with the package, but the visualization tools are still being implemented and tested.
Do you have a data set with a lot of variables and few samples? Do you have labels for the data? Then you might be trying to solve a p >> n classification task.

This package includes functions that allow you to train such a classifier in a sparse manner. In this context, sparse means that only the best variables are selected for the final classifier. This also makes the output interpretable, i.e. you can use it to identify the variables that matter most for your classification task. The current functions also handle cross-validation for tuning the sparsity; see the documentation for further description and examples.
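As a quick sketch of what a p >> n setup looks like, here is a hypothetical example on synthetic data (the data generation and variable names are made up for illustration; only the `ASDA` call with its defaults is from the package):

```r
library(accSDA)

# Hypothetical illustration: p = 500 variables but only n = 30 samples,
# two classes, with just the first 5 variables carrying class signal.
set.seed(1)
n <- 30
p <- 500
X <- matrix(rnorm(n * p), n, p)
y <- factor(rep(c("A", "B"), length.out = n))
X[y == "B", 1:5] <- X[y == "B", 1:5] + 2

# Fit a sparse discriminant classifier with the package defaults
# (SDAAP, accelerated proximal gradient).
fit <- ASDA(X, y)
```

A sparse fit should assign nonzero discriminant weights to only a small subset of the 500 variables, ideally concentrated on the informative ones.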
You can install the package from CRAN, or, for the development version, you can install directly from GitHub. To install packages from GitHub you need the devtools package, so install that first if you haven't already! Then you can proceed to install the package:
```r
library(devtools)
install_github("gumeo/accSDA")
library(accSDA)
```
And now you can start playing around with the package!
The following is an example of how one could use the package on Fisher's iris dataset. I chose the iris dataset because most people are familiar with it. Further examples with p >> n data will arrive later!
```r
# Prepare training and test set
train <- c(1:40, 51:90, 101:140)
Xtrain <- iris[train, 1:4]

# normalize is a function in the package
nX <- normalize(Xtrain)
Xtrain <- nX$Xc
Ytrain <- iris[train, 5]

Xtest <- iris[-train, 1:4]
Xtest <- normalizetest(Xtest, nX)
Ytest <- iris[-train, 5]

# Define parameters for SDAD, i.e. the ADMM optimization method.
# Also try the SDAP and SDAAP methods; look at the documentation
# to read more about the parameters!
Om <- diag(4) + 0.1 * matrix(1, 4, 4) # elNet coef mat
gam <- 0.01
lam <- 0.01
method <- "SDAD"
q <- 2
control <- list(PGsteps = 100,
                PGtol = c(1e-5, 1e-5),
                mu = 1,
                maxits = 100,
                tol = 1e-3,
                quiet = FALSE)

# Run the algorithm
res <- ASDA(Xt = Xtrain, Yt = Ytrain, Om = Om, gam = gam,
            lam = lam, q = q, method = method, control = control)

# Can also just use the defaults:
# Default optimization method is SDAAP, accelerated proximal gradient.
resDef <- ASDA(Xtrain, Ytrain)
```
Now that you have some results, you want to test the performance on the test set! The ASDA function returns an S3 object of class ASDA, and the package includes a predict method to predict the outcome of the classifier on new data:
```r
preds <- predict(res, newdata = Xtest)
```
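To gauge performance you can compare the predictions to the true test labels. A minimal sketch, assuming the predict method returns the predicted classes in a `class` component (as lda-style predict methods commonly do; check `str(preds)` if unsure):

```r
# Confusion matrix and accuracy on the held-out test set.
# Assumes preds$class holds the predicted class labels.
conf <- table(Predicted = preds$class, Actual = Ytest)
print(conf)
accuracy <- sum(diag(conf)) / sum(conf)
print(accuracy)
```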
Coming releases will include more plotting and printing functionality for ASDA objects. A C++ backend is also in the pipeline, along with further extensions to handle different types of data.
This is the first CRAN release of the package; see the README file and the accompanying paper (https://arxiv.org/pdf/1705.07194.pdf) for further information on the content. The paper contains benchmarks and more convincing datasets for testing the code, i.e. datasets with many more variables than samples.
This version of the package contains implementation of novel optimization approaches to solve the sparse optimal scoring problem. Future releases will focus on further improvements and additional tools to work with the results.
If you have any questions, please send an email to: ([email protected])