Projection based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics.
S. Kucheryavskiy (2020)
mdatools is an R package for preprocessing, exploring and analysing multivariate data. It provides methods that are common in chemometrics, and was created for an introductory PhD course on chemometrics given at the Section of Chemical Engineering, Aalborg University.
The general idea of the package is to collect the most widespread chemometric methods and give them a similar "user interface". A user who knows how to build a model and visualise its results with one method can easily do the same with the others.
For more details and examples, read the Bookdown tutorial.
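As a rough illustration of the shared interface, the sketch below builds a PCA model and a PLS model on the people dataset shipped with the package and then calls the same summary() and plot() methods on both. The number of components and the choice of column 4 as the response are illustrative assumptions, not recommendations from the package author.

```r
library(mdatools)

# Example dataset shipped with the package
data(people)

# PCA model on autoscaled data (4 components chosen arbitrarily for illustration)
m_pca <- pca(people, ncomp = 4, scale = TRUE)
summary(m_pca)
plot(m_pca)

# PLS model: column 4 is used as the response, the rest as predictors
# (this split is an assumption made for the example)
x <- people[, -4]
y <- people[, 4, drop = FALSE]
m_pls <- pls(x, y, ncomp = 3, scale = TRUE)
summary(m_pls)
plot(m_pls)
```

Both models are created with a call of the same shape and inspected with the same generic methods, which is the point of the common interface.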
A new minor release (0.9.1) is available from both GitHub and CRAN (since 07.07.2018).
The latest major release (0.9.0) brings a set of new features, including methods for computing critical limits for PCA/SIMCA residuals, an adjusted residuals plot, and randomized algorithms for fast PCA decomposition of datasets with a large number of rows. The text of the tutorial has been amended correspondingly and now also includes a new chapter with a detailed explanation of how the critical limits are calculated.
A full list of changes is available here
The package is available from CRAN via the usual installation procedure. However, due to restrictions in CRAN policy regarding the number of submissions (one every 3-4 months), only major releases are published there (with a 2-3 week delay after the GitHub release, as more thorough testing is needed). To get the latest release, please use the GitHub sources. You can download a zip file with the source package and install it using the install.packages command. For example, if the downloaded file is mdatools_0.9.1.tar.gz and it is located in the current working directory, just run the following:
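A minimal sketch of such a call, using the file name mentioned above. The repos = NULL and type = "source" arguments are standard install.packages options for local source archives and are added here as an assumption about the intended usage.

```r
# Install mdatools from a source archive in the current working directory.
# repos = NULL makes R treat the first argument as a local file path
# instead of a package name to look up in a repository.
install.packages("mdatools_0.9.1.tar.gz", repos = NULL, type = "source")
```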
If you have the devtools package installed, the following command will install the latest release from the master branch of the GitHub repository (do not forget to load the devtools package first):
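A sketch of that step is given here. The repository path svkucheryavski/mdatools is an assumption, since it does not appear in the text above; adjust it if the sources live elsewhere.

```r
# Load devtools first, as noted above
library(devtools)

# Install the latest release from the master branch on GitHub.
# Assumption: the repository path below is not stated in this document.
install_github("svkucheryavski/mdatools")
```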
opacity parameter for semi-transparent colors
plotExtreme() method for SIMCA models
setResLimits() method for PCA/SIMCA models
plotProbabilities() method for SIMCA results
getConfusionMatrix() method for classification results
plotPrediction() for PLS results
plotPrediction() for PLS results
pls.getRegCoeffs() now also returns standard error and confidence intervals calculated for unstandardized variables
summary() for object with regression coefficients (
mdaplot for data frame with one or more factor columns, the factors are now transformed to dummy variables (before it led to an error)
mdaplots when using factor with more than 8 levels for color grouping led to an error
pca with wrong calculation of eigenvalues in NIPALS algorithm
lab.col now are also applied to colorbar labels
mdaplotg() were rewritten completely and now are easier to use (check tutorial)
'd') for density scatter plot
ylas in plots to rotate axis ticks
cgroup) if there is no test set
prep.autoscale() now does not scale columns with coefficient of variation below given threshold
getRegcoeffs was added to PLS model
cgroup for plots now can work with factors correctly (including ones with text levels)
lab.cex for changing color and font size for data point labels
classres class for representation and visualisation of classification results
simcares classes for one-class SIMCA model and results
simcamres classes for multiclass SIMCA model and results
plsdares classes for PLS-DA model and results
selectNumComp(model, ncomp) instead of
yt, finally separate logical arguments
scale are used instead of previously used autoscale. By default scale = F and center = T.
mdaplotg functions, which extend basic functionality of R plots. For example, they make it possible to create color groups and a colorbar legend, calculate axis limits automatically depending on the elements on a plot, build a legend automatically, and many other things.