Multivariate Data Analysis for Chemometrics

Projection-based methods for preprocessing, exploring and analysis of multivariate data used in chemometrics. S. Kucheryavskiy (2020).

mdatools is an R package for preprocessing, exploring and analysis of multivariate data. The package provides the methods most common in chemometrics. It was created for an introductory PhD course on Chemometrics given at the Section of Chemical Engineering, Aalborg University.

The general idea of the package is to collect the most widespread chemometric methods and provide a similar "user interface" for all of them. If a user knows how to make a model and visualise results for one method, he or she can easily do the same for the others.

For more details and examples, read the Bookdown tutorial.

What is new

A new minor release (0.9.1) is available from both GitHub and CRAN (since 07.07.2018).

The latest major release (0.9.0) brings a set of new features. These include methods for computing critical limits for PCA/SIMCA residuals, an adjusted residuals plot, and randomized algorithms for fast PCA decomposition of datasets with a large number of rows. The tutorial has been amended correspondingly and now also includes a new chapter explaining in detail how the critical limits are calculated.
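
A critical limit of this kind can be illustrated with a minimal base-R sketch of the data-driven approach used in DD-SIMCA. The sketch assumes that the score distance h (T2-like) and the orthogonal distance q (Q) have already been computed for the calibration objects. Each distance is treated as a scaled chi-square variable, and its degrees of freedom are estimated by the method of moments. The combined value is then compared with a chi-square quantile. The helper dd_crit() and its argument names are hypothetical, and the estimators used by mdatools may differ in detail.

```r
# Hypothetical helper (not mdatools code): data-driven critical limit for
# PCA/SIMCA distances. h = score distance, q = orthogonal distance (Q).
dd_crit <- function(h, q, alpha = 0.05) {
  # method of moments: if u ~ u0 / N * chi2(N), then N = 2 * u0^2 / var(u)
  h0 <- mean(h); Nh <- max(1, round(2 * h0^2 / var(h)))
  q0 <- mean(q); Nq <- max(1, round(2 * q0^2 / var(q)))
  f <- Nh * h / h0 + Nq * q / q0               # full distance, approx. chi2(Nh + Nq)
  fcrit <- qchisq(1 - alpha, df = Nh + Nq)     # critical limit for significance alpha
  list(Nh = Nh, Nq = Nq, fcrit = fcrit, extreme = f > fcrit)
}

# simulated distances: h with h0 = 2, Nh = 3 and q with q0 = 0.1, Nq = 5
set.seed(1)
h <- rchisq(5000, df = 3) * 2 / 3
q <- rchisq(5000, df = 5) * 0.1 / 5
lim <- dd_crit(h, q)
```

With correctly estimated degrees of freedom, roughly a fraction alpha of the calibration objects falls outside the limit.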

A full list of changes is given below.

How to install

The package is available from CRAN via the usual installation procedure. However, CRAN policy limits the number of submissions (one every 3-4 months), so only major releases will be published there. They appear 2-3 weeks after the GitHub release, because more thorough testing is needed. To get the latest release, please use the GitHub sources. You can download an archive with the source package and install it using the install.packages command. For example, if the downloaded file is mdatools_0.9.1.tar.gz and it is located in the current working directory, just run the following:

  install.packages("mdatools_0.9.1.tar.gz", repos = NULL, type = "source")

If you have the devtools package installed, the following command will install the latest release from the master branch of the GitHub repository (do not forget to load the devtools package first):




  • all plot functions have a new opacity parameter for semi-transparent colors
  • several improvements to the PLS-DA method for one-class discrimination
  • fixed a bug with wrong estimation of the maximum number of components for PCA/SIMCA with cross-validation
  • added a chapter on PLS-DA to the tutorial (including the latest improvements)


  • added randomized PCA algorithm (efficient for datasets with a large number of rows)
  • added option to inherit and show critical limits on residuals plot for PCA/SIMCA results
  • added support for data driven approach to PCA/SIMCA (DD-SIMCA)
  • added calculation of class belonging probabilities for SIMCA results
  • added plotExtreme() method for SIMCA models
  • added setResLimits() method for PCA/SIMCA models
  • added plotProbabilities() method for SIMCA results
  • added getConfusionMatrix() method for classification results
  • added option to show prediction statistics using plotPrediction() for PLS results
  • added option to use equal axes limits in plotPrediction() for PLS results
  • the tutorial has been amended and extended correspondingly
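
The randomized PCA algorithm listed above targets datasets with many rows. The idea can be shown with a minimal base-R sketch of the generic randomized SVD scheme: a random projection, a few power iterations, then an exact SVD of a small matrix. It is not taken from the mdatools source. The function rpca_sketch(), its oversampling parameter p and its number of power iterations k are all hypothetical.

```r
# Hypothetical sketch of randomized PCA via randomized SVD (not mdatools code).
rpca_sketch <- function(X, ncomp, p = 5, k = 2) {
  Xc <- scale(X, center = TRUE, scale = FALSE)        # mean-center columns
  l <- min(ncomp + p, ncol(Xc))                       # size of the random subspace
  Omega <- matrix(rnorm(ncol(Xc) * l), ncol(Xc), l)   # random test matrix
  Y <- Xc %*% Omega                                   # sample the column space of Xc
  for (i in seq_len(k)) {                             # power iterations sharpen the spectrum
    Y <- Xc %*% crossprod(Xc, qr.Q(qr(Y)))
  }
  Q <- qr.Q(qr(Y))                                    # orthonormal basis, nrow(X) x l
  B <- crossprod(Q, Xc)                               # small l x ncol(X) matrix
  s <- svd(B, nu = ncomp, nv = ncomp)                 # exact SVD of the small matrix
  loadings <- s$v
  scores <- Xc %*% loadings
  list(loadings = loadings, scores = scores,
       sdev = s$d[seq_len(ncomp)] / sqrt(nrow(Xc) - 1))
}

# rank-3 signal plus small noise, 1000 rows
set.seed(42)
signal <- matrix(rnorm(1000 * 3), 1000, 3) %*% matrix(rnorm(3 * 20), 3, 20)
X <- signal + 0.01 * matrix(rnorm(1000 * 20), 1000, 20)
r <- rpca_sketch(X, ncomp = 3)
ref <- prcomp(X)$sdev[1:3]
```

The exact SVD is applied only to a small (ncomp + p) x ncol(X) matrix, which is why this approach pays off when nrow(X) is large.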


  • small improvements to calculation of statistics for regression coefficients
  • pls.getRegCoeffs() now also returns standard error and confidence intervals calculated for unstandardized variables
  • new method summary() for objects with regression coefficients (regcoeffs)
  • fixed a bug with double labels on regression coefficients plot with confidence intervals
  • fixed a bug in some PLS plots where labels for cross-validated results were forced to be numbers
  • when using mdaplot for a data frame with one or more factor columns, the factors are now transformed to dummy variables (previously this led to an error)
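
The conversion of factor columns to dummy variables described in the last item can be sketched in base R with model.matrix(), using one 0/1 column per factor level. The helper to_dummies() and its column naming scheme are hypothetical. mdatools may name or order the dummy columns differently.

```r
# Hypothetical helper: expand factor columns of a data frame into 0/1 dummy columns,
# keeping numeric columns as they are.
to_dummies <- function(dat) {
  parts <- lapply(names(dat), function(nm) {
    col <- dat[[nm]]
    if (is.factor(col)) {
      m <- model.matrix(~ col - 1)        # no intercept, so every level gets a column
      colnames(m) <- paste0(nm, ".", levels(col))
      m
    } else {
      matrix(col, ncol = 1, dimnames = list(NULL, nm))
    }
  })
  do.call(cbind, parts)
}

dat <- data.frame(height = c(170, 182, 165),
                  sex = factor(c("F", "M", "F")))
X <- to_dummies(dat)
```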


  • fixed a bug in mdaplots where using a factor with more than 8 levels for color grouping led to an error
  • fixed a bug in pca with incorrect calculation of eigenvalues in the NIPALS algorithm
  • bars on a bar plot now can be color grouped


  • parameters lab.cex and lab.col are now also applied to colorbar labels


  • fixed a bug in PCA when explained variance was calculated incorrectly for data with excluded rows
  • fixed several issues with SIMCA (cross-validation) and SIMCAM (Cooman's plot)
  • added a chapter about SIMCA to the tutorial


  • tutorial has been moved from GitBook to Bookdown and fully rewritten
  • GitHub repo for the package has the tutorial as a static html site in docs folder
  • the mdaplot() and mdaplotg() functions were rewritten completely and are now easier to use (check the tutorial)
  • new color scheme 'jet' with jet colors
  • new plot type ('d') for density scatter plot
  • support for xlas and ylas in plots to rotate axis ticks
  • support for several data attributes to give extra functionality for plots (including manual x-values for line plots)
  • rows and columns can now be hidden/excluded via attributes
  • factor columns of data frames are now converted to dummy variables automatically when model is created/applied
  • scores and loadings plots show % of explained variance in axis labels
  • biplot is now available for PCA models (plotBiplot())
  • scores plot for PCA model can now also be shown with color grouping (cgroup) if there is no test set
  • cross-validation in PCA and PLS has been improved to make it faster
  • added a possibility to exclude selected rows and columns from calculations
  • added support for images (check tutorial)
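
Scores and loadings plots show the percentage of explained variance in their axis labels. This value can be derived from the singular values of the mean-centered data, because each squared singular value is proportional to the eigenvalue of a component. Below is a minimal base-R sketch that assumes centering without scaling. The helper expvar() and the label format are hypothetical.

```r
# Hypothetical helper: percent of total variance explained by each component.
expvar <- function(X) {
  Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-center columns
  d <- svd(Xc)$d                                 # singular values, decreasing
  100 * d^2 / sum(d^2)                           # share of each eigenvalue, in percent
}

set.seed(7)
X <- matrix(rnorm(50 * 4), 50, 4) %*% diag(c(5, 2, 1, 0.5))
ev <- expvar(X)
axis.labels <- sprintf("PC%d (%.1f%%)", seq_along(ev), ev)   # e.g. "PC1 (xx.x%)"
```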


  • corrected a typo in title of selectivity ratio plot
  • prep.autoscale() now does not scale columns with a coefficient of variation below a given threshold
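
The last item can be illustrated with a minimal base-R sketch. All columns are mean-centered, but only columns whose coefficient of variation exceeds a threshold are divided by their standard deviation. The helper autoscale_cv(), the argument cv.threshold and the CV definition used here (standard deviation divided by the absolute mean, in percent) are assumptions for illustration. They are not the documented prep.autoscale() interface.

```r
# Hypothetical sketch: center every column, scale only columns whose
# coefficient of variation (sd / |mean|, in percent) exceeds cv.threshold.
autoscale_cv <- function(X, cv.threshold = 1) {
  mu <- colMeans(X)
  sds <- apply(X, 2, sd)
  cv <- 100 * sds / abs(mu)
  scl <- ifelse(cv > cv.threshold, sds, 1)   # low-CV columns keep their original scale
  sweep(sweep(X, 2, mu, "-"), 2, scl, "/")
}

X <- cbind(a = c(1, 2, 3, 4),              # large relative variation, gets scaled
           b = c(100, 100.1, 99.9, 100))   # CV well below 1 %, only centered
Y <- autoscale_cv(X, cv.threshold = 1)
```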


  • fixed an issue that led to an error in some cases
  • documentation was regenerated with new version of Roxygen
  • file People.RData was renamed to people.RData
  • NIPALS method for PCA has been added
  • code optimization to speed up calculations
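
For readers unfamiliar with NIPALS, below is a generic textbook version of the algorithm for PCA in base R. Each component is found by alternating two small regressions until the scores stop changing, and is then removed from the data (deflation). Eigenvalues are taken as the variance of each score vector, t't/(n - 1), which matches the variances reported by prcomp(). The function nipals_pca() and its tol and max.iter defaults are hypothetical. This is not the mdatools implementation.

```r
# Hypothetical textbook NIPALS for PCA (sketch, not mdatools code).
nipals_pca <- function(X, ncomp, tol = 1e-10, max.iter = 1000) {
  E <- scale(X, center = TRUE, scale = FALSE)          # mean-centered residual matrix
  n <- nrow(E)
  scores <- matrix(0, n, ncomp)
  loadings <- matrix(0, ncol(E), ncomp)
  for (a in seq_len(ncomp)) {
    tt <- E[, which.max(colSums(E^2)), drop = FALSE]   # start: column with largest variance
    for (i in seq_len(max.iter)) {
      p <- crossprod(E, tt) / sum(tt^2)                # loadings: regress columns on scores
      p <- p / sqrt(sum(p^2))                          # normalize loading to unit length
      tt.new <- E %*% p                                # scores: regress rows on loading
      converged <- sum((tt.new - tt)^2) < tol * sum(tt.new^2)
      tt <- tt.new
      if (converged) break
    }
    scores[, a] <- tt
    loadings[, a] <- p
    E <- E - tt %*% t(p)                               # deflate: remove this component
  }
  list(scores = scores, loadings = loadings,
       eigenvals = colSums(scores^2) / (n - 1))        # variance of each score vector
}

set.seed(3)
X <- matrix(rnorm(100 * 5), 100, 5) %*% diag(c(4, 3, 2, 1, 0.5))
m <- nipals_pca(X, ncomp = 2)
pc <- prcomp(X)
```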


  • interval PLS variable selection (iPLS) is implemented
  • normalization was added to preprocessing methods (prep.norm)
  • method getRegcoeffs was added to PLS model
  • automatic selection of optimal components in PLS (Wold's criterion and first local min)
  • parameter cgroup for plots now can work with factors correctly (including ones with text levels)
  • all documentation was converted to roxygen2 format
  • NAMESPACE file is generated by roxygen2
  • fixed several small bugs and typos
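
The "first local min" rule for choosing the number of components has a simple reading. Walk through the cross-validated errors for 1, 2, 3, ... components and stop at the first number of components whose error is lower than for the next one. The sketch below implements only that assumed definition; Wold's criterion is not shown. The helper first_local_min() is hypothetical, and the RMSECV values are made up for illustration.

```r
# Hypothetical helper: position of the first local minimum in a vector of
# cross-validated errors; falls back to the overall minimum if errors only decrease.
first_local_min <- function(err) {
  for (a in seq_len(length(err) - 1)) {
    if (err[a] < err[a + 1]) return(a)
  }
  which.min(err)
}

rmsecv <- c(0.92, 0.55, 0.41, 0.43, 0.39, 0.45)   # made-up values for 1..6 components
ncomp.opt <- first_local_min(rmsecv)              # 3
```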


  • Q2 residuals renamed to Q (Squared residual distance)
  • All plots have parameters lab.col and lab.cex for changing color and font size for data point labels
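
Two residual statistics appear in these notes. For a PCA model with A components, Hotelling's T2 is the sum of squared scores divided by the corresponding eigenvalues. Q is the squared orthogonal distance, i.e. the sum of squared residuals. The base-R sketch below computes both for every object. The helper pca_distances() is hypothetical and is not the mdatools code.

```r
# Hypothetical helper: Hotelling T2 and Q (squared residual distance) for a PCA model.
pca_distances <- function(X, ncomp) {
  Xc <- scale(X, center = TRUE, scale = FALSE)     # mean-center columns
  s <- svd(Xc)
  P <- s$v[, seq_len(ncomp), drop = FALSE]         # loadings
  scores <- Xc %*% P
  lambda <- colSums(scores^2) / (nrow(Xc) - 1)     # eigenvalues
  resid <- Xc - scores %*% t(P)                    # part not explained by the model
  list(T2 = rowSums(sweep(scores^2, 2, lambda, "/")),
       Q = rowSums(resid^2),
       d = s$d)
}

set.seed(11)
X <- matrix(rnorm(40 * 6), 40, 6)
res <- pca_distances(X, ncomp = 2)
```

Two properties follow directly from the definitions and can serve as a check. The T2 values of the calibration objects average A(n - 1)/n. The Q values add up to the squared singular values of the components left out of the model.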


  • fixed a bug that led to incorrect calculation of specificity
  • jack-knife confidence intervals are now also calculated for PLS-DA models
  • confidence intervals for regression coefficients are shown by default if calculated


  • randomization test for PLS has been added, see ?randtest
  • systematic and repeated random cross-validation are available, see ?crossval
  • fixed a bug with labels on bar plot with confidence intervals
  • fixed a bug in PLS where using the maximum number of components led to NA values in weights
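
Systematic and random cross-validation differ in how objects are assigned to segments. The base-R sketch below makes two assumptions. First, the systematic variant assigns objects in turn (1, 2, ..., nseg, 1, 2, ..., often called venetian blinds). Second, repeated random cross-validation simply redraws the random split several times. The helper names are hypothetical, and crossval in mdatools may assign objects differently. In fold k, the objects with segment index k form the validation set.

```r
# Hypothetical helpers: assign n objects to nseg cross-validation segments.
cv_systematic <- function(n, nseg) {
  rep_len(seq_len(nseg), n)              # 1, 2, ..., nseg, 1, 2, ... (venetian blinds)
}
cv_random <- function(n, nseg) {
  sample(rep_len(seq_len(nseg), n))      # same segment sizes, random order
}
cv_repeated <- function(n, nseg, nrep) {
  replicate(nrep, cv_random(n, nseg))    # n x nrep matrix, one random split per column
}

set.seed(5)
sys <- cv_systematic(10, 4)
rnd <- cv_repeated(10, 4, nrep = 3)
```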

v. 0.5.3

  • fixed several small bugs
  • improved documentation for basic methods

v. 0.5.2

  • fixed bug for computing classification performance for numeric class names
  • improvements to SIMCA implementation

v. 0.5.1

  • added more details to documentation
  • bug fixes for variable selection methods

v. 0.5.0

  • all documentation has been rewritten using roxygen2 package
  • added extra preprocessing methods
  • added VIP scores calculation and plot for PLS and PLS-DA models
  • added Selectivity ratio calculation and plot for PLS and PLS-DA models
  • added calculation of confidence intervals for PLS regression coefficients using jack-knife
  • bug fixes and small improvements
  • the first release available on CRAN
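
Jack-knife confidence intervals come from refitting the model with objects left out and using the spread of the resulting coefficients. To keep the sketch short, the base-R example below uses ordinary least squares as a stand-in for PLS and leaves out one object at a time. The helper jk_coef_ci() is hypothetical. The procedure in mdatools may differ, for example in how objects are left out and in the variance formula.

```r
# Hypothetical helper: leave-one-out jack-knife standard errors and confidence
# intervals for linear regression coefficients (OLS used as a stand-in for PLS).
jk_coef_ci <- function(X, y, alpha = 0.05) {
  n <- nrow(X)
  b.full <- qr.coef(qr(X), y)
  # one row of coefficients per left-out object
  b.loo <- t(sapply(seq_len(n), function(i) qr.coef(qr(X[-i, , drop = FALSE]), y[-i])))
  b.mean <- colMeans(b.loo)
  # jack-knife variance: (n - 1) / n * sum of squared deviations from the mean
  se <- sqrt((n - 1) / n * colSums(sweep(b.loo, 2, b.mean)^2))
  tq <- qt(1 - alpha / 2, df = n - 1)
  cbind(coef = b.full, se = se, lower = b.full - tq * se, upper = b.full + tq * se)
}

set.seed(9)
X <- cbind(1, matrix(rnorm(30 * 2), 30, 2))          # intercept column + 2 predictors
y <- drop(X %*% c(0.5, 2, 0) + rnorm(30, sd = 0.3))
ci <- jk_coef_ci(X, y)
```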

v. 0.4.0

  • New classres class for representation and visualisation of classification results
  • in PCA model, limits for T2 and Q2 now are calculated for all available components
  • in PCA results, limits for T2 and Q2 calculated for a model are kept and shown on residuals plot
  • added parameters xticklabels and yticklabels to mdaplot and mdaplotg functions
  • New simca and simcares classes for one-class SIMCA model and results
  • New simcam and simcamres classes for multiclass SIMCA model and results
  • New plsda and plsdares classes for PLS-DA model and results
  • bug fixes and improvements

v. 0.3.2

  • Enhancements in group bar plot
  • Fixed bugs with wrong labels of bar plot with negative values

v. 0.3.1

  • Corrected errors and typos; small bug fixes

v. 0.3.0

  • PLS and all related methods were rewritten from scratch to make them faster and more efficient, and to follow the same code conventions as the previously rewritten PCA. Here are the main changes you need to make in your code if you used mdatools PLS before: selectNumComp(model, ncomp) instead of pls.selectncomp(model, ncomp), test.x and test.y instead of Xt and yt, and separate logical arguments center and scale instead of the previously used autoscale. By default scale = F and center = T.
  • PLS and all related methods are now well documented (see ?pls)
  • plotting tools for all classes and methods were rewritten completely. All plotting methods now use either the mdaplot or the mdaplotg function, which extend the basic functionality of R plots. For example, they allow you to make color groups and a colorbar legend, calculate plot limits automatically depending on the elements of a plot, create a legend automatically, and much more.



0.12.0 by Sergey Kucheryavskiy, 3 months ago


Authors: Sergey Kucheryavskiy

Documentation:   PDF Manual  

MIT + file LICENSE license

Imports methods, graphics, grDevices, stats, Matrix

Suggests testthat
