Projection based methods for preprocessing,
exploring and analysis of multivariate data used in chemometrics.
S. Kucheryavskiy (2020)

mdatools is an R package for preprocessing, exploring and analysing multivariate data. The package provides methods that are mostly common in chemometrics. It was created for an introductory PhD course on chemometrics given at the Section of Chemical Engineering, Aalborg University.

The general idea of the package is to collect the most widespread chemometric methods and give them a similar "user interface". So if a user knows how to make a model and visualise results for one method, they can easily do the same for the others.
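The shared interface described above can be sketched as follows. This is only an illustration, assuming that `mdatools` is installed, that the bundled `people` dataset is available, and that the `pca()` and `pls()` argument names shown here match your installed version. The choice of the fourth column (shoe size) as the response is made purely for this example.

```r
# A minimal sketch, not a definitive workflow: assumes mdatools is installed
library(mdatools)
data(people)  # example dataset shipped with the package

# PCA: create a model, then summarise and plot it
m.pca = pca(people, ncomp = 4, scale = TRUE)
summary(m.pca)
plot(m.pca)

# PLS regression follows the same pattern
x = people[, -4]                  # predictors: all columns except the 4th
y = people[, 4, drop = FALSE]     # response: 4th column (assumed for illustration)
m.pls = pls(x, y, ncomp = 4, scale = TRUE, cv = 1)  # cv = 1: full cross-validation
summary(m.pls)
plot(m.pls)
```

In both cases the model is created by a function named after the method, and the same generic `summary()` and `plot()` calls are used to inspect it.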

For more details and examples, read the Bookdown tutorial.

A new minor release (0.9.1) is available from both GitHub and CRAN (since 07.07.2018).

The latest major release (0.9.0) brings a set of new features, including methods for computing critical limits for PCA/SIMCA residuals, an adjusted residuals plot, and randomized algorithms for fast PCA decomposition of datasets with a large number of rows. The tutorial text has been amended accordingly and now also includes a new chapter with a detailed explanation of how the critical limits are calculated.

A full list of changes is available here

The package is available from CRAN via the usual installation procedure. However, due to CRAN policy restrictions on the number of submissions (one every 3-4 months), only major releases will be published there (with a 2-3 week delay after the GitHub release, as more thorough testing is needed). To get the latest release, please use the GitHub sources. You can download an archive with the source package and install it using the `install.packages` command, e.g. if the downloaded file is `mdatools_0.9.1.tar.gz` and it is located in the current working directory, just run the following:

```
install.packages('mdatools_0.9.1.tar.gz')
```

If you have the `devtools` package installed, the following command will install the latest release from the master branch of the GitHub repository (do not forget to load the `devtools` package first):

```
library(devtools)  # provides install_github()
install_github('svkucheryavski/mdatools')
```

- all plot functions have a new `opacity` parameter for semi-transparent colors
- several improvements to PLS-DA method for one-class discrimination
- fixed a bug with wrong estimation of maximum number of components for PCA/SIMCA with cross-validation
- added chapter on PLS-DA to the tutorial (including last improvements)

- added randomized PCA algorithm (efficient for datasets with large number of rows)
- added option to inherit and show critical limits on residuals plot for PCA/SIMCA results
- added support for data driven approach to PCA/SIMCA (DD-SIMCA)
- added calculation of class belongings probability for SIMCA results
- added `plotExtreme()` method for SIMCA models
- added `setResLimits()` method for PCA/SIMCA models
- added `plotProbabilities()` method for SIMCA results
- added `getConfusionMatrix()` method for classification results
- added option to show prediction statistics using `plotPrediction()` for PLS results
- added option to use equal axes limits in `plotPrediction()` for PLS results
- the tutorial has been amended and extended correspondingly

- small improvements to calculation of statistics for regression coefficients
- `pls.getRegCoeffs()` now also returns standard error and confidence intervals calculated for unstandardized variables
- new method `summary()` for object with regression coefficients (`regcoeffs`)
- fixed a bug with double labels on regression coefficients plot with confidence intervals
- fixed a bug in some PLS plots where labels for cross-validated results were forced to be numbers
- when using `mdaplot` for a data frame with one or more factor columns, the factors are now transformed to dummy variables (before it led to an error)

- fixed a bug in `mdaplots` where using a factor with more than 8 levels for color grouping led to an error
- fixed a bug in `pca` with wrong calculation of eigenvalues in NIPALS algorithm
- bars on a bar plot can now be color grouped

- parameters `lab.cex` and `lab.col` are now also applied to colorbar labels

- fixed a bug in PCA when explained variance was calculated incorrectly for data with excluded rows
- fixed several issues with SIMCA (cross-validation) and SIMCAM (Coomans' plot)
- added a chapter about SIMCA to the tutorial

- tutorial has been moved from GitBook to Bookdown and fully rewritten
- GitHub repo for the package has the tutorial as a static html site in `docs` folder
- the `mdaplot()` and `mdaplotg()` were rewritten completely and are now easier to use (check tutorial)
- new color scheme 'jet' with jet colors
- new plot type (`'d'`) for density scatter plot
- support for `xlas` and `ylas` in plots to rotate axis ticks
- support for several data attributes to give extra functionality for plots (including manual x-values for line plots)
- rows and columns can now be hidden/excluded via attributes
- factor columns of data frames are now converted to dummy variables automatically when model is created/applied
- scores and loadings plots show % of explained variance in axis labels
- biplot is now available for PCA models (`plotBiplot()`)
- scores plot for PCA model can now also be shown with color grouping (`cgroup`) if there is no test set
- cross-validation in PCA and PLS has been improved to make it faster
- added a possibility to exclude selected rows and columns from calculations
- added support for images (check tutorial)

- corrected a typo in title of selectivity ratio plot
- `prep.autoscale()` now does not scale columns with coefficient of variation below a given threshold

- fixed an issue that led to a `plot.new()` error in some cases
- documentation was regenerated with new version of Roxygen
- file People.RData was renamed to people.RData
- NIPALS method for PCA has been added
- code optimization to speed calculations up

- interval PLS variable selection (iPLS) is implemented
- normalization was added to preprocessing methods (`prep.norm`)
- method `getRegcoeffs` was added to PLS model
- automatic selection of optimal components in PLS (Wold's criterion and first local min)
- parameter `cgroup` for plots now can work with factors correctly (including ones with text levels)
- all documentation was converted to roxygen2 format
- NAMESPACE file is generated by roxygen2
- fixed several small bugs and typos

- Q2 residuals renamed to Q (Squared residual distance)
- All plots have parameters `lab.col` and `lab.cex` for changing color and font size for data point labels

- fixed a bug that led to incorrect calculation of specificity
- jack-knife confidence intervals are now also calculated for PLS-DA models
- confidence intervals for regression coefficients are shown by default if calculated

- randomization test for PLS has been added, see `?randtest`
- systematic and repeated random cross-validation are available, see `?crossval`
- fixed bug with labels on bar plot with confidence intervals
- fixed bug in PLS where using the maximum number of components led to NA values in weights

- fixed several small bugs
- improved documentation for basic methods

- fixed bug for computing classification performance for numeric class names
- improvements to SIMCA implementation

- added more details to documentation
- bug fixes for variable selection methods

- all documentation has been rewritten using `roxygen2` package
- added extra preprocessing methods
- added VIP scores calculation and plot for PLS and PLS-DA models
- added Selectivity ratio calculation and plot for PLS and PLS-DA models
- added calculation of confidence intervals for PLS regression coefficient using jack-knife
- bug fixes and small improvements
- the first release available in CRAN

- New `classres` class for representation and visualisation of classification results
- in PCA model, limits for T2 and Q2 now are calculated for all available components
- in PCA results, limits for T2 and Q2 calculated for a model are kept and shown on residuals plot
- added parameters `xticklabels` and `yticklabels` to `mdaplot` and `mdaplotg` functions
- New `simca` and `simcares` classes for one-class SIMCA model and results
- New `simcam` and `simcamres` classes for multiclass SIMCA model and results
- New `plsda` and `plsdares` classes for PLS-DA model and results
- bug fixes and improvements

- Enhancements in group bar plot
- Fixed bugs with wrong labels of bar plot with negative values

- Corrected errors and typos in README.md and small bug fixes

- PLS and all related methods were rewritten from scratch to make them faster and more efficient, and to follow the same code conventions as the previously rewritten PCA. Here are the main changes you need to make in your code if you used mdatools PLS before: `selectNumComp(model, ncomp)` instead of `pls.selectncomp(model, ncomp)`, `test.x` and `test.y` instead of `Xt` and `yt`, and finally separate logical arguments `center` and `scale` are used instead of the previously used `autoscale`. By default `scale = F` and `center = T`.
- PLS and all related methods are now well documented (see `?pls`)
- plotting tools for all classes and methods were rewritten completely. Now all plotting methods use either `mdaplot` or `mdaplotg` functions, which extend the basic functionality of R plots. For example, they make it possible to create color groups and a colorbar legend, calculate limits automatically depending on the elements on a plot, generate a legend automatically, and many other things.