Last updated on 2021-03-01 by Paul Hewson

Base R contains most of the functionality for classical multivariate analysis, somewhere. A large number of packages on CRAN extend this methodology; a brief overview is given below. Application-specific uses of multivariate statistics are described in the relevant task views: for example, whilst principal components are listed here, ordination is covered in the Environmetrics task view. Further information on supervised classification can be found in the MachineLearning task view, and on unsupervised classification in the Cluster task view.

The packages in this view can be roughly structured into the following topics. If you think that some package is missing from the list, please let me know.

**Visualising multivariate data**

*Graphical Procedures:* A range of base graphics (e.g. `pairs()` and `coplot()`) and lattice functions (e.g. `xyplot()` and `splom()`) are useful for visualising pairwise arrays of 2-dimensional scatterplots, clouds and 3-dimensional densities. `scatterplotMatrix()` in car provides usefully enhanced pairwise scatterplots. Beyond this, scatterplot3d provides 3-dimensional scatterplots, and aplpack provides bagplots and `spin3R()`, a function for rotating 3d clouds. misc3d, dependent upon rgl, provides animated functions within R useful for visualising densities. YaleToolkit provides a range of useful visualisation techniques for multivariate data. More specialised multivariate plots include the following: `faces()` in aplpack provides Chernoff's faces; `parcoord()` from MASS provides parallel coordinate plots; `stars()` in graphics provides a choice of star, radar and cobweb plots. `mstree()` in ade4 and `spantree()` in vegan provide minimum spanning tree functionality. calibrate supports biplot and scatterplot axis labelling. geometry, which provides an interface to the qhull library, gives the indices of the points forming the convex hull via `convexhulln()`. ellipse draws ellipses for two parameters, and provides `plotcorr()`, a visual display of a correlation matrix. denpro provides level set trees for multivariate visualisation. Mosaic plots are available via `mosaicplot()` in graphics and `mosaic()` in vcd, which also contains other visualisation techniques for multivariate categorical data. gclus provides a number of cluster-specific graphical enhancements for scatterplots and parallel coordinate plots. See the links for a reference to GGobi. xgobi interfaces to the XGobi and XGvis programs, which allow linked, dynamic multivariate plots as well as projection pursuit. Finally, iplots allows particularly powerful dynamic interactive graphics, of which interactive parallel coordinate plots and mosaic plots may be of great interest. Seriation methods are provided by seriation, which can reorder matrices and dendrograms.

*Data Preprocessing:* `summarize()` and `summary.formula()` in Hmisc assist with descriptive functions; from the same package, `varclus()` offers variable clustering while `dataRep()` and `find.matches()` assist in exploring a given dataset in terms of representativeness and finding matches. Whilst `dist()` in stats and `daisy()` in cluster provide a wide range of distance measures, proxy provides a framework for more distance measures, including measures between matrices. simba provides functions for dealing with presence/absence data, including similarity matrices and reshaping.
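As a rough illustration of the base graphics and distance functions mentioned above, the following sketch uses the built-in `iris` data purely as a stand-in for any multivariate data set. Only base R and the recommended packages MASS and cluster are assumed; the choice of variables and of the Gower metric is arbitrary.

```r
# Illustrative only: the built-in iris data stand in for any data set.
x <- iris[, 1:4]

# Pairwise scatterplots from base graphics, coloured by species.
pairs(x, col = as.integer(iris$Species))

# Parallel coordinate plot from MASS (a recommended package shipped with R).
MASS::parcoord(x, col = as.integer(iris$Species))

# Euclidean distances on standardised variables (stats).
d_euclid <- dist(scale(x))

# Gower dissimilarities from cluster, which also accept the factor column.
d_gower <- cluster::daisy(iris, metric = "gower")
```

Both `d_euclid` and `d_gower` can be passed on to functions that expect a dissimilarity object, such as `hclust()` or `cmdscale()`.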

**Hypothesis testing**

- ICSNP provides Hotelling's T2 test as well as a range of non-parametric tests, including location tests based on marginal ranks, spatial median and spatial signs computation, and estimates of shape. Non-parametric two-sample tests are also available from cramer, and spatial sign and rank tests to investigate location, sphericity and independence are available in SpatialNP.
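For readers who want to see what a Hotelling T2 test computes, here is a minimal base R sketch of the one-sample version. The function name `hotelling_one_sample` and the null mean `mu0` are hypothetical choices for illustration; this is not the ICSNP implementation, and it assumes multivariate normality for the F reference distribution.

```r
# One-sample Hotelling T^2 test of H0: mu = mu0, written with base R only.
hotelling_one_sample <- function(x, mu0) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  diff <- colMeans(x) - mu0
  # T^2 = n * (xbar - mu0)' S^{-1} (xbar - mu0)
  t2 <- n * drop(t(diff) %*% solve(cov(x), diff))
  # Under multivariate normality, (n - p) / (p * (n - 1)) * T^2 ~ F(p, n - p).
  f_stat <- (n - p) / (p * (n - 1)) * t2
  p_value <- pf(f_stat, p, n - p, lower.tail = FALSE)
  list(T2 = t2, F = f_stat, df = c(p, n - p), p.value = p_value)
}

# Hypothetical example: test the setosa sepal means against an arbitrary mu0.
setosa <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]
res <- hotelling_one_sample(setosa, mu0 = c(5, 3.4))
res
```

With a single variable the statistic reduces to the square of the usual one-sample t statistic, which gives a quick sanity check.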

**Multivariate distributions**

*Descriptive measures:* `cov()` and `cor()` in stats provide estimates of the covariance and correlation matrices respectively. ICSNP offers several descriptive measures such as `spatial.median()`, which provides an estimate of the spatial median, and further functions which provide estimates of scatter. Further robust methods are provided, such as `cov.rob()` in MASS, which provides robust estimates of the variance-covariance matrix by minimum volume ellipsoid, minimum covariance determinant or classical product-moment. covRobust provides robust covariance estimation via nearest neighbor variance estimation. robustbase provides robust covariance estimation via fast minimum covariance determinant with `covMcd()` and the orthogonalized pairwise estimate of Gnanadesikan-Kettenring via `covOGK()`. Scalable robust methods are provided within rrcov, also using fast minimum covariance determinant with `CovMcd()` as well as M-estimators with `CovMest()`. corpcor provides shrinkage estimation of large-scale covariance and (partial) correlation matrices.

*Densities (estimation and simulation):* `mvrnorm()` in MASS simulates from the multivariate normal distribution. mvtnorm also provides simulation as well as probability and quantile functions for both the multivariate t and multivariate normal distributions, as well as density functions for the multivariate normal distribution. mnormt provides multivariate normal and multivariate t density and distribution functions as well as random number simulation. sn provides density, distribution and random number generation for the multivariate skew normal and skew t distributions. delt provides a range of functions for estimating multivariate densities by CART and greedy methods. Comprehensive information on mixtures is given in the Cluster view; some density estimates and random numbers are provided by `rmvnorm.mixt()` and `dmvnorm.mixt()` in ks, and mixture fitting is also provided within bayesm. Functions to simulate from the Wishart distribution are provided in a number of places, such as `rwishart()` in bayesm and `rwish()` in MCMCpack (the latter also has a density function `dwish()`). `bkde2D()` from KernSmooth and `kde2d()` from MASS provide binned and non-binned 2-dimensional kernel density estimation respectively; ks also provides multivariate kernel smoothing, as do ash and GenKern. prim provides patient rule induction methods to attempt to find regions of high density in high-dimensional multivariate data; feature also provides methods for determining feature significance in multivariate data (such as in relation to local modes).

*Assessing normality:* mvnormtest provides a multivariate extension to the Shapiro-Wilk test; mvoutlier provides multivariate outlier detection based on robust methods. ICS provides tests for multi-normality. `mvnorm.etest()` in energy provides an assessment of normality based on E statistics (energy); in the same package `k.sample()` assesses a number of samples for equal distributions. Tests for Wishart-distributed covariance matrices are given by `mauchly.test()` in stats.

*Copulas:* copula provides routines for a range of (elliptical and Archimedean) copulas including normal, t, Clayton, Frank and Gumbel; fgac provides generalised Archimedean copulas.
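The sketch below strings together a few of the stats and MASS functions from this section: simulation from a multivariate normal, classical and robust covariance estimates, and a 2-dimensional kernel density estimate. The mean vector, covariance matrix, sample size and seed are arbitrary values chosen for illustration.

```r
library(MASS)  # recommended package shipped with R

set.seed(1)  # arbitrary seed, for reproducibility only
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.6, 0.6, 2), nrow = 2)

# empirical = TRUE forces the sample mean and covariance to equal mu and Sigma.
x <- mvrnorm(n = 200, mu = mu, Sigma = Sigma, empirical = TRUE)

cov(x)  # classical covariance estimate
cor(x)  # corresponding correlation matrix

# Robust covariance via the minimum covariance determinant estimator.
rob <- cov.rob(x, method = "mcd")
rob$cov

# Non-binned 2-dimensional kernel density estimate on a 25 x 25 grid.
dens <- kde2d(x[, 1], x[, 2], n = 25)
```

The result `dens` is a list with grid coordinates `x`, `y` and a density matrix `z`, which can be passed to `contour()` or `image()`.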

**Linear models**

- From stats, `lm()` (with a matrix specified as the dependent variable) offers multivariate linear models, `anova.mlm()` provides comparison of multivariate linear models, and `manova()` offers MANOVA. sn provides `msn.mle()` and `mst.mle()`, which fit multivariate skew normal and multivariate skew t models. pls provides partial least squares regression (PLSR) and principal component regression; dr provides dimension reduction regression options such as `"sir"` (sliced inverse regression) and `"save"` (sliced average variance estimation). plsgenomics provides partial least squares analyses for genomics. relaimpo provides functions to investigate the relative importance of regression parameters.
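A minimal sketch of the stats functions named above, assuming the built-in `iris` data and an arbitrary pair of response variables:

```r
# Two responses modelled jointly; iris is used purely for illustration.
fit0 <- lm(cbind(Sepal.Length, Sepal.Width) ~ 1, data = iris)
fit1 <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)

class(fit1)  # an "mlm" object, with one column of coefficients per response
coef(fit1)

# Compare the nested multivariate models (dispatches to anova.mlm()).
anova(fit0, fit1)

# The same hypothesis as a one-way MANOVA with Pillai's trace.
man <- manova(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
man_tab <- summary(man, test = "Pillai")$stats
man_tab
```

Other test statistics (`"Wilks"`, `"Hotelling-Lawley"`, `"Roy"`) can be requested through the same `test` argument.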

**Projection methods**

*Principal components:* These can be fitted with `prcomp()` (based on `svd()`, preferred) as well as `princomp()` (based on `eigen()`, for compatibility with S-PLUS) from stats. `pc1()` in Hmisc provides the first principal component and gives coefficients for unscaled data. Additional support for an assessment of the scree plot can be found in nFactors, whereas paran provides routines for Horn's evaluation of the number of dimensions to retain. For wide matrices, gmodels provides `fast.prcomp()` and `fast.svd()`. kernlab uses kernel methods to provide a form of non-linear principal components with `kpca()`. pcaPP provides robust principal components by means of projection pursuit. amap provides further robust and parallelised methods, such as a form of generalised and robust principal component analysis via `acpgen()` and `acprob()` respectively. Further options for principal components in an ecological setting are available within ade4, and in a sensory setting in SensoMineR. psy provides a variety of routines useful in psychometry; in this context these include `sphpca()`, which maps onto a sphere, and `fpca()`, where some variables may be considered as dependent, as well as `scree.plot()`, which has the option of adding simulation results to help assess the observed data. PTAk provides principal tensor analysis analogous to both PCA and correspondence analysis. smatr provides standardised major axis estimation with specific application to allometry.

*Canonical Correlation:* `cancor()` in stats provides canonical correlation. kernlab uses kernel methods to provide robust canonical correlation with `kcca()`.

*Redundancy Analysis:* calibrate provides `rda()` for redundancy analysis as well as further options for canonical correlation. fso provides fuzzy set ordination, which extends ordination beyond methods available from linear algebra.

*Independent Components:* fastICA provides fastICA algorithms to perform independent component analysis (ICA) and projection pursuit, and PearsonICA uses score functions. ICS provides either an invariant coordinate system or independent components. JADE adds an interface to the JADE algorithm, as well as providing some diagnostics for ICA.

*Procrustes analysis:* `procrustes()` in vegan provides Procrustes analysis; this package also provides functions for ordination, and further information on that area is given in the Environmetrics task view. Generalised Procrustes analysis via `GPA()` is available from FactoMineR.
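To make the `prcomp()` / `princomp()` distinction and `cancor()` concrete, here is a small sketch on built-in data sets. The choice of `USArrests` and `LifeCycleSavings`, and the split of the latter into two variable sets, are assumptions made only for illustration.

```r
# USArrests is a built-in data set, used here only as an example.
pca_svd   <- prcomp(USArrests, scale. = TRUE)  # SVD of the scaled data
pca_eigen <- princomp(USArrests, cor = TRUE)   # eigen on the correlation matrix

# Proportion of variance explained by each component.
summary(pca_svd)$importance["Proportion of Variance", ]

# With standardised variables the two routes give the same standard deviations.
all.equal(unname(pca_svd$sdev), unname(pca_eigen$sdev))

# Scores on the first two components and a scree plot.
scores <- pca_svd$x[, 1:2]
screeplot(pca_svd, type = "lines")

# Canonical correlation between two sets of variables.
pop <- LifeCycleSavings[, c("pop15", "pop75")]
oec <- LifeCycleSavings[, c("sr", "dpi", "ddpi")]
cc <- cancor(pop, oec)
cc$cor
```

Without `cor = TRUE`, `princomp()` uses divisor n rather than n - 1 for the covariance matrix, so its standard deviations differ slightly from those of `prcomp()`.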

**Principal coordinates / scaling methods**

`cmdscale()` in stats provides classical multidimensional scaling (principal coordinates analysis); `sammon()` and `isoMDS()` in MASS offer Sammon and Kruskal's non-metric multidimensional scaling. vegan provides wrappers and post-processing for non-metric MDS. `indscal()` is provided by SensoMineR.
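A short sketch of the three stats and MASS scaling functions, assuming the built-in `eurodist` road distances as example input and a 2-dimensional configuration:

```r
# eurodist holds road distances between 21 European cities (built-in data).
mds_classical <- cmdscale(eurodist, k = 2)

# Kruskal's non-metric MDS and Sammon mapping from MASS, both in 2 dimensions.
mds_kruskal <- MASS::isoMDS(eurodist, k = 2, trace = FALSE)
mds_sammon  <- MASS::sammon(eurodist, k = 2, trace = FALSE)

mds_kruskal$stress  # final stress, reported in percent
mds_sammon$stress

# Plot the classical solution with city names as labels.
plot(mds_classical, type = "n", asp = 1, xlab = "", ylab = "")
text(mds_classical, labels = labels(eurodist), cex = 0.7)
```

Both MASS functions start from the classical solution by default, so the three configurations are usually similar up to small local adjustments.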

**Unsupervised classification**

*Cluster analysis:* A comprehensive overview of clustering methods available within R is provided by the Cluster task view. Standard techniques include hierarchical clustering by `hclust()` and k-means clustering by `kmeans()` in stats. A range of established clustering and visualisation techniques are also available in cluster, some cluster validation routines are available in clv, and the Rand index can be computed from `classAgreement()` in e1071. Cluster ensembles are available from clue; methods to assist with choice of routines are available in clusterSim. Distance measures (`edist()`) and hierarchical clustering (`hclust.energy()`) based on E-statistics are available in energy. Mahalanobis distance based clustering (for fixed points as well as clusterwise regression) is available from fpc. clustvarsel provides variable selection within model-based clustering. Fuzzy clustering is available within cluster as well as via the hopach (Hierarchical Ordered Partitioning and Collapsing Hybrid) algorithm. kohonen provides supervised and unsupervised SOMs for high-dimensional spectra or patterns. clusterGeneration helps simulate clusters. The Environmetrics task view also gives a topic-related overview of some clustering techniques. Model-based clustering is available in mclust.

*Tree methods:* Full details on tree methods are given in the MachineLearning task view. Suffice to say here that classification trees are sometimes considered within multivariate methods; rpart is most used for this purpose. party provides recursive partitioning. Classification and regression training is provided by caret. kknn provides k-nearest neighbour methods which can be used for regression as well as classification.
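The two standard techniques from stats can be sketched as follows. The `iris` measurements, the average linkage, the number of clusters and the seed are all illustrative assumptions rather than recommendations.

```r
x <- scale(iris[, 1:4])  # standardise so no variable dominates the distances

# Agglomerative hierarchical clustering, cut into three groups.
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 3)

# k-means with several random starts; the seed is arbitrary.
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)

# Cross-tabulate the two partitions against each other and the known species.
table(hclust = hc_groups, kmeans = km$cluster)
table(species = iris$Species, kmeans = km$cluster)
```

Cluster labels are arbitrary, so agreement between partitions is better judged from the cross-tabulation (or an index such as the Rand index) than from matching label values.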

**Supervised classification and discriminant analysis**

`lda()` and `qda()` within MASS provide linear and quadratic discrimination respectively. mda provides mixture and flexible discriminant analysis with `mda()` and `fda()`, as well as multivariate adaptive regression splines with `mars()` and adaptive spline backfitting with the `bruto()` function. Multivariate adaptive regression splines can also be found in earth. Package class provides k-nearest neighbours by `knn()`. SensoMineR provides `FDA()` for factorial discriminant analysis. A number of packages provide for dimension reduction with the classification. klaR includes variable selection and robustness against multicollinearity as well as a number of visualisation routines. gpls provides classification using generalised partial least squares. hddplot provides cross-validated linear discriminant calculations to determine the optimum number of features. ROCR provides a range of methods for assessing classifier performance. Further information on supervised classification can be found in the MachineLearning task view.
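As a minimal sketch of `lda()`, `qda()` and `knn()`, assuming the built-in `iris` data, an arbitrary 100/50 train/test split and k = 5 neighbours:

```r
library(MASS)  # recommended package shipped with R

fit_lda <- lda(Species ~ ., data = iris)
fit_qda <- qda(Species ~ ., data = iris)

# Resubstitution accuracy (optimistic, since the training data are reused).
acc_lda <- mean(predict(fit_lda)$class == iris$Species)
acc_qda <- mean(predict(fit_qda)$class == iris$Species)

# Leave-one-out cross-validated LDA predictions via CV = TRUE.
cv_lda <- lda(Species ~ ., data = iris, CV = TRUE)
acc_cv <- mean(cv_lda$class == iris$Species)

# k-nearest neighbours from class, with a hypothetical train/test split.
set.seed(1)
train <- sample(nrow(iris), 100)
pred_knn <- class::knn(iris[train, 1:4], iris[-train, 1:4],
                       cl = iris$Species[train], k = 5)
acc_knn <- mean(pred_knn == iris$Species[-train])

c(lda = acc_lda, qda = acc_qda, lda_loo = acc_cv, knn = acc_knn)
```

The resubstitution figures are not directly comparable with the cross-validated or held-out ones; they are shown only to illustrate the calls.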

**Correspondence analysis**

`corresp()` and `mca()` in MASS provide simple and multiple correspondence analysis respectively. ca also provides single, multiple and joint correspondence analysis. `ca()` and `mca()` in ade4 provide correspondence and multiple correspondence analysis respectively, as well as adding homogeneous table analysis with `hta()`. Further functionality is also available within vegan; co-correspondence analysis is available from cocorresp. FactoMineR provides `CA()` and `MCA()`, which also enable simple and multiple correspondence analysis as well as associated graphical routines. homals provides homogeneity analysis.
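Simple correspondence analysis with `corresp()` can be sketched as below, assuming the `caith` contingency table that ships with MASS (eye colour by hair colour) and two retained dimensions.

```r
library(MASS)  # recommended package shipped with R

# caith cross-classifies eye colour (rows) by hair colour (columns).
ca_fit <- corresp(caith, nf = 2)

ca_fit$cor     # canonical correlations of the first two dimensions
ca_fit$rscore  # row (eye colour) scores
ca_fit$cscore  # column (hair colour) scores

# Joint display of row and column categories.
biplot(ca_fit)
```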

**Missing data**

- mitools provides tools for multiple imputation, mice provides multivariate imputation by chained equations, and mix provides multiple imputation for mixed categorical and continuous data. pan provides multiple imputation for missing panel data. VIM provides methods for the visualisation as well as imputation of missing data. `aregImpute()` and `transcan()` from Hmisc provide further imputation methods.

**Latent variable approaches**

`factanal()` in stats provides factor analysis by maximum likelihood; Bayesian factor analysis is provided for Gaussian, ordinal and mixed variables in MCMCpack. GPArotation offers GPA (gradient projection algorithm) factor rotation. sem fits linear structural equation models, ltm provides latent trait models under item response theory, and a range of extensions to Rasch models can be found in eRm. FactoMineR provides a wide range of factor analysis methods, including `MFA()` and `HMFA()` for multiple and hierarchical multiple factor analysis as well as `FAMD()` for factor analysis of mixed quantitative and qualitative data. poLCA provides latent class and latent class regression models for a variety of outcome variables.
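A brief sketch of maximum likelihood factor analysis with `factanal()`, assuming the built-in `ability.cov` covariance matrix of six ability tests; the number of factors and the varimax rotation are illustrative choices.

```r
# ability.cov is a built-in list holding a covariance matrix and n.obs.
fa1 <- factanal(factors = 1, covmat = ability.cov)
fa2 <- factanal(factors = 2, covmat = ability.cov, rotation = "varimax")

print(loadings(fa2), cutoff = 0.3)  # hide small loadings for readability
fa2$uniquenesses                    # variance not explained by the factors

# Printing a fit also reports a test that the number of factors is sufficient.
fa1
fa2
```

Because only a covariance matrix is supplied, factor scores are not available here; they require the raw data and the `scores` argument.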

**Modelling non-Gaussian data**

- MNP provides Bayesian multinomial probit models; polycor provides polychoric and tetrachoric correlation matrices. bayesm provides a range of models such as seemingly unrelated regression, multinomial logit/probit, multivariate probit and instrumental variables. VGAM provides vector generalised linear and additive models as well as reduced-rank regression.

**Matrix manipulations**

- As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM. matrixcalc adds functions for matrix differential calculus. Some further sparse matrix functionality is also available from spam.
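A few of the base R matrix tools referred to above, together with one sparse matrix from Matrix, are sketched below. The simulated data, the seed and the sparse matrix entries are arbitrary illustrative values.

```r
set.seed(7)  # arbitrary seed
X <- matrix(rnorm(40), nrow = 10)  # 10 x 4 data matrix

# Covariance matrix computed by hand from the centred data; equals cov(X).
S <- crossprod(scale(X, scale = FALSE)) / (nrow(X) - 1)

e <- eigen(S, symmetric = TRUE)  # spectral decomposition S = V diag(l) V'
s <- svd(X)                      # singular value decomposition X = U D V'
R <- chol(S)                     # Cholesky factor, S = R'R

# Reconstruct X from its SVD and S from its Cholesky factor.
X_hat <- s$u %*% diag(s$d) %*% t(s$v)
S_hat <- crossprod(R)

# A sparse matrix from the Matrix package (recommended, shipped with R).
M <- Matrix::sparseMatrix(i = c(1, 3), j = c(2, 4), x = c(5, -1),
                          dims = c(4, 4))
M
```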

**Miscellaneous utilities**

- abind generalises `cbind()` and `rbind()` for arrays; `mApply()` in Hmisc generalises `apply()` for matrices and passes multiple functions. In addition to functions listed earlier, sn provides operations such as marginalisation, affine transformations and graphics for the multivariate skew normal and skew t distributions. mAr provides for vector auto-regression. `rm.boot()` from Hmisc bootstraps repeated measures models. psy also provides a range of statistics based on Cohen's kappa, including weighted measures and agreement among more than 2 raters. cwhmisc contains a number of support functions which are of interest, such as `ellipse()`, `normalise()` and various rotation functions. desirability provides functions for multivariate optimisation. geozoo provides plotting of geometric objects in GGobi.

- Task view: Cluster
- Task view: Environmetrics
- Task view: MachineLearning
- Bioconductor package: gpls
- Bioconductor package: hopach
- GGobi (interactive dynamic visualisation software, available standalone or as an R library)
- Hmisc functions related to multivariate analysis
- Psychometrics in R, Jan de Leeuw
- qhull library
