Last updated on 2020-11-22 by Gavin Simpson
This Task View contains information about using R to analyse ecological and environmental data.
The base version of R ships with a wide range of functions for use within the field of environmetrics. This functionality is complemented by a plethora of packages available via CRAN, which provide specialist methods such as ordination and cluster analysis techniques. A brief overview of the available packages is provided in this Task View, grouped by topic or type of analysis. As a testament to the popularity of R for the analysis of environmental and ecological data, a special volume of the Journal of Statistical Software was produced in 2007.
Those useRs interested in environmetrics should also consult the Spatial task view. Complementary information is also available in the Multivariate, Phylogenetics, Cluster, and SpatioTemporal task views.
If you have any comments or suggestions for additions or improvements, then please contact the maintainer.
A list of available packages and functions is presented below, grouped by analysis type.
These packages are general, having wide applicability to the environmetrics field.
Analysing species response curves or modelling other data often involves fitting standard statistical models to ecological data. These include simple (multiple) regression, Generalised Linear Models (GLM), extended regression (e.g. Generalised Least Squares [GLS]), Generalised Additive Models (GAM), and mixed effects models, amongst others.
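A minimal sketch of fitting a species response curve with glm() follows. It fits a unimodal (bell-shaped) response to simulated counts using a Poisson GLM with a quadratic term on the log scale. The gradient, counts and "true" optimum and tolerance are all hypothetical. The species optimum and tolerance are then recovered from the fitted quadratic coefficients.

```r
set.seed(1)
# Hypothetical environmental gradient and true unimodal response:
# optimum u = 5, tolerance t = 1.5, maximum expected count 20
gradient <- seq(0, 10, length.out = 100)
mu <- 20 * exp(-(gradient - 5)^2 / (2 * 1.5^2))
counts <- rpois(length(gradient), lambda = mu)

# Poisson GLM (log link) with a quadratic term: log(mu) = b0 + b1*x + b2*x^2
fit <- glm(counts ~ gradient + I(gradient^2), family = poisson)
b <- unname(coef(fit))

# Optimum and tolerance implied by the quadratic (only meaningful if b2 < 0)
optimum <- -b[2] / (2 * b[3])
tolerance <- 1 / sqrt(-2 * b[3])
print(round(c(optimum = optimum, tolerance = tolerance), 2))
```

With a log link, a negative quadratic coefficient gives a bell-shaped curve on the response scale. This is why the optimum is -b1/(2*b2) and the tolerance is 1/sqrt(-2*b2). If the fitted b2 is not negative, the response is not unimodal and these two quantities should not be interpreted.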
lm() and glm() for fitting linear and generalised linear models, respectively.
gam(), an implementation of GAMs that includes LOESS smooths.
polr() in the MASS package, of Bill Venables and Brian Ripley, for proportional odds logistic regression of ordinal responses.
Tree-based models are being increasingly used in ecology, particularly for their ability to fit flexible models to complex data sets and for the simple, intuitive output of the tree structure. Ensemble methods such as bagging, boosting and random forests are advocated for improving predictions from tree-based models and for providing information on uncertainty in regression models or classifiers.
Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book, are implemented in
Multivariate trees are available in
Ensemble techniques for trees:
Graphical tools for the visualization of trees are available in package maptree.
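A minimal regression-tree sketch follows. It assumes the recommended package rpart, which ships with standard R installations and implements CART-style trees. It is used here only as one example and is not necessarily the package this view lists. The predictors (moisture, ph) and the response (cover) are simulated and hypothetical. The tree is pruned back to the complexity parameter with the lowest cross-validated error.

```r
library(rpart)  # recommended package; one CART-style tree implementation

set.seed(42)
n <- 200
# Hypothetical predictors
env <- data.frame(moisture = runif(n, 0, 100), ph = runif(n, 4, 8))
# Hypothetical response: a threshold in moisture plus a pH effect in wet sites
env$cover <- ifelse(env$moisture > 60, 40, 10) +
  ifelse(env$moisture > 60 & env$ph > 6, 20, 0) + rnorm(n, sd = 3)

tree <- rpart(cover ~ moisture + ph, data = env, method = "anova")
print(tree)

# Prune to the complexity parameter with the lowest cross-validated error
best_cp <- tree$cptable[which.min(tree$cptable[, "xerror"]), "CP"]
pruned <- prune(tree, cp = best_cp)

preds <- predict(pruned, newdata = data.frame(moisture = c(30, 80, 80),
                                              ph = c(5, 5, 7)))
print(preds)
```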
Packages mda and earth implement Multivariate Adaptive Regression Splines (MARS), a technique which provides a more flexible, tree-based approach to regression than the piecewise constant functions used in regression trees.
R and add-on packages provide a wide range of ordination methods, many of which are specialised techniques particularly suited to the analysis of species data. The two main packages are ade4 and vegan. ade4 derives from the traditions of the French school of Analyse des Données and is based on the use of the duality diagram. vegan follows the approach of Mark Hill, Cajo ter Braak and others, though the implementation owes more to that presented in Legendre & Legendre (1998) Numerical Ecology, 2nd English Edition, Elsevier. Where the two packages provide duplicate functionality, the user should choose whichever framework best suits their background.
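A minimal base-R sketch of principal components analysis on species data follows. It uses only prcomp() and does not use the ade4 or vegan interfaces described above. The site-by-species matrix is simulated and hypothetical. The Hellinger transformation (square root of relative abundances) is an assumed pre-treatment, chosen as one common way of making PCA better behaved on abundance data.

```r
set.seed(7)
# Hypothetical site-by-species abundance matrix: 20 sites (rows) x 6 species
lambda <- rep(c(1, 3, 5, 2, 8, 4), each = 20)
spp <- matrix(rpois(120, lambda), nrow = 20,
              dimnames = list(paste0("site", 1:20), paste0("sp", 1:6)))

# Hellinger transformation: sqrt of each species' relative abundance per site
hel <- sqrt(spp / rowSums(spp))

# PCA on the transformed data (centred, not scaled)
pca <- prcomp(hel, center = TRUE, scale. = FALSE)
print(summary(pca)$importance[, 1:3])  # variance explained by the first axes
scores <- pca$x[, 1:2]                 # site scores on PC1 and PC2
print(head(scores))
```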
Principal components analysis (PCA) is available via the prcomp() function in standard package stats. rda() (in package vegan), pca() (in package labdsv) and dudi.pca() (in package ade4) provide more ecologically-orientated implementations.
Redundancy analysis (RDA) is available via rda() in vegan and pcaiv() in ade4.
Canonical correspondence analysis (CCA) is implemented in cca() in both vegan and ade4.
Detrended correspondence analysis (DCA) is implemented in decorana() in vegan.
Principal coordinates analysis (PCO) is implemented in dudi.pco() in ade4, pco() in labdsv, pco() in ecodist, and cmdscale() in standard package stats.
Non-metric multidimensional scaling (NMDS) is provided by isoMDS() in package MASS and nmds() in ecodist. nmds(), a wrapper function for isoMDS(), is also provided by package labdsv. vegan provides the helper function metaMDS() for isoMDS(), implementing random starts of the algorithm and standardised scaling of the NMDS results. The approach adopted by vegan with metaMDS() is the recommended approach for ecological data.
Co-inertia analysis is available via coinertia() and mcoa(), both in ade4.
Canonical correlation analysis is available via cancor() in standard package stats.
Procrustes rotation is available via procrustes() in vegan and procuste() in ade4. Both vegan and ade4 provide functions to test the significance of the association between ordination configurations (as assessed by Procrustes rotation) using permutation/randomisation and Monte Carlo methods.
Constrained analysis of principal coordinates (CAP), implemented in capscale() in vegan, fits constrained ordination models similar to RDA and CCA but with any dissimilarity coefficient.
Much ecological analysis proceeds from a matrix of dissimilarities between samples. A large amount of effort has been expended formulating a wide range of dissimilarity coefficients suitable for ecological data. A selection of the more useful coefficients is available in R and various contributed packages.
Standard functions that produce square, symmetric matrices of pair-wise dissimilarities include:
dist() in standard package stats
daisy() in recommended package cluster
vegdist() in vegan
dsvdis() in labdsv
Dist() in amap
distance() in ecodist
Function distance() in package analogue can be used to calculate dissimilarity between samples of one matrix and those of a second matrix. The same function can be used to produce pair-wise dissimilarity matrices, though the other functions listed above are faster. distance() can also be used to generate matrices based on Gower's coefficient for mixed data (mixtures of binary, ordinal/nominal and continuous variables). Function daisy() in package cluster provides a faster implementation of Gower's coefficient for mixed-mode data than distance() if a standard dissimilarity matrix is required. Function gowdis() in package FD also computes Gower's coefficient and implements extensions to ordinal variables.
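A small sketch of computing dissimilarities follows. It writes out the Bray-Curtis coefficient (sum of absolute differences divided by the sum of the two samples' totals) by hand. It then compares this with Euclidean distance from dist() and computes Gower's coefficient for mixed-mode variables with cluster::daisy(). The site abundances and environmental variables are hypothetical. The hand-written bray() helper is illustrative only and is not a function from any of the packages listed above.

```r
library(cluster)  # recommended package: daisy()

# Hypothetical abundances of four species at three sites
x <- rbind(siteA = c(10, 0, 5, 3),
           siteB = c(8, 2, 0, 3),
           siteC = c(0, 9, 1, 0))

# Hypothetical helper: Bray-Curtis = sum|xi - xj| / sum(xi + xj)
bray <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}
print(bray(x))

# Euclidean distance from stats::dist() for comparison
print(dist(x, method = "euclidean"))

# Gower's coefficient for mixed continuous and nominal variables
env <- data.frame(depth = c(1.2, 3.5, 0.8),
                  substrate = factor(c("sand", "mud", "sand")),
                  shaded = factor(c("yes", "no", "yes")))
print(daisy(env, metric = "gower"))
```

For siteA and siteB, the absolute differences sum to 9 and the totals to 31, so the Bray-Curtis dissimilarity is 9/31.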
Cluster analysis aims to identify groups of samples within multivariate data sets. A large range of approaches to this problem has been suggested, but the main techniques are hierarchical cluster analysis, partitioning methods such as k-means, and finite mixture models or model-based clustering. In the machine learning literature, cluster analysis is an unsupervised learning problem.
The Cluster task view provides a more detailed discussion of available cluster analysis methods and appropriate R functions and packages.
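A minimal sketch comparing a hierarchical method and a partitioning method, using only standard package stats, follows. The three groups of samples are simulated and hypothetical. Average linkage (UPGMA) is an arbitrary choice for illustration. kmeans() is run with several random starts to reduce the chance of a poor local optimum.

```r
set.seed(123)
# Hypothetical two-variable data: three well-separated groups of 10 samples
pts <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 6, sd = 0.3), ncol = 2))

# Hierarchical clustering (average linkage) on Euclidean distances
hc <- hclust(dist(pts), method = "average")
groups_hc <- cutree(hc, k = 3)

# k-means partitioning with 25 random starts
km <- kmeans(pts, centers = 3, nstart = 25)

# Cross-tabulate the two classifications (cluster labels are arbitrary)
print(table(hierarchical = groups_hc, kmeans = km$cluster))
```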
Hierarchical cluster analysis:
hclust() in standard package stats
hcluster() in amap
Partitioning methods:
kmeans() in stats provides k-means clustering
cmeans() in e1071 implements a fuzzy version of the k-means algorithm
Mixture models and model-based cluster analysis:
There is a growing number of packages and books that focus on the use of R for theoretical ecological models.
so-called Hill's numbers [e.g. Hill's N2] and rarefaction), ranked abundance diagrams, Fisher's log series, Broken Stick model, Hubbell's abundance model, amongst others.
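A base-R sketch of some of the diversity measures mentioned above follows. It computes the Shannon index, the Gini-Simpson index, Hill's numbers N0, N1 and N2, and Hurlbert's rarefaction formula for expected species richness in a random subsample of n individuals. The abundances are hypothetical, and the helpers hill() and rarefy_one() are illustrative names rather than functions from any package.

```r
# Hypothetical abundances of six species in one sample
abund <- c(25, 10, 10, 5, 2, 1)
p <- abund / sum(abund)

shannon <- -sum(p * log(p))   # Shannon index H
simpson <- 1 - sum(p^2)       # Gini-Simpson index

# Hypothetical helper: Hill number of order q
# (N0 = richness, N1 = exp(H), N2 = inverse Simpson)
hill <- function(x, q) {
  p <- x[x > 0] / sum(x)
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

# Hypothetical helper: Hurlbert's rarefaction, expected species in n individuals
rarefy_one <- function(x, n) {
  N <- sum(x)
  sum(1 - choose(N - x, n) / choose(N, n))
}

print(c(N0 = hill(abund, 0), N1 = hill(abund, 1), N2 = hill(abund, 2)))
print(c(shannon = shannon, simpson = simpson))
print(rarefy_one(abund, 20))
```

Hill's numbers never increase with order q, so N0 >= N1 >= N2. Rarefying to the full sample size recovers the observed richness.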
betadiver() in vegan implements all of the beta diversity indices reviewed in Koleff et al (2003; Journal of Animal Ecology 72(3), 367-382). betadiver() also provides a plot method to produce the co-occurrence frequency triangle plots of the type found in Koleff et al (2003).
betadisper(), also in vegan, implements Marti Anderson's distance-based test for homogeneity of multivariate dispersions (PERMDISP, PERMDISP2), a multivariate analogue of Levene's test (Anderson 2006; Biometrics 62, 245-253). Anderson et al (2006; Ecology Letters 9(6), 683-693) demonstrate the use of this approach for measuring beta diversity.
This section concerns estimation of population parameters (population size, density, survival probability, site occupancy, etc.) by methods that allow for incomplete detection. Many of these methods use data on marked animals, variously called 'capture-recapture', 'mark-recapture' or 'capture-mark-recapture' data.
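The simplest capture-recapture estimator of population size can be written in a few lines of base R. The sketch below uses the two-sample Chapman (bias-adjusted Lincoln-Petersen) estimator for a closed population, together with its usual approximate variance. The helper name chapman() and the catch numbers are hypothetical. Real analyses would normally use one of the dedicated packages, which allow for more realistic models of detection.

```r
# Hypothetical helper: Chapman estimator for a two-sample, closed population
# n1: animals caught and marked in the first sample
# n2: animals caught in the second sample
# m2: marked animals among the second-sample catch
chapman <- function(n1, n2, m2) {
  N <- (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
  # Usual approximate variance of the Chapman estimator
  v <- (n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2) / ((m2 + 1)^2 * (m2 + 2))
  c(N = N, se = sqrt(v))
}

est <- chapman(n1 = 50, n2 = 40, m2 = 10)
print(round(est, 1))
```

For comparison, the unadjusted Lincoln-Petersen estimate n1 * n2 / m2 would be 200 for these numbers. The Chapman form is slightly smaller and remains defined when no marked animals are recaptured.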
Packages secr and DSpat can also be used to simulate data from their respective models.
See also the SpatioTemporal task view for analysis of animal tracking data under Moving objects, trajectories.
Regular time series objects are created with the ts() function in standard package stats, though see tseries or zoo below for alternatives.
The ar() and arima() functions in standard package stats can be used to fit autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) and integrated ARMA (ARIMA) models.
Irregular time series can be handled with irts() in package tseries.
lm(), glm(), loess(), rlm() and lqs() from MASS, randomForest() (package randomForest), rq() (package quantreg) amongst others, whilst preserving the time series information.
Additionally, a fuller description of available packages for time series analysis can be found in the TimeSeries task view.
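A minimal sketch of the classical time series tools in standard package stats follows. It simulates a hypothetical AR(1) series with arima.sim() and fits it two ways. ar() selects the order by AIC (Yule-Walker by default), and arima() gives a maximum-likelihood AR(1) fit that is then used for short-term forecasts. The AR coefficient of 0.7 and the series length are arbitrary choices.

```r
set.seed(2020)
# Hypothetical AR(1) series, e.g. standing in for annual anomalies
y <- arima.sim(model = list(ar = 0.7), n = 200)

# Autoregressive fit with order chosen by AIC (Yule-Walker by default)
fit_ar <- ar(y, order.max = 5, aic = TRUE)
print(fit_ar$order)

# Maximum-likelihood AR(1) with a mean term, i.e. ARIMA(1, 0, 0)
fit_arima <- arima(y, order = c(1, 0, 0))
print(fit_arima)

# Five-step-ahead forecasts
print(predict(fit_arima, n.ahead = 5)$pred)
```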
See the Spatial CRAN Task View for an overview of spatial analysis in R.
ismev provides functions for fitting extreme value models and is support software for Coles (2001) An Introduction to Statistical Modeling of Extreme Values, Springer, New York. Other packages for extreme value theory include:
Packages specifically tailored for the analysis of phylogenetic and evolutionary data include:
The Phylogenetics task view provides more detailed coverage of the subject area and related functions within R.
UseRs may also be interested in Paradis (2006) Analysis of Phylogenetics and Evolution with R, Springer, New York, a book in Springer's Use R! series.
Several packages are now available that implement R functions for widely-used methods and approaches in pedology.
A growing number of packages are available that implement methods specifically related to the fields of hydrology and oceanography. Also see the Extreme Value and the Climatology sections for related packages.
Several packages relate to the field of climatology.
Several packages now provide specialist functionality for the import, analysis, and plotting of palaeoecological data.
Stratigraphic diagrams: the Stratiplot() function in analogue and functions strat.plot() and strat.plot.simple() in the rioja package. Also see the … ggplot(). A blog post by the maintainer of the … prcurve() function.
Several other relevant contributed packages for R are available that do not fit neatly under the headings above.
Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences (by Aurélie Siberchicot)
Analogue and Weighted Averaging Methods for Palaeoecology (by Gavin L. Simpson)
Package for Community Ecology and Suitability Analysis (by Roeland Kindt)
Analytic Solutions for (ground-water) Boussinesq Equation (by Emanuele Cordano)
Circular Statistics, from "Topics in Circular Statistics" (2001) (by Claudio Agostinelli)
"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al. (by Martin Maechler)
Distance Sampling Detection Function and Abundance Estimation (by Laura Marshall)
Implementation of the Dynamic TOPMODEL Hydrological Model (by Peter Metcalfe)
A Community Modeling Foundation for Eco-Hydrology (by Daniel Fuka)
Package for Environmental Statistics, Including US EPA Guidance (by Alexander Kowarik)
Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (by David Meyer)
Provides Tests and Graphics for Assessing Tests of Equivalence (by Andrew Robinson)
Measuring functional diversity (FD) from multiple traits, and other tools for functional ecology (by Etienne Laliberté)
Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series (by Mauricio Zambrano-Bigiarini)
Time Series Management, Analysis and Interpolation for Hydrological Modelling (by Mauricio Zambrano-Bigiarini)
Hourly interpolation of multiple temperature daily series (by Emanuele Eccel)
An Introduction to Statistical Modeling of Extreme Values (by Eric Gilleland)
Mark-Recapture Analysis for Survival and Abundance Estimation (by Jeff Laake)
Support Functions and Datasets for Venables and Ripley's MASS (by Brian Ripley)
Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation (by Luca Scrucca)
Mixed GAM Computation Vehicle with Automatic Smoothness Estimation (by Simon Wood)
Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses (by Aaron A. King)
Package for Analysis of Space-Time Ecological Series (by Philippe Grosjean)
Spatial Analysis and Data Mining for Field Ecologists (by Patrick Giraudoux)
Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended (by Thorsten Pohlert)
Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data (by Christian Hennig)
Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling (by Ryota Suzuki)
Breiman and Cutler's Random Forests for Classification and Regression (by Andy Liaw)
Regression Models with Break-Points / Change-Points Estimation (by Vito M. R. Muggeo)
A Collection of functions for similarity analysis of vegetation data (by Gerald Jurasinski)
Simulation of Ecological (and Other) Dynamic Systems (by Thomas Petzoldt)
Functions for Soil Texture Plot, Classification and Transformation (by Julien Moeys)
Calculate Single Station Metabolism from Diurnal Oxygen Curves (by Stephen A Sefick Jr.)
Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena (by Sebastian Meyer)
Implementation of the Hydrological Model TOPMODEL in R (by Wouter Buytaert)
S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations) (by Achim Zeileis)