Task view: Phylogenetics, Especially Comparative Methods

Last updated on 2021-08-18 by Brian O'Meara

The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of different comparative phylogenetic methods. This is an active research area and much of the information is subject to change. One thing to note is that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, if they failed to incorporate necessary changes as R is updated) or they are developed elsewhere and have not been put on CRAN yet. Such packages may be found on GitHub, R-Forge, or authors' websites.

Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in Newick format (sometimes popularly known as phylip format) or NEXUS format. It can also read trees input by hand as a Newick string (e.g., "(human,(chimp,bonobo));"). phylobase and its lighter-weight sibling rncl can use the Nexus Class Library to read NEXUS, Newick, and other tree formats. treebase can search for and load trees from the online tree repository TreeBASE, and rdryad can pull data from the online data repository Dryad. RNeXML can read, write, and process metadata for the NeXML format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website). phyext2 can read and write various tree formats, including simmap formats. rotl can pull in a synthetic tree and individual study trees from the Open Tree of Life project. The treeio package can read trees in Newick, Nexus, New Hampshire eXtended format (NHX), jplace and Phylip formats and data output from BEAST, EPA, HyPhy, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes. phylogram can convert Newick files into dendrogram objects. brranching can fetch phylogenies from online repositories, including phylomatic.
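As a minimal sketch of the hand-entered route described above, the Newick string quoted in the text can be parsed with ape's read.tree and then round-tripped through a file. The temporary file is an arbitrary choice made for illustration.

```r
library(ape)

# Parse the Newick string quoted in the text into an S3 "phylo" object.
tree <- read.tree(text = "(human,(chimp,bonobo));")

class(tree)       # "phylo"
Ntip(tree)        # 3
tree$tip.label    # "human" "chimp" "bonobo"

# Round-trip through a temporary Newick file to illustrate file-based input.
path <- tempfile(fileext = ".tre")
write.tree(tree, file = path)
tree_from_file <- read.tree(path)
```

The same read.tree call accepts a file path or a text string, so a tree typed by hand and a tree stored on disk end up as the same kind of object.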

Utility functions: These packages include functions for manipulating trees or associated data. ape has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, pulling in data from GenBank, and many more. phylobase has functions for traversing a tree (e.g., getting all descendants from a particular node specified by just two of its descendants). geiger can prune trees and data to an overlapping set of taxa. tidytree can convert a tree object into a tidy data frame and has other tidy approaches to manipulate tree data. evobiR can do fuzzy matching of names (to allow some differences). SigTree finds branches that are responsive to some treatment, while allowing correction for multiple comparisons. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. apex can handle multiple-gene DNA alignments, making their use and analysis for tree inference easier in ape and phangorn. aphid can weight sequences based on a phylogeny and can use hidden Markov models (HMMs) for a variety of purposes including multiple sequence alignment.
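The ape utilities mentioned above (random resolution of polytomies, creating branch lengths, and querying tree size) might be combined as in the following sketch. The five-taxon tree is hypothetical and exists only to give the functions something to work on.

```r
library(ape)
set.seed(1)  # multi2di() resolves polytomies at random

# Hypothetical five-taxon tree in which clade (a,b,c,d) is a polytomy.
tree <- read.tree(text = "((a,b,c,d),e);")
is.binary(tree)        # FALSE: one node has four descendants

# Randomly resolve the polytomy into a series of dichotomies.
resolved <- multi2di(tree)
is.binary(resolved)    # TRUE

# Create branch lengths with Grafen's method (the tree had none).
with_lengths <- compute.brlen(resolved, method = "Grafen")

# Basic tree-size queries.
Ntip(with_lengths)
Nnode(with_lengths)
```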

Ancestral state reconstruction: Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in ouch, though ancestral states at the internal nodes are not. Discrete characters can be reconstructed using a variety of Markovian models that parameterize the transition rates among states using ape. markophylo can fit a broad set of discrete character types with models that can incorporate constrained substitution rates, rate partitioning across sites, branch-specific rates, sampling bias, and non-stationary root probabilities. phytools can do stochastic character mapping of traits on trees.
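A minimal sketch of both kinds of reconstruction in ape, using its ace function and the bird.orders example tree that ships with the package. The trait values are made up for illustration (random numbers for the continuous case, an arbitrary two-state coding for the discrete case) and carry no biological meaning.

```r
library(ape)
data(bird.orders)   # example tree of bird orders shipped with ape
set.seed(10)

# Continuous character: simulated values, for illustration only.
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label
anc_cont <- ace(x, bird.orders, type = "continuous", method = "pic")
head(anc_cont$ace)       # estimated states at internal nodes

# Discrete character: arbitrary two-state coding, for illustration only.
y <- factor(c(rep(0, 5), rep(1, 18)))
names(y) <- bird.orders$tip.label
anc_disc <- ace(y, bird.orders, type = "discrete", model = "ER")
head(anc_disc$lik.anc)   # scaled likelihood of each state at each node
```

Here method = "pic" selects the independent-contrasts estimator and model = "ER" an equal-rates Markov model; ace offers other choices for both arguments.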

Diversification Analysis: Lineage through time plots can be done in ape. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape as can survival models and goodness-of-fit tests (as applied to testing of models of diversification). TESS can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallon and Sanderson 2001) can be calculated in geiger. diversitree implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009). TreePar estimates speciation and extinction rates with models where rates can change as a function of time (e.g., at mass extinction events) or as a function of the number of species. caper can do the macrocaic test to evaluate the effect of a trait on diversity. apTreeshape also has tests for differential diversification (see description). iteRates can identify and visualize areas on a tree undergoing differential diversification. DDD can fit density-dependent models as well as models with occasional escape from density-dependence. BAMMtools is an interface to the BAMM program to allow visualization of rate shifts, comparison of diversification models, and other functions. DDD implements maximum likelihood methods based on the diversity-dependent birth-death process to test whether speciation or extinction are diversity-dependent; it can also identify key innovations and simulate a density-dependent process. PBD can calculate the likelihood of a tree under a protracted speciation model. phyloTop has functions for investigating tree shape, with special functions and datasets relating to trees of infectious diseases.
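The two ape analyses named at the start of this paragraph can be sketched as follows, again using the bird.orders tree bundled with ape. The coordinates are computed rather than drawn so that the example does not depend on a graphics device.

```r
library(ape)
data(bird.orders)   # ultrametric example tree shipped with ape

# Coordinates of the lineage-through-time curve (time, number of lineages).
# Calling ltt.plot(bird.orders) would draw the same curve.
ltt <- ltt.plot.coords(bird.orders)
head(ltt)

# Fit a constant-rate birth-death model to the branching times.
fit <- birthdeath(bird.orders)
fit$para   # estimates of d/b and b - d
```

birthdeath reports the ratio of extinction to speciation (d/b) and the net diversification rate (b - d) rather than the two rates separately.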

Divergence Times: Non-parametric rate smoothing (NPRS) and penalized likelihood can be implemented in ape. geiger can do congruification to stretch a source tree to match a specified standard tree. treedater implements various clock models, ways to assess confidence, and methods for detecting outliers.

Phylogenetic Inference: UPGMA, neighbour joining, bio-nj and fast ME methods of phylogenetic reconstruction are all implemented in the package ape. phangorn can estimate trees using distance, parsimony, and likelihood. phyclust can cluster sequences. phytools can build trees using MRP supertree estimation and least squares. phylotools can build supermatrices for analyses in other software. pastis can use taxonomic information to make constraints for Bayesian tree searches. For more information on importing sequence data, see the Genetics task view; pegas may also be of use.
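A short sketch of the distance-based methods listed above, using the woodmouse alignment distributed with ape. Neighbour joining, BIONJ and balanced minimum evolution are called directly; UPGMA is obtained here through average-linkage clustering in stats::hclust converted with as.phylo, which is an assumption of this sketch rather than a route described in the text.

```r
library(ape)
data(woodmouse)            # example DNA alignment shipped with ape

# Pairwise distances under the default substitution model of dist.dna().
d <- dist.dna(woodmouse)

nj_tree    <- nj(d)          # neighbour joining
bionj_tree <- bionj(d)       # BIONJ
me_tree    <- fastme.bal(d)  # balanced minimum evolution (FastME)

# UPGMA via average-linkage hierarchical clustering (assumed route).
upgma_tree <- as.phylo(hclust(d, method = "average"))

is.rooted(nj_tree)           # neighbour joining returns an unrooted tree
is.rooted(upgma_tree)        # UPGMA returns a rooted tree
```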

Time series/Paleontology: Paleontological time series data can be analyzed with paleoTS, which provides a likelihood-based framework for fitting and comparing models of phyletic evolution (based on the random walk or stasis model). strap can do stratigraphic analysis of phylogenetic trees.

Tree Simulations: Trees can be simulated using constant-rate birth-death with various constraints in TreeSim and a birth-death process in geiger. Random trees can be generated in ape by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees). paleotree can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa. TESS can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
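The two kinds of random trees that ape generates, as described above, can be sketched like this. The number of tips and the seed are arbitrary.

```r
library(ape)
set.seed(123)

# Random tree built by random splitting of edges.
random_tree <- rtree(10)

# Random coalescent tree built by random clustering of tips.
coalescent_tree <- rcoal(10)

is.ultrametric(random_tree)       # typically FALSE: branch lengths are arbitrary
is.ultrametric(coalescent_tree)   # TRUE: all tips are contemporaneous
```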

Trait evolution: Independent contrasts for continuous characters can be calculated using ape, picante, or caper (which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in geiger. Brownian motion models can be fit in geiger, ape, and paleotree. Deviations from Brownian motion can be investigated in geiger and OUwie. mvMORPH can fit Brownian motion, early burst, ACDC, OU, and shift models to univariate or multivariate data. Ornstein-Uhlenbeck (OU) models can be fitted in geiger, ape, ouch (with multiple means), and OUwie (with multiple means, rates, and attraction values). geiger fits only single-optimum models. Other continuous models, including Pagel's transforms and models with trends, can be fit with geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in geiger. Multiple-rate Brownian motion can be fit in RBrownie. Traditional GLS methods (sensu Grafen or Martins) can be implemented in ape, PHYLOGR, or caper. Phylogenetic autoregression (sensu Cheverud et al. 1985) and phylogenetic autocorrelation (Moran's I) can be implemented in ape or, if the significance test of Moran's I should be calculated via a randomization procedure, in adephylo. Correlation between traits using a GLMM can also be investigated using MCMCglmm. phylolm can fit phylogenetic linear regression and phylogenetic logistic regression models using a fast algorithm, making it suitable for large trees. brms can examine correlations between continuous and discrete traits, and can incorporate multiple measurements per species. phytools can also investigate rates of trait evolution and do stochastic character mapping. metafor can perform meta-analyses accounting for phylogenetic structure. pmc evaluates the model adequacy of several trait models (from geiger and ouch) using Monte Carlo approaches. phyreg implements the Grafen (1989) phylogenetic regression. geomorph can do geometric morphometric analysis in a phylogenetic context. Disparity through time, and other disparity-related analyses, can be performed with dispRity. MPSEM can predict features of one species based on information from related species using phylogenetic eigenvector maps. Rphylip wraps PHYLIP which can do independent contrasts, the threshold model, and more. convevol and windex can both test for convergent evolution on a phylogeny.
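As a minimal sketch of the independent-contrasts workflow in ape, the example below simulates two correlated traits on a random tree and regresses one set of contrasts on the other through the origin. The tree, the trait values and the 0.5 slope used to generate them are all assumptions made purely for illustration.

```r
library(ape)
set.seed(2021)

# Simulated data: a random ultrametric tree and two traits evolving by
# Brownian motion, with y constructed to depend on x (slope 0.5).
tree <- rcoal(30)
x <- rTraitCont(tree, model = "BM", sigma = 1)
y <- 0.5 * x + rTraitCont(tree, model = "BM", sigma = 1)

# Phylogenetically independent contrasts, one per internal node.
pic_x <- pic(x, tree)
pic_y <- pic(y, tree)

# Contrasts have an expected mean of zero, so the regression is forced
# through the origin.
fit <- lm(pic_y ~ pic_x - 1)
coef(fit)
```

Regressing the raw tip values instead would treat related species as independent observations, which is the problem contrasts are designed to avoid.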

Trait Simulations: Continuous traits can be simulated using Brownian motion in ouch, geiger, ape, picante, OUwie, and caper; the Hansen model (a form of the OU model) in ouch and OUwie; and a speciational model in geiger. Discrete traits can be simulated using a continuous time Markov model in geiger. phangorn can simulate DNA or amino acids. Both discrete and continuous traits can be simulated under models where rates change through time in geiger. phytools can simulate discrete characters using stochastic character mapping. phylolm can simulate continuous or binary traits along a tree.
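A brief sketch of trait simulation using ape alone. Brownian motion is covered by the text above; the OU and discrete simulations use ape's rTraitCont and rTraitDisc, which the paragraph does not mention, so treat them as additions of this sketch. All parameter values are arbitrary.

```r
library(ape)
set.seed(99)

tree <- rcoal(25)   # random tree to simulate on

# Continuous trait under Brownian motion.
bm <- rTraitCont(tree, model = "BM", sigma = 0.5)

# Continuous trait under an Ornstein-Uhlenbeck process pulled towards theta.
ou <- rTraitCont(tree, model = "OU", sigma = 0.5, alpha = 2, theta = 1)

# Two-state discrete trait under an equal-rates Markov model.
disc <- rTraitDisc(tree, model = "ER", k = 2, rate = 1)

head(bm)
head(ou)
table(disc)
```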

Tree Manipulation: Branch length scaling using ACDC; Pagel's (1999) lambda, delta and kappa parameters; and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) are available in geiger. phytools also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, setting of branch lengths including Grafen's method can all be done using ape. Extinct taxa can be pruned using geiger. phylobase offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with phangorn. paleotree has functions for manipulating trees based on sampling issues that arise with fossil taxa as well as more universal transformations. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. enveomics.R can prune a tree to keep clade representatives.
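Rooting and tip dropping in ape might look like the sketch below, which also shows one way to match a tree to a data set with base R and ape's keep.tip. The tree and the trait values are hypothetical, and "lemur" is deliberately absent from the tree.

```r
library(ape)

# Hypothetical tree, for illustration only.
tree <- read.tree(text = "((human,(chimp,bonobo)),(gorilla,orangutan));")

# Re-root on a chosen outgroup, keeping the root dichotomous.
rerooted <- root(tree, outgroup = "orangutan", resolve.root = TRUE)

# Drop a single tip.
no_bonobo <- drop.tip(tree, "bonobo")

# Hypothetical trait data covering a different set of taxa.
trait <- c(human = 1.2, chimp = 0.9, gorilla = 1.5, lemur = 0.3)

# Keep only the taxa present in both the tree and the data.
shared <- intersect(tree$tip.label, names(trait))
pruned_tree <- keep.tip(tree, shared)
pruned_trait <- trait[pruned_tree$tip.label]
```

Reordering the data by pruned_tree$tip.label keeps tree and data aligned for functions that assume matching order.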

Community/Microbial Ecology: picante, vegan, SYNCSA, phylotools, PCPS, caper, and DAMOCLES integrate several tools for using phylogenetics with community ecology. HMPTrees and GUniFrac provide tools for comparing microbial communities. betapart allows computing pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity. adiv can calculate various indices of biodiversity including species, functional and phylogenetic diversity, as well as alpha, beta, and gamma diversities. entropart can measure and partition diversity based on Tsallis entropy as well as calculate alpha, beta, and gamma diversities. metacoder is an R package for handling large taxonomic data sets, such as those generated by modern high-throughput sequencing approaches like metabarcoding.

Phyloclimatic Modeling: phyloclim integrates several new tools in this area.

Phylogeography / Biogeography: phyloland implements a model of space colonization mapped on a phylogeny; it aims at estimating limited dispersal and competitive exclusion in a statistical phylogeographic framework. diversitree implements the GeoSSE method for diversification analyses based on two areas.

Species/Population Delimitation: adhoc can estimate an ad hoc distance threshold for a reference library of DNA barcodes.

Tree Plotting and Visualization: User trees can be plotted using ape, adephylo, phylobase, phytools, ouch, and dendextend; several of these have options for branch or taxon coloring based on some criterion (ancestral state, tree structure, etc.). paleoPhylo and paleotree are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using ape. Ancestral state reconstructions can be visualized along branches using ape and paleotree. phytools can project a tree into a morphospace. BAMMtools can visualize rate shifts calculated by BAMM on a tree. The popular R visualization package ggplot2 can be extended by ggtree to visualize phylogenies. Trees can also be interactively explored (as dendrograms) using idendr0. phylocanvas is a widget for "htmlwidgets" that enables embedding of phylogenetic trees using the phylocanvas JavaScript library. ggmuller allows plotting a phylogeny along with frequency dynamics.

Tree Comparison: Tree-tree distances can be evaluated, and used in additional analyses, in distory and Rphylip. ape can compute tree-tree distances and also create a plot showing two trees with links between associated tips. kdetrees implements a non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees, which could represent inference problems or processes such as horizontal gene transfer. dendextend can evaluate multiple measures comparing dendrograms.
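Computing a tree-tree distance in ape can be sketched with dist.topo, whose default method is the topological distance of Penny and Hendy ("PH85"). The two five-taxon trees are hypothetical and differ only in the placement of tips b and c.

```r
library(ape)

# Two hypothetical unrooted trees that disagree about one grouping.
t1 <- read.tree(text = "((a,b),c,(d,e));")
t2 <- read.tree(text = "((a,c),b,(d,e));")

# A tree compared with itself has distance zero.
d_same <- dist.topo(t1, t1)

# Trees that differ in at least one bipartition have a positive distance.
d_diff <- dist.topo(t1, t2)

d_same
d_diff
```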

Taxonomy: taxize can interact with a suite of web APIs for taxonomic tasks, such as verifying species names, getting taxonomic hierarchies, and verifying name spelling. evobiR contains functions for making a tree at higher taxonomic levels, downloading a taxonomy tree from NCBI or ITIS, and various other miscellaneous functions (simulations of character evolution, calculating D-statistics, etc.).

Gene tree - species tree: HyPhy can count the duplication and loss cost to reconcile a gene tree to a species tree. It can also sample histories of gene trees from within family trees.

Interactions with other programs: geiger can call PATHd8 through its congruify function. ips wraps several tree inference and other programs, including MrBayes, BEAST, and RAxML, allowing their easy use from within R. Rphylip wraps PHYLIP, a broad variety of programs for tree inference under parsimony, likelihood, and distance, bootstrapping, character evolution, and more. BoSSA can use information from various tools to place a query sequence into a reference tree. pastis can use taxonomic information to make constraints for MrBayes tree searches.

Notes: At least ten packages start as phy* in this domain, including two pairs of similarly named packages (phytools and phylotools, phylobase and phybase). This can easily lead to confusion, and future package authors are encouraged to consider such overlaps when naming packages. For clarification, phytools provides a wide array of functions, especially for comparative methods, and is maintained by Liam Revell; phylotools has functions for building supermatrices and is maintained by Jinlong Zhang. phylobase implements S4 classes for phylogenetic trees and associated data and is maintained by Francois Michonneau; phybase has tree utility functions and many functions for gene tree - species tree questions and is authored by Liang Liu, but no longer appears on CRAN.


  • Borregaard MK, Rahbek C, Fjeldsaa J, Parra JL, Whittaker RJ, Graham CH 2014. Node-based analysis of species distributions. Methods in Ecology and Evolution 5(11): 1225-1235.
  • Butler MA, King AA 2004. Phylogenetic comparative analysis: A modeling approach for adaptive evolution. American Naturalist 164: 683-695.
  • Cheverud JM, Dow MM, Leutenegger W 1985. The quantitative assessment of phylogenetic constraints in comparative analyses: Sexual dimorphism in body weight among primates. Evolution 39: 1335-1351.
  • FitzJohn RG, Maddison WP, Otto SP 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Systematic Biology 58: 595-611.
  • Garland T, Harvey PH, Ives AR 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Systematic Biology 41: 18-32.
  • Hansen TF 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51: 1341-1351.
  • Maddison WP, Midford PE, Otto SP 2007. Estimating a binary character's effect on speciation and extinction. Systematic Biology 56: 701-710.
  • Magallon S, Sanderson MJ 2001. Absolute diversification rates in angiosperm clades. Evolution 55(9): 1762-1780.
  • Moore BR, Chan KMA, Donoghue MJ 2004. Detecting diversification rate variation in supertrees. In Bininda-Emonds ORP (ed) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic, pp. 487-533.
  • Nee S, May RM, Harvey PH 1994. The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London Series B Biological Sciences 344: 305-311.
  • Pagel M 1999. Inferring the historical patterns of biological evolution. Nature 401: 877-884.
  • Pybus OG, Harvey PH 2000. Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London Series B Biological Sciences 267: 2267-2272.


adephylo — 1.1-11

Exploratory Analyses for the Phylogenetic Comparative Method

adhoc — 1.1

Calculate Ad Hoc Distance Thresholds for DNA Barcoding Identification

adiv — 2.1.1

Analysis of Diversity

ape — 5.5

Analyses of Phylogenetics and Evolution

apex — 1.0.4

Phylogenetic Methods for Multiple Gene Data

aphid — 1.3.3

Analysis with Profile Hidden Markov Models

apTreeshape — 1.5-0.1

Analyses of Phylogenetic Treeshape

BAMMtools — 2.1.8

Analysis and Visualization of Macroevolutionary Dynamics on Phylogenetic Trees

betapart — 1.5.4

Partitioning Beta Diversity into Turnover and Nestedness Components

BoSSA — 3.7

A Bunch of Structure and Sequence Analysis

brranching — 0.7.0

Fetch 'Phylogenies' from Many Sources

brms — 2.16.1

Bayesian Regression Models using 'Stan'

caper — 1.0.1

Comparative Analyses of Phylogenetics and Evolution in R

convevol — 1.3

Analysis of Convergent Evolution

DAMOCLES

Dynamic Assembly Model of Colonization, Local Extinction and Speciation

DDD — 5.0

Diversity-Dependent Diversification

dendextend — 1.15.1

Extending 'dendrogram' Functionality in R

dispRity — 1.6.0

Measuring Disparity

distory — 1.4.4

Distance Between Phylogenetic Histories

diversitree — 0.9-16

Comparative 'Phylogenetic' Analyses of Diversification

entropart — 1.6-8

Entropy Partitioning to Measure Diversity

enveomics.R — 1.8.0

Various Utilities for Microbial Genomics and Metagenomics

evobiR — 1.1

Comparative and Population Genetic Analyses

geiger — 2.0.7

Analysis of Evolutionary Diversification

geomorph — 4.0.1

Geometric Morphometric Analyses of 2D/3D Landmark Data

ggmuller — 0.5.4

Create Muller Plots of Evolutionary Dynamics

ggplot2 — 3.3.5

Create Elegant Data Visualisations Using the Grammar of Graphics

GUniFrac — 1.3

Generalized UniFrac Distances, Distance-Based Multivariate Methods and Feature-Based Univariate Methods for Microbiome Data Analysis

HMPTrees — 1.4

Statistical Object Oriented Data Analysis of RDP-Based Taxonomic Trees from Human Microbiome Data

HyPhy — 1.0

Macroevolutionary phylogenetic analysis of species trees and gene trees

idendr0 — 1.5.3

Interactive Dendrograms

ips — 0.0.11

Interfaces to Phylogenetic Software in R

iteRates — 3.1

Parametric rate comparison

kdetrees — 0.1.5

Nonparametric method for identifying discordant phylogenetic trees

markophylo — 1.0.8

Markov Chain Models for Phylogenetic Trees

MCMCglmm — 2.32

MCMC Generalised Linear Mixed Models

metacoder — 0.3.5

Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

metafor — 3.0-2

Meta-Analysis Package for R

MPSEM — 0.3-6

Modeling Phylogenetic Signals using Eigenvector Maps

mvMORPH — 1.1.4

Multivariate Comparative Tools for Fitting Evolutionary Models to Morphometric Data

ouch — 2.17

Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses

OUwie — 2.6

Analysis of Evolutionary Rates in an OU Framework

paleotree — 3.3.25

Paleontological and Phylogenetic Analyses of Evolution

paleoTS — 0.5.2

Analyze Paleontological Time-Series

pastis — 0.1-2

Phylogenetic Assembly with Soft Taxonomic Inferences

PBD — 1.4

Protracted Birth-Death Model of Diversification

PCPS — 1.0.7

Principal Coordinates of Phylogenetic Structure

pegas — 1.0-1

Population and Evolutionary Genetics Analysis System

phangorn — 2.7.1

Phylogenetic Reconstruction and Analysis

phyclust — 0.1-30

Phylogenetic Clustering (Phyloclustering)

phyext2 — 0.0.4

An Extension (for Package 'SigTree') of Some of the Classes in Package 'phylobase'

phylobase — 0.8.10

Base Package for Phylogenetic Structures and Comparative Data

phylocanvas — 0.1.3

Interactive Phylogenetic Trees Using the 'Phylocanvas' JavaScript Library

phyloclim — 0.9.5

Integrating Phylogenetics and Climatic Niche Modeling

PHYLOGR — 1.0.11

Functions for Phylogenetically Based Statistical Analyses

phylogram — 2.1.0

Dendrograms for Evolutionary Analysis

phyloland — 1.3

Modelling Competitive Exclusion and Limited Dispersal in a Statistical Phylogeographic Framework

phylolm — 2.6.2

Phylogenetic Linear Regression

phylotools — 0.2.2

Phylogenetic Tools for Eco-Phylogenetics

phyloTop — 2.1.1

Calculating Topological Properties of Phylogenies

phyreg — 1.0.2

The Phylogenetic Regression of Grafen (1989)

phytools — 0.7-90

Phylogenetic Tools for Comparative Biology (and Other Things)

picante — 1.8.2

Integrating Phylogenies and Ecology

pmc — 1.0.4

Phylogenetic Monte Carlo

rdryad — 1.0.0

Access for Dryad Web Services

rncl — 0.8.4

An Interface to the Nexus Class Library

RNeXML — 2.4.5

Semantically Rich I/O for the 'NeXML' Format

rotl — 3.0.11

Interface to the 'Open Tree of Life' API

Rphylip — 0.1-23

An R interface for PHYLIP

SigTree — 1.10.6

Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree

strap — 1.4

Stratigraphic Tree Analysis for Palaeontology

SYNCSA — 1.3.4

Analysis of Functional and Phylogenetic Patterns in Metacommunities

taxize — 0.9.99

Taxonomic Information from Around the Web

TESS — 2.1.0

Diversification Rate Estimation and Fast Simulation of Reconstructed Phylogenetic Trees under Tree-Wide Time-Heterogeneous Birth-Death Processes Including Mass-Extinction Events

tidytree — 0.3.5

A Tidy Tool for Phylogenetic Tree Data Manipulation

treebase — 0.1.4

Discovery, Access and Manipulation of 'TreeBASE' Phylogenies

treedater — 0.5.0

Fast Molecular Clock Dating of Phylogenetic Trees with Rate Variation

TreePar — 3.3

Estimating birth and death rates based on phylogenies

TreeSim — 2.4

Simulating Phylogenetic Trees

vegan — 2.5-7

Community Ecology Package

windex — 2.0.3

Analysing Convergent Evolution using the Wheatsheaf Index
