Last updated on 2019-02-14 by Brian O'Meara
The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of different comparative phylogenetic methods. This is an active research area and much of the information is subject to change. One thing to note is that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, if they failed to incorporate necessary changes as R is updated) or they are developed elsewhere and have not been put on CRAN yet. Such packages may be found on GitHub, R-Forge, or authors' websites.
Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in Newick format (sometimes popularly known as PHYLIP format) or NEXUS format. It can also read trees entered by hand as a Newick string (e.g., "(human,(chimp,bonobo));"). phylobase and its lighter-weight sibling rncl can use the Nexus Class Library to read NEXUS, Newick, and other tree formats. treebase can search for and load trees from the online tree repository TreeBASE, and rdryad can pull data from the online data repository Dryad. RNeXML can read, write, and process metadata for the NeXML format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website). phyext2 can read and write various tree formats, including simmap formats. rotl can pull in a synthetic tree and individual study trees from the Open Tree of Life project. The treeio package can read trees in Newick, NEXUS, New Hampshire eXtended (NHX), jplace, and PHYLIP formats, as well as data output from BEAST, EPA, HyPhy, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML, and RevBayes. phylogram can convert Newick files into dendrogram objects. brranching can fetch phylogenies from online repositories, including phylomatic.
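As a minimal sketch of the ape route described above, the following reads a tree from a hand-typed Newick string and then round-trips it through a NEXUS file. It assumes ape is installed; the temporary file name is purely illustrative.

```r
library(ape)

# Read a tree typed by hand as a Newick string
tr <- read.tree(text = "(human,(chimp,bonobo));")

# Write it to a temporary NEXUS file and read it back
nexus_file <- tempfile(fileext = ".nex")  # illustrative temporary path
write.nexus(tr, file = nexus_file)
tr2 <- read.nexus(nexus_file)

# Both objects are of the S3 class "phylo"
class(tr)
tr2$tip.label
```

For trees stored in a Newick file, read.tree(file = ...) takes a file path in place of the text argument.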
Utility functions: These packages include functions for manipulating trees or associated data. ape has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, pulling in data from GenBank, and many more. phylobase has functions for traversing a tree (e.g., getting all descendants from a particular node specified by just two of its descendants). geiger can prune trees and data to an overlapping set of taxa. treeplyr can use dplyr-style functions (filter, mutate, reorder, etc.) on objects consisting of trees plus associated data. tidytree can convert a tree object into a tidy data frame and has other tidy approaches to manipulating tree data. evobiR can do fuzzy matching of names (to allow some differences). rphast implements an R interface to PHAST, which can be used for many types of analysis in comparative and evolutionary genomics, such as estimating models of evolution from sequence data, scoring alignments for conservation or acceleration, and predicting elements based on conservation or custom phylogenetic hidden Markov models. SigTree finds branches that are responsive to some treatment, while allowing correction for multiple comparisons. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. apex can handle multiple-gene DNA alignments, making their use and analysis for tree inference easier in ape and phangorn. aphid can weight sequences based on a phylogeny and can use hidden Markov models (HMMs) for a variety of purposes, including multiple sequence alignment.
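Two of the utilities above can be sketched with ape alone: randomly resolving a polytomy, and pruning a tree and a trait vector to their shared taxa. The pruning step below is done by hand with base R set operations as an illustration; it is not geiger's own interface, and the trait values are made up.

```r
library(ape)

# A tree with a three-way polytomy (A, B, C)
tr <- read.tree(text = "((A,B,C),(D,E));")
is.binary(tr)               # FALSE: the tree is not fully bifurcating

# Randomly resolve the polytomy into bifurcations
set.seed(1)
resolved <- multi2di(tr)

# Made-up trait data; taxon "F" is not in the tree, "C" and "E" have no data
trait <- c(A = 1.2, B = 0.7, D = 3.1, F = 2.0)

# Keep only taxa present in both the tree and the data
shared <- intersect(resolved$tip.label, names(trait))
pruned <- drop.tip(resolved, setdiff(resolved$tip.label, shared))
trait  <- trait[pruned$tip.label]   # reorder data to match the tip order
```

Ordering the data by pruned$tip.label keeps tree and data aligned for downstream functions that match by position.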
Ancestral state reconstruction: Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in ouch, though ancestral states at the internal nodes are not. Discrete characters can be reconstructed using a variety of Markovian models that parameterize the transition rates among states using ape. markophylo can fit a broad set of discrete character types with models that can incorporate constrained substitution rates, rate partitioning across sites, branch-specific rates, sampling bias, and non-stationary root probabilities. phytools can do stochastic character mapping of traits on trees.
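A small sketch of continuous ancestral state reconstruction with ape's ace function, here using the independent contrasts method on a made-up three-taxon tree and made-up trait values. Other values of the method argument (for example "ML" or "GLS") select the other approaches mentioned above.

```r
library(ape)

# A rooted, binary tree with branch lengths (made-up example)
tr <- read.tree(text = "((A:1,B:1):1,C:2);")

# Made-up continuous trait values, named by tip label
x <- c(A = 1, B = 3, C = 5)

# Reconstruct ancestral values at the internal nodes
res <- ace(x, tr, type = "continuous", method = "pic")

res$ace    # one estimate per internal node, named by node number
res$CI95   # approximate 95% confidence intervals
```

Node numbers follow ape's convention: tips are numbered 1 to n and internal nodes start at n + 1 (the root).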
Diversification Analysis: Lineage through time plots can be done in ape; nLTT can estimate the normalized lineage through time statistic, which can be used as a summary statistic in ABC approaches. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape, as can survival models and goodness-of-fit tests (as applied to testing of models of diversification). TESS can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallón and Sanderson) can be calculated in geiger. diversitree implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009). TreePar estimates speciation and extinction rates with models where rates can change as a function of time (e.g., at mass extinction events) or as a function of the number of species. caper can do the macrocaic test to evaluate the effect of a trait on diversity. apTreeshape also has tests for differential diversification (see description). iteRates can identify and visualize areas on a tree undergoing differential diversification. DDD can fit density-dependent models as well as models with occasional escape from density dependence. BAMMtools is an interface to the BAMM program to allow visualization of rate shifts, comparison of diversification models, and other functions. DDD implements maximum likelihood methods based on the diversity-dependent birth-death process to test whether speciation or extinction is diversity-dependent; it can also identify key innovations and simulate a density-dependent process. PBD can calculate the likelihood of a tree under a protracted speciation model. phyloTop has functions for investigating tree shape, with special functions and datasets relating to trees of infectious diseases.
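The lineage-through-time idea can be sketched with ape on a small, made-up ultrametric tree: branching.times gives the age of each internal node, and ltt.plot.coords returns the coordinates that ltt.plot would draw.

```r
library(ape)

# A made-up ultrametric tree with four extant tips
tr <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:1.5):0.5);")

# Ages of the internal nodes, measured back from the present
bt <- branching.times(tr)

# Coordinates of the lineage-through-time curve (time and number of lineages)
coords <- ltt.plot.coords(tr)

# ltt.plot(tr) would draw the same curve on the active graphics device
```

Fitting a birth-death model to a real tree would follow the same pattern with ape's birthdeath function, which expects an ultrametric, fully bifurcating tree.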
Divergence Times: Non-parametric rate smoothing (NPRS) and penalized likelihood can be implemented in ape. geiger can do congruification to stretch a source tree to match a specified standard tree. treedater implements various clock models, ways to assess confidence, and outlier detection.
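As a rough sketch of penalized likelihood dating in ape, the chronos function can turn a tree with arbitrary branch lengths into a chronogram. The random input tree, the smoothing value lambda = 1, and the default calibration (root age fixed to 1) are all assumptions made for illustration only.

```r
library(ape)

# A random, non-ultrametric tree standing in for an inferred phylogram
set.seed(42)
tr <- rtree(10)

# Penalized likelihood with a correlated-rates model;
# lambda is the smoothing parameter (illustrative value)
chr <- chronos(tr, lambda = 1, model = "correlated", quiet = TRUE)

# The result is still a "phylo" object, now with branch lengths in time units
class(chr)
```

In a real analysis, calibration points would be supplied through makeChronosCalib and several values of lambda would typically be compared.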
Phylogenetic Inference: UPGMA, neighbour joining, bio-nj and fast ME methods of phylogenetic reconstruction are all implemented in the package ape. phangorn can estimate trees using distance, parsimony, and likelihood. phyclust can cluster sequences. phytools can build trees using MRP supertree estimation and least squares. phylotools can build supermatrices for analyses in other software. pastis can use taxonomic information to make constraints for Bayesian tree searches. outbreaker can infer transmission trees for diseases, as well as other parameters of disease spread. For more information on importing sequence data, see the Genetics task view; pegas may also be of use.
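A minimal sketch of distance-based inference with ape's nj function. The distance matrix is made up so that it is exactly additive on a four-taxon tree, which lets the path lengths on the inferred tree be compared with the input distances.

```r
library(ape)

# Made-up additive distances for four taxa
labs <- c("A", "B", "C", "D")
d <- matrix(c(0, 3, 5, 6,
              3, 0, 6, 7,
              5, 6, 0, 7,
              6, 7, 7, 0),
            nrow = 4, byrow = TRUE, dimnames = list(labs, labs))

# Neighbour-joining tree (unrooted)
tr <- nj(d)

# Tip-to-tip path lengths on the inferred tree, in the same order as d
fitted <- cophenetic(tr)[labs, labs]
```

With sequence data, the distance matrix would usually come from dist.dna applied to an alignment rather than being typed by hand.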
Time series/Paleontology: Paleontological time series data can be analyzed with paleoTS, which provides a likelihood-based framework for fitting and comparing models of phyletic evolution (based on the random walk or stasis model). strap can do stratigraphic analysis of phylogenetic trees.
Tree Simulations: Trees can be simulated using constant-rate birth-death with various constraints in TreeSim and a birth-death process in geiger. Random trees can be generated in ape by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees). paleotree can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa. TESS can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
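The two random-tree generators in ape mentioned above can be sketched as follows; the number of tips and the seed are arbitrary choices.

```r
library(ape)

set.seed(123)

# Random tree built by randomly splitting edges
tr_split <- rtree(20)

# Random coalescent tree built by randomly clustering tips
tr_coal <- rcoal(20)

# Coalescent trees are ultrametric; both trees are rooted and binary
is.ultrametric(tr_coal)
is.binary(tr_split)
```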
Trait evolution: Independent contrasts for continuous characters can be calculated using ape, picante, or caper (which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in geiger. corHMM can look for hidden rates in discrete traits as well as fit correlational models for two or three binary traits (similar to Pagel's old Discrete program) and complex models for multistate traits (similar to Pagel's old Multistate program). Brownian motion models can be fit in geiger, ape, and paleotree. ratematrix can fit univariate or multivariate Brownian motion models with one or more rate regimes. Deviations from Brownian motion can be investigated in geiger and OUwie. mvMORPH can fit Brownian motion, early burst, ACDC, OU, and shift models to univariate or multivariate data. Ornstein-Uhlenbeck (OU) models can be fitted in geiger, ape, ouch (with multiple means), and OUwie (with multiple means, rates, and attraction values). surface wraps ouch to infer shifts in the OU optimum; bayou also allows data-driven selection between different OU models. geiger fits only single-optimum models. Other continuous models, including Pagel's transforms and models with trends, can be fit with geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in geiger. Multiple-rate Brownian motion can be fit
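As an illustration of the independent contrasts mentioned at the start of this section, the sketch below uses ape's pic function on a made-up three-taxon tree with made-up trait values.

```r
library(ape)

# A rooted, binary tree with branch lengths (made-up example)
tr <- read.tree(text = "((A:1,B:1):1,C:2);")

# Made-up continuous trait values, named by tip label
x <- c(A = 1, B = 3, C = 5)

# Standardized phylogenetically independent contrasts,
# one per internal node, named by node number
contrasts <- pic(x, tr)
contrasts
```

A tree with n tips yields n - 1 contrasts; the sign of each contrast depends on the order of the two descendants at that node, so analyses usually work with regressions through the origin or with absolute values.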
Trait Simulations: Continuous traits can be simulated using Brownian motion in ouch, geiger, ape, picante, OUwie, and caper; the Hansen model (a form of the OU model) in ouch and OUwie; and a speciational model in geiger. Discrete traits can be simulated using a continuous-time Markov model in geiger. phangorn can simulate DNA or amino acids. Both discrete and continuous traits can be simulated under models where rates change through time in geiger. phytools can simulate discrete characters using stochastic character mapping. phylolm can simulate continuous or binary traits along a tree.
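A short sketch of trait simulation using ape only: rTraitCont for a continuous trait under Brownian motion and rTraitDisc for a two-state discrete trait under an equal-rates Markov model. The tree, seed, and parameter values are arbitrary.

```r
library(ape)

set.seed(2019)
tr <- rcoal(15)   # random tree to simulate on

# Continuous trait under Brownian motion (sigma chosen arbitrarily)
cont <- rTraitCont(tr, model = "BM", sigma = 0.5)

# Binary discrete trait under an equal-rates model (rate chosen arbitrarily)
disc <- rTraitDisc(tr, model = "ER", k = 2, rate = 0.2)

head(cont)
table(disc)
```

Both functions return values named by tip label, so the results can be passed directly to functions that match data to tips by name.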
Tree Manipulation: Branch length scaling using ACDC; Pagel's (1999) lambda, delta and kappa parameters; and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) are available in geiger. phytools also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, setting of branch lengths including Grafen's method can all be done using ape. Extinct taxa can be pruned using geiger. phylobase offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with phangorn. paleotree has functions for manipulating trees based on sampling issues that arise with fossil taxa as well as more universal transformations. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more. enveomics.R can prune a tree to keep clade representatives.
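The ape operations listed above (rooting, dropping tips, and setting branch lengths with Grafen's method) can be sketched on a made-up topology as follows.

```r
library(ape)

# A made-up topology without branch lengths
tr <- read.tree(text = "(((A,B),C),(D,E));")

# Re-root on tip "E", keeping a bifurcating root
rerooted <- root(tr, outgroup = "E", resolve.root = TRUE)

# Drop a tip
smaller <- drop.tip(rerooted, "C")

# Assign branch lengths with Grafen's method (yields an ultrametric tree)
grafen <- compute.brlen(smaller, method = "Grafen")
is.ultrametric(grafen)
```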
Community/Microbial Ecology: picante, vegan, SYNCSA, phylotools, PCPS, caper, and DAMOCLES integrate several tools for using phylogenetics with community ecology. HMPTrees and GUniFrac provide tools for comparing microbial communities. betapart allows computing pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity. adiv can calculate various indices of biodiversity including species, functional and phylogenetic diversity, as well as alpha, beta, and gamma diversities. entropart can measure and partition diversity based on Tsallis entropy as well as calculate alpha, beta, and gamma diversities. ecospat can also examine phylogenetic diversity. metacoder is an R package for handling large taxonomic data sets, such as those generated by modern high-throughput sequencing approaches like metabarcoding.
Phyloclimatic Modeling: phyloclim integrates several new tools in this area.
Phylogeography / Biogeography: phyloland implements a model of space colonization mapped on a phylogeny; it aims to estimate limited dispersal and competitive exclusion in a statistical phylogeographic framework. jaatha can infer demographic parameters for two species with multiple individuals per species. diversitree implements the GeoSSE method for diversification analyses based on two areas. nodiv can compare sister species distributions at each node to detect major differences in distribution (Borregaard et al., 2014).
Species/Population Delimitation: adhoc can estimate an ad hoc distance threshold for a reference library of DNA barcodes.
Tree Plotting and Visualization: User trees can be plotted using ape, adephylo, phylobase, phytools, ouch, and dendextend; several of these have options for branch or taxon coloring based on some criterion (ancestral state, tree structure, etc.). paleoPhylo and paleotree are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using ape. Ancestral state reconstructions can be visualized along branches using ape and paleotree. phytools can project a tree into a morphospace. BAMMtools can visualize rate shifts calculated by BAMM on a tree. The popular R visualization package ggplot2 can be extended by
Tree Comparison: Tree-tree distances can be evaluated, and used in additional analyses, in distory and Rphylip. ape can compute tree-tree distances and also create a plot showing two trees with links between associated tips. kdetrees implements a non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees, which could represent inference problems or processes such as horizontal gene transfer. dendextend can evaluate multiple measures comparing dendrograms.
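A minimal sketch of a tree-tree distance in ape, using dist.topo on two made-up unrooted topologies that share the same tips. The default method is a topological (partition-based) distance, so identical topologies give a distance of zero.

```r
library(ape)

# Two made-up unrooted topologies on the same five tips
t1 <- read.tree(text = "((A,B),(C,D),E);")
t2 <- read.tree(text = "((A,C),(B,D),E);")

# Topological distance between the two trees, and of a tree with itself
d_diff <- as.numeric(dist.topo(t1, t2))
d_same <- as.numeric(dist.topo(t1, t1))

d_diff
d_same
```

ape's cophyloplot function can then be used to draw two trees side by side with links between associated tips.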
Taxonomy: taxize can interact with a suite of web APIs for taxonomic tasks, such as verifying species names, getting taxonomic hierarchies, and verifying name spelling. evobiR contains functions for making a tree at higher taxonomic levels, downloading a taxonomy tree from NCBI or ITIS, and various other miscellaneous functions (simulations of character evolution, calculating D-statistics, etc.).
Gene tree - species tree: HyPhy can count the duplication and loss cost to reconcile a gene tree to a species tree. It can also sample histories of gene trees from within family trees. rmetasim can simulate loci and individuals across landscapes using the metasim simulation engine.
Interactions with other programs: geiger can call PATHd8 through its congruify function. ips wraps several tree inference and other programs, including MrBayes, BEAST, and RAxML, allowing their easy use from within R. Rphylip wraps PHYLIP, a broad variety of programs for tree inference under parsimony, likelihood, and distance, bootstrapping, character evolution, and more. BoSSA can use information from various tools to place a query sequence into a reference tree. pastis can use taxonomic information to make constraints for MrBayes tree searches.
Notes: At least ten packages start as phy* in this domain, including two pairs of similarly named packages (phytools and phylotools, phylobase and phybase). This can easily lead to confusion, and future package authors are encouraged to consider such overlaps when naming packages. For clarification, phytools provides a wide array of functions, especially for comparative methods, and is maintained by Liam Revell; phylotools has functions for building supermatrices and is maintained by Jinlong Zhang. phylobase implements S4 classes for phylogenetic trees and associated data and is maintained by Francois Michonneau; phybase has tree utility functions and many functions for gene tree - species tree questions and is authored by Liang Liu, but no longer appears on CRAN.