An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
rotl is an R package to interact with the Open Tree of Life data APIs. It was initially developed as part of the NESCENT/OpenTree/Arbor hackathon.
The current stable version is available from CRAN, and can be installed by typing the following at the prompt in R:

    install.packages("rotl")

The development version can be installed from GitHub with the ghit or devtools package:

    library(ghit) # or library(devtools)
    install_github("ropensci/rotl")
There are three vignettes:
Start by checking out the "How to use rotl?" vignette by typing the following after installing the package:

    vignette("how-to-use-rotl", package="rotl")
Then explore how you can use rotl with other packages to combine your data with trees from the Open Tree of Life project in the "Data mashups" vignette.
The vignette "Using the Open Tree synthesis in a comparative analysis" demonstrates how you can reproduce an analysis from a published paper by downloading the tree it used, along with data from the supplementary material.
Run vignette(package = "rotl") to list the exact names of the installed vignettes.
The vignettes are also available from CRAN: "How to use rotl?", "Data mashups", and "Using the Open Tree synthesis in a comparative analysis".
Taxonomic names are represented in the Open Tree by numeric identifiers, the ott_ids (Open Tree Taxonomy identifiers). To extract a portion of a tree from the Open Tree, you first need to find the ott_ids for a set of names using the tnrs_match_names function:
    library(rotl)
    apes <- c("Pongo", "Pan", "Gorilla", "Hoolock", "Homo")
    (resolved_names <- tnrs_match_names(apes))

    ##   search_string unique_name approximate_match ott_id is_synonym flags
    ## 1         pongo       Pongo             FALSE 417949      FALSE
    ## 2           pan         Pan             FALSE 417957      FALSE
    ## 3       gorilla     Gorilla             FALSE 417969      FALSE
    ## 4       hoolock     Hoolock             FALSE 712902      FALSE
    ## 5          homo        Homo             FALSE 770309      FALSE
    ##   number_matches
    ## 1              2
    ## 2              2
    ## 3              1
    ## 4              1
    ## 5              1
Now we can get the tree with just those tips:
    tr <- tol_induced_subtree(ott_ids=ott_id(resolved_names))
    plot(tr)
The code above can be summarized in a single pipe:
    library(magrittr)
    ## or expressed as a pipe:
    c("Pongo", "Pan", "Gorilla", "Hoolock", "Homo") %>%
      tnrs_match_names %>%
      ott_id %>%
      tol_induced_subtree %>%
      plot
To cite rotl in publications, please use the reference returned by citation("rotl").
You may also want to cite the paper for the Open Tree of Life:
Hinchliff, C. E., et al. (2015). Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proceedings of the National Academy of Sciences 112(41): 12764-12769. doi: 10.1073/pnas.1423041112
The manuscript in Methods in Ecology and Evolution includes additional examples on how to use the package. The manuscript and the code it contains are also hosted on GitHub at: https://github.com/fmichonneau/rotl-ms
Starting with v3.0.0 of the package, the major and minor version numbers (the first 2 digits of the version number) will be matched to those of the API. The patch number (the 3rd digit of the version number) will be used to reflect bug fixes and other changes that are independent from changes to the API.
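The versioning rule above can be checked mechanically. The following is a minimal sketch in base R, assuming a hypothetical helper `version_series()` that is not part of rotl and using made-up version strings; it only compares the major and minor numbers, which are the parts the text says track the API.

```r
# Hypothetical helper (not part of rotl): reduce a version string to its
# "major.minor" series, the part that is matched to the API version.
version_series <- function(version) {
  parts <- unlist(package_version(version))  # e.g. c(3, 0, 1) for "3.0.1"
  paste(parts[1], parts[2], sep = ".")
}

# Assumed example values, for illustration only.
pkg_version <- "3.0.1"  # a package release; the patch number is independent of the API
api_version <- "3.0"    # the API version that release targets

# TRUE when the package release belongs to the same series as the API.
version_series(pkg_version) == version_series(api_version)
```

In practice the package version would come from `utils::packageVersion("rotl")` rather than a hard-coded string.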
rotl can be used to access other versions of the API (if they are available), but the high-level functions will most likely not work. Instead, you will need to parse the output yourself using the "raw" returns from the unexported low-level functions (all prefixed with a .). For instance, to use the tnrs/match_names endpoint for v2 of the API:
rotl:::.tnrs_match_names(c("pan", "pango", "gorilla", "hoolock", "homo"), otl_v="v2")
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
tnrs_match_names are consistent, and remain the same even after using
get_study_subtree gains the argument tip_label to control the formatting of the tip labels, #90, reported by @bomeara
is_in_tree takes a list of OTT ids (i.e., the output of ott_id()), and returns a vector of logicals indicating whether they are included in the synthetic tree (workaround #31).
get_study_subtree ignored the argument subtree_id, #89 reported by @bomeara
citation("rotl") now includes the reference to the Open Tree of Life publication.
Fix tests and vignette to reflect changes accompanying release 6.1 of the synthetic tree
Add section in vignette "How to use rotl?" about how to get the higher taxonomy from a given taxon.
CITATION file with MEE manuscript information (#82)
rotl now interacts with v3.0 of the Open Tree of Life APIs. The documentation has been updated to reflect the associated changes. More information about v3.0 of the Open Tree of Life APIs can be found on their wiki.
ott_id, for objects returned by tol_mrca(). Each of these methods has its own class.
tax_lineage() to extract the higher taxonomy from an object returned by taxonomy_taxon_info() (initially suggested by Matt Pennell, #57).
tol_lineage() to extract the nodes towards the root of the tree.
New print methods for
taxon_external_IDs() that return the external identifiers for a study and associated trees (e.g., DOI, TreeBase ID); and the identifiers of taxon names in taxonomic databases. The vignette "Data mashups" includes an example on how to use it.
strip_ott_ids() gains the argument remove_underscores to remove underscores from tips in trees returned by OTL.
tax_name() for consistency.
Refactor how result of query is checked and parsed (invisible to the user).
Fix bug in
studies_find_studies(), the arguments
only_current has been dropped for the methods associated with
objects returned by
The print method for tnrs_context() duplicated some names.
synonyms() methods for
not work if the query included unmatched taxa.
Improve warning and format of the result if one of the taxa requested doesn't
In the data frame returned by
tnrs_match_names, the columns
is_deprecated are now
character) [issue #54]
New utility function strip_ott_ids removes OTT id information from a character vector, making it easier to match tip labels in trees returned by tol_induced_subtree to taxonomic names in other data sources. This function can also remove underscores from the taxon names.
list_trees returns a list of tree ids associated with studies. The function takes the output of
studies_find_trees gain argument
(default set to
TRUE), that produces a data frame summarizing information
(title of the study, year of publication, DOI, ids of associated trees, ...)
about the studies matching the search criteria.
get_study_tree gains argument
TRUE, if the tree
returned for a given study contains duplicated tip labels, they will be made
unique before being parsed by NCL by appending a suffix (
etc.). (#46, reported by @bomeara)
get_study_year for objects of class study_meta that returns the year of publication of the study.
A more robust approach is used by get_tree_ids to identify the tree ids in the metadata returned by the API.
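The strip_ott_ids entry above describes removing OTT id information from tip labels so they can be matched to names in other data sources. The following base R sketch illustrates that idea only; `strip_ott_suffix()` is a hypothetical stand-in, not rotl's implementation, and it assumes tip labels of the form `<name>_ott<id>`. The trait values are made up.

```r
# Hypothetical helper (not rotl's implementation): drop a trailing
# "_ott<digits>" suffix and optionally turn underscores into spaces.
strip_ott_suffix <- function(labels, remove_underscores = FALSE) {
  stripped <- sub("_ott[0-9]+$", "", labels)
  if (remove_underscores) {
    stripped <- gsub("_", " ", stripped, fixed = TRUE)
  }
  stripped
}

# Illustrative tip labels in the assumed "<name>_ott<id>" format.
tips <- c("Pongo_ott417949", "Pan_ott417957", "Homo_sapiens_ott770315")
cleaned <- strip_ott_suffix(tips, remove_underscores = TRUE)

# Match the cleaned labels against names from another (made-up) data source,
# recording which original tip label each row corresponds to.
my_data <- data.frame(taxon = c("Pan", "Homo sapiens"), mass_kg = c(45, 62))
my_data$tip <- tips[match(my_data$taxon, cleaned)]
my_data
```

With rotl installed, the package's own strip_ott_ids function would take the place of the stand-in helper.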