Interacts with a suite of web 'APIs' for taxonomic tasks, such as getting database-specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more.
`taxize` allows users to search over many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchical information, among other things.
The `taxize` tutorial can be found at https://ropensci.org/tutorials/taxize.html
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of `service_whatitdoes`. For example, `gnr_resolve` uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., `classification`.
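As a rough illustration of the naming convention described above, the base R sketch below splits a function name at its first underscore to recover the service prefix and the action. The helper `split_taxize_name()` is hypothetical (it is not part of `taxize`), and the check is purely string-based, so it is not guaranteed to classify every function in the package correctly.

```r
# Hypothetical helper, not part of taxize: split a name of the form
# service_whatitdoes into its service prefix and its action.
split_taxize_name <- function(fn_name) {
  if (!grepl("_", fn_name, fixed = TRUE)) {
    # No underscore: treat it as a general function such as classification
    return(list(service = NA_character_, action = fn_name))
  }
  list(
    service = sub("_.*$", "", fn_name),    # everything before the first underscore
    action  = sub("^[^_]*_", "", fn_name)  # everything after the first underscore
  )
}

split_taxize_name("gnr_resolve")     # service "gnr", action "resolve"
split_taxize_name("classification")  # service NA, action "classification"
```

Only the first underscore is used as the separator, so a name with several underscores keeps the remainder intact as the action.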
You need API keys for Encyclopedia of Life (EOL), Tropicos, IUCN, and NatureServe.
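A minimal sketch of checking for API keys from within R, assuming the keys are stored as environment variables. The variable names below are the ones listed in the release notes further down (`TROPICOS_KEY`, `EOL_KEY`, `ENTREZ_KEY`); the names expected by your installed version may differ, so treat them as an assumption and consult `?taxize-authentication`. The key value shown is a placeholder.

```r
# Environment variable names as listed in the package release notes;
# check ?taxize-authentication for the names your version expects.
key_names <- c("TROPICOS_KEY", "EOL_KEY", "ENTREZ_KEY")

# Report which keys are visible to the current R session
keys_set <- nzchar(Sys.getenv(key_names))
names(keys_set) <- key_names
keys_set

# Set a key for the current session only (placeholder, not a real key)
Sys.setenv(TROPICOS_KEY = "your-key-here")
```

To keep a key across sessions, it can be stored in `.Renviron` or `.Rprofile` rather than set interactively.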
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. These include: Pan-European Species directories Infrastructure and Mycobank. Data sources that use SOAP web services have been moved to `taxizesoap` at https://github.com/ropensci/taxizesoap.
Source | Function prefix | API Docs | API key |
---|---|---|---|
Encyclopedia of Life | eol | link | link |
Taxonomic Name Resolution Service | tnrs | api.phylotastic.org/tnrs | none |
Integrated Taxonomic Information Service | itis | link | none |
Global Names Resolver | gnr | link | none |
Global Names Index | gni | link | none |
IUCN Red List | iucn | link | link |
Tropicos | tp | link | link |
Theplantlist dot org | tpl | ** | none |
Catalogue of Life | col | link | none |
National Center for Biotechnology Information | ncbi | none | none |
CANADENSYS Vascan name search API | vascan | link | none |
International Plant Names Index (IPNI) | ipni | link | none |
Barcode of Life Data Systems (BOLD) | bold | link | none |
National Biodiversity Network (UK) | nbn | link | none |
Index Fungorum | fg | link | none |
EU BON | eubon | link | none |
Index of Names (ION) | ion | link | none |
Open Tree of Life (TOL) | tol | link | none |
World Register of Marine Species (WoRMS) | worms | link | none |
NatureServe | natserv | link | link |
Wikipedia | wiki | link | none |
Kew's Plants of the World | pow | none | none |
**: There are none! We suggest using `TPL` and `TPLck` functions in the taxonstand package. We provide two functions to get bulk data: `tpl_families` and `tpl_get`.
***: There are none! The function scrapes the web directly.
See the newdatasource tag in the issue tracker
For more examples see the tutorial
install.packages("taxize")
Windows users install Rtools first.
install.packages("remotes")
remotes::install_github("ropensci/taxize")
library('taxize')
A lot of `taxize` revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#>
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       2304090                  Salmo abanticus        species
#> 2       2126688              Salmo ciscaucasicus        species
#> 3       1509524  Salmo marmoratus x Salmo trutta        species
#> 4       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 5       1483130               Salmo zrmanjaensis        species
#> 6       1483129               Salmo visovacensis        species
#> 7       1483128                Salmo rhodanensis        species
#> 8       1483127                 Salmo pellegrini        species
#> 9       1483126                     Salmo opimus        species
#> 10      1483125                Salmo macedonicus        species
#> 11      1483124                Salmo lourosensis        species
#> 12      1483123                   Salmo labecula        species
#> 13      1483122                  Salmo farioides        species
#> 14      1483121                      Salmo chilo        species
#> 15      1483120                     Salmo cettii        species
#> 16      1483119                  Salmo cenerinus        species
#> 17      1483118                   Salmo aphelios        species
#> 18      1483117                    Salmo akairos        species
#> 19      1201173               Salmo peristericus        species
#> 20      1035833                   Salmo ischchan        species
#> 21       700588                     Salmo labrax        species
#> 22       602068                    Salmo caspius     subspecies
#> 23       237411              Salmo obtusirostris        species
#> 24       235141              Salmo platycephalus        species
#> 25       234793                    Salmo letnica        species
#> 26        62065                  Salmo ohridanus        species
#> 27        33518                 Salmo marmoratus        species
#> 28        33516                    Salmo fibreni        species
#> 29        33515                     Salmo carpio        species
#> 30         8032                     Salmo trutta        species
#> 31         8030                      Salmo salar        species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis
downstream(as.tsn(154395), db = 'itis', downto = 'species', verbose = FALSE)
#> $`154395`
#>      tsn parentname parenttsn          taxonname rankid rankname
#> 1 154396       Apis    154395     Apis mellifera    220  species
#> 2 763550       Apis    154395 Apis andreniformis    220  species
#> 3 763551       Apis    154395        Apis cerana    220  species
#> 4 763552       Apis    154395       Apis dorsata    220  species
#> 5 763553       Apis    154395        Apis florea    220  species
#> 6 763554       Apis    154395 Apis koschevnikovi    220  species
#> 7 763555       Apis    154395   Apis nigrocincta    220  species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#>      tsn parentname parenttsn   taxonname rankid rankname
#> 1  18031   Pinaceae     18030       Abies    180    genus
#> 2  18033   Pinaceae     18030       Picea    180    genus
#> 3  18035   Pinaceae     18030       Pinus    180    genus
#> 4 183396   Pinaceae     18030       Tsuga    180    genus
#> 5 183405   Pinaceae     18030      Cedrus    180    genus
#> 6 183409   Pinaceae     18030       Larix    180    genus
#> 7 183418   Pinaceae     18030 Pseudotsuga    180    genus
#> 8 822529   Pinaceae     18030  Keteleeria    180    genus
#> 9 822530   Pinaceae     18030 Pseudolarix    180    genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#>   sub_tsn                    acc_name acc_tsn
#> 1  183671 Acer rubrum var. drummondii  526853
#> 2  183671 Acer rubrum var. drummondii  526853
#> 3  183671 Acer rubrum var. drummondii  526853
#>                      acc_author                        syn_author
#> 1 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) E. Murray
#> 2 (Hook. & Arn. ex Nutt.) Sarg.             Hook. & Arn. ex Nutt.
#> 3 (Hook. & Arn. ex Nutt.) Sarg.     (Hook. & Arn. ex Nutt.) Small
#>                      syn_name syn_tsn
#> 1 Acer rubrum ssp. drummondii   28730
#> 2             Acer drummondii  183671
#> 3          Rufacer drummondii  183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#>              "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=162003"
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#>                "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
#>
#> attr(,"class")
#> [1] "ids"
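The printed output above shows that each identifier carries metadata (`match`, `multiple_matches`, `pattern_match`, `uri`) as R attributes. The sketch below shows one way to read them with base R's `attr()`. To stay runnable without a web request, it uses a hand-built stand-in that mirrors the NCBI result printed above; a real object returned by `taxize` may carry additional attributes.

```r
# Hand-built stand-in mirroring the NCBI result printed above.
# No web request is made; this is not an object created by taxize itself.
id <- structure(
  "8038",
  class = "uid",
  match = "found",
  multiple_matches = FALSE,
  pattern_match = FALSE,
  uri = "https://www.ncbi.nlm.nih.gov/taxonomy/8038"
)

attr(id, "match")             # whether a match was found
attr(id, "multiple_matches")  # whether the query matched more than one record
attr(id, "uri")               # link to the record at the data source
as.character(id)              # the bare identifier, without attributes
```

Checking `multiple_matches` before using an identifier is a simple way to flag names that may need manual review.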
You can limit results to certain rows when getting ids in any of the `get_*()` functions
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "http://www.gbif.org/species/2704179"
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can get back all ids, if that's your jam, with the `get_*_()` functions (all `get_*()` functions with an additional `_` underscore at the end of the function name)
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#>               guid      scientificName    rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species        accepted
#> 2 NHMSYS0001718585  Hypnoidus riparius species        accepted
#> 3 NBNSYS0000023345   Paederus riparius species        accepted
#>
#> $nbn$`Pinus contorta`
#>               guid                scientificName    rank taxonomicStatus
#> 1 NBNSYS0000004786                Pinus contorta species        accepted
#> 2 NHMSYS0000494858 Pinus contorta var. murrayana variety        accepted
#> 3 NHMSYS0000494848  Pinus contorta var. contorta variety        accepted
#>
#>
#> attr(,"class")
#> [1] "ids"
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower"        "wild sunflower"
#> [4] "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus"   "Ursus americanus"
#> [3] "Ursus americanus"            "Ursus americanus americanus"
#> [5] "Chiropotes satanas"          "Ursus thibetanus"
#> [7] "Ursus thibetanus"
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#>             name        rank      id
#> 21 Boreoeutheria below-class 1437010
`numeric` to uid
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
`list` to uid
as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] "https://www.ncbi.nlm.nih.gov/taxonomy/315567"
#> [2] "https://www.ncbi.nlm.nih.gov/taxonomy/3339"
#> [3] "https://www.ncbi.nlm.nih.gov/taxonomy/9696"
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>                                            uri
#> 1 https://www.ncbi.nlm.nih.gov/taxonomy/315567
#> 2   https://www.ncbi.nlm.nih.gov/taxonomy/3339
#> 3   https://www.ncbi.nlm.nih.gov/taxonomy/9696
See our CONTRIBUTING document.
Collected via GitHub Issues - this list honors all contributions, whether code or not. Listed alphabetically.
afkoeppel - ahhurlbert - albnd - Alectoria - andzandz11 - antagomir - arendsee - ArielGreiner - arw36 - ashenkin - ashiklom - benjaminschwetz - benmarwick - bomeara - bw4sz - cboettig - cdeterman - ChrKoenig - chuckrp - clarson2191 - claudenozeres - cmzambranat - cparsania - daattali - DanielGMead - DarrenObbard - davharris - davidvilanova - diogoprov - dlebauer - dlenz1 - dschlaep - EDiLD - edwbaker - emhart - eregenyi - fdschneider - fgabriel1891 - fischhoff - fmichonneau - fozy81 - gedankenstuecke - GISKid - git-og - glaroc - gpli - gustavobio - hlapp - ibartomeus - Ironholds - jangorecki - jarioksa - jebyrnes - jimmyodonnell - johnbaums - jonmcalder - josephwb - jsgosnell - jwilk - kamapu - karthik - katrinleinweber - KevCaz - kgturner - kmeverson - Koalha - ljvillanueva - maelle - Markus2015 - mcsiple - MikkoVihtakari - millerjef - miriamgrace - MK212 - mpnelsen - MUSEZOOLVERT - nate-d-olson - nmatzke - npch - paternogbc - patperu - pederengelstad - philippi - pmarchand1 - PrincessPi314 - pssguy - raredd - rec3141 - Rekyt - RodgerG - rossmounce - sariya - scelmendorf - sckott - SimonGoring - snsheth - snubian - Squiercg - taddallas - tdjames1 - tmkurobe - toczydlowski - tpaulson1 - tpoisot - vijaybarve - wcornwell - willpearse - wpetry - yhg926 - zachary-foster
Check out our milestones to see what we plan to get done for each version.
Get citation information for `taxize` in R by running `citation(package = 'taxize')`
- `class2tree()` gains node labels when present (#644) (#748) thanks @gpli
- `get_pow()`, `get_pow_()`, `as.pow()`, `classification.pow()`, `pow_search()`, and `pow_lookup()` (#598) (#739)
- `taxize`. The string will look something like `r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6)`, including the versions of the `curl` R pkg, the `crul` package, and the `taxize` package (#662)
- `get_colid` functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate for the user using async HTTP requests; this means that some requests will take longer than they did before if they have more than 50 results; this is a good change given that you get all the results for your query now (#743)
- `get_*` functions: in some of the `get_*` functions we tried for a direct match (e.g., `"Poa" == "Poa"`) and if one was found, then we were done and returned that record. However, we didn't deploy the same logic across all `get_*` functions. Now all `get_*` functions check for a direct match. Of course if there is a direct match with more than 1 result, you still get the prompt asking you which name you want. (#631) (#734)
- `taxize-authentication` manual file covering authentication information across the package (#681)
- `gnr_resolve()` docs about age of datasets used in the Global Names Resolver, and how to access age of datasets (#737)
- `get_eolid()` fixes: gains new attribute `pageid`; `uri`'s given are updated to EOL's new URL format; `rank` and `datasource` parameters were not documented, now are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
- `col_search()` now returns attributes on the output data.frame's with number of results found and returned, and other metadata about the search
- `gnr_datasources()` loses the `todf` parameter; now always returns a data.frame and the data.frame has all the columns, whereas the default call returned a limited set of columns in previous versions
- `get_wormsid()` was failing when there was a direct match found with more than 1 result (#740)
- `get_*` functions: linting of the input to the `rows` parameter was failing with a vector of values in some cases (#741)
- `iucn_summary()`: we weren't passing on the API key internally correctly (#735) thanks @PrincessPi314 for the report
- `iucn_summary_id()` is defunct, use `iucn_summary()` instead
- `col_downstream()` gains parameter `extant_only` (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
- `downstream()` gains another `db` option: Worms. You can now set `db="worms"` to use Worms to get taxa downstream from a target taxon. In addition, `taxize` gains new function `worms_downstream()`, which is used under the hood in `downstream(..., db="worms")` (#713) (#715)
- `id2name()` with `db` options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names. It's sort of the inverse of the `get_*()` family of functions. (#712) (#716)
- `tax_rank()` gains new parameter `rows` so that one can pass `rows` down to `get_*()` functions
- `synonyms()` warning from an internal `cbind()` call now fixed (#704) (#705) thanks @vijaybarve
- `taxize` function calls thrown when notifying users about API keys (e.g., `taxize::use_tropicos()`) to make it very clear where the functions live (to avoid confusion with `usethis`) (#724) (#725) thanks @maelle
- `iucn_summary()` to output the same structure when no match is found as when a match is found so that when output is passed to `iucn_status()` behavior is the same (#708) thanks @Rekyt
- `tax_name()` tests on CRAN (#728)
- `httr` replaced by `crul` throughout (#590)
- `vcr`, making tests much faster and not prone to errors due to remote services being down (#729)
- `eol_dataobjects()` gains new parameter `language`. `eol_pages()` loses `iucn`, `images`, `videos`, `sounds`, `maps`, and `text` parameters, and gains `images_per_page`, `videos_per_page`, `sounds_per_page`, `maps_per_page`, `texts_per_page`, and `texts_page`. Please do let us know if you find any problems with any EOL functions (#717) (#718)
- `db` value for `comm2sci()` and `sci2comm()` is now `ncbi` instead of `eol`
- `get_*()` functions changed parameter `verbose` to `messages` to not conflict with `verbose` passed down to `crul::HttpClient`
- `ncbi_ping()` reworked to allow use of your API key as a parameter or pulled from your environment; `eol_ping()` using https instead of http, and parsing JSON instead of XML.
- `get_eolid()` was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
- `get_tolid()` was erroring when values were `NULL` - now replacing all `NULL` with `NA_character_` to make `data.table::rbindlist()` happy (#710) (#711) thanks @gpli for the fix
- `rank_ref` data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When there's no matched rank, errors can result in many of the downstream functions. The data.frame now has 43 rows. (#720) (#727)
- `downstream()` and `ncbi_get_taxon_summary()`: change in `ncbi_get_taxon_summary` to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
- `use_entrez()`, `use_eol()`, `use_iucn()` (which uses internally `rredlist::rl_use_iucn()`), and `use_tropicos()` (#682) (#691) (#693) By @maelle
- `tropicos_ping()`
- `downstream()` and `gbif_downstream()`: some of the results don't have a `canonicalName`, so now safely try to get that field (#673)
- `as.uid()` was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
- `get_boldid()` (and by extension `classification(..., db = "bold")`): was failing when no parent taxon found, just fill in with NA now (#680)
- `synonyms()`: was failing for some TSNs for `db="itis"` (#685)
- `tax_name()`: `rows` arg wasn't being passed on internally (#686)
- `gnr_resolve()` and `gnr_datasources()`: problems were caused by http scheme, switched to use https instead of http (#687)
- `class2tree()`: organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
- `ncbi_get_taxon_summary()`: changes in the NCBI API most likely lead to HTTP 414 (URI Too Long) errors. We now loop internally for the user. By extension this helps problems upstream in `downstream()`/`ncbi_downstream()`/`ncbi_children()` (#698)
- `class2tree()`: was erroring when name strings contained pound signs (e.g., `#`) (#699) (#700) thanks @gpli
- `Sys.sleep` for NCBI requests if the user has an API key (#667)
- `?taxize-authentication`
- `verbose` to `messages` across the package so that suppressing calls to `message()` do not conflict with curl options passed in
- `genbank2uid()` and `ncbi_get_taxon_summary()` to use `crul` instead of `httr` for HTTP requests
- `get_tolid()`: it was missing assignment of the `att` attribute internally, causing failures in some cases (#663) (#672)
- `ncbi_children()` (and thus `children()` when requesting NCBI data) to not fail when there is an empty result from the internal call to `classification()` (#664) thanks @arendsee
- `class2tree()` gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits as the named taxonomy levels were shared between them. Now it makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
- `?taxize-authentication` for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer. (#640) (#646)
- `crul` and `zoo`
- `downstream()` we now pass on `limit` and `start` parameters to `gbif_downstream()`; we weren't doing that before; the two parameters control pagination (#638)
- `genbank2uid()` now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
- `children()` outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
- `downstream()` by passing `...` (additional parameters) down to `ncbi_children()` used internally. Allows, e.g., use of the `ambiguous` parameter in `ncbi_children()`, which allows you to remove ambiguously named nodes (#653) (#654) thanks @arendsee
- `httr` for `crul` in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. See package `crul` for help on curl options. Along with this change, the parameter `verbose` has changed to `messages` (for toggling printing of information messages)
- `CONTRIBUTING.md` file for how to contribute to the test suite (#635)
- `genbank2uid` now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail.
- `downstream()`: passing numeric taxon ids to the function while using `db="ncbi"` wasn't working (#641) thanks @arendsee
- `children()`: passing numeric taxon ids to the function while using `db="worms"` wasn't working (#650) (#651) thanks @arendsee
- `synonyms_df()` - that attempts to combine many outputs from the `synonyms()` function - now removes NA/NULL/empty outputs before attempting the combination (#636)
- `gnr_resolve()`: before if `preferred_data_sources` was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
- `children()`. It was returning unexpected results for ambiguous taxonomic names (e.g., there's some insects that are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
- `get_*()` functions
- `get*()` functions had `NaN` as default `rows` parameter value. Those all changed to `NA`
- `rows` parameter value given
- `get_*()` functions
- `get_*()` functions to behave the same when `ask = FALSE, rows = 1` and `ask = TRUE, rows = 1` as these should result in the same outcome. (#627) thanks @zachary-foster !
- `NA` with no indication that there were multiple matches.
- `comm2sci()` to S3 setup with methods for `character`, `uid`, and `tsn` (#621)
- `iucn_status()` now has S3 setup with a single method that only handles output from the `iucn_summary()` function.
- `key` parameter to fxn `iucn_id()` (#633)
- `sci2comm()`: to indicate how to get non-simplified output (which includes what language the common name is from) vs. getting simplified output (#623) thanks @glaroc !
- `sci2comm()` to not be case sensitive when looking for matches (#625) thanks @glaroc !
- `eol_search()`: `link` and `content`
- `eol_search()` to describe returned `data.frame`
- `bold_ping()` to use new base URL for their API
- `rank_ref`, see `?rank_ref`
- `downstream()` via fix to `rank_ref` dataset to include "infraspecies" and make "unspecified" and "no rank" equivalent. Fix to `col_downstream()` to properly remove ranks lower than allowed. (#620) thanks @cdeterman !
- `iucn_summary`: changed to using `rredlist` package internally. `sciname` param changed to `x`. `iucn_summary_id()` now is deprecated in favor of `iucn_summary()`. `iucn_summary()` now has an S3 setup, with methods for `character` and `iucn` (#622)
- `rank_ref` dataset as that rank sometimes used at NCBI (from bug reported in `ncbi_downstream()`) (#626)
- `sci2comm()`, add `tryCatch()` to internals to catch failed requests for specific pageid's (#624) thanks @glaroc !
- `get_nbnid()` (#632)
- `ape::neworder_phylo` object, which is not used anymore in `taxize`
- `ncbi_downstream()` and now NCBI is an option in the function `downstream()` (#583) thanks for the push @andzandz11
- `wikitaxa`, with contributions from @ezwelty (#317)
- `scrapenames()` gains a parameter `return_content`, a boolean, to optionally return the OCR content as a text string with the results. (#614) thanks @fgabriel1891
- `get_iucn()` - to get IUCN Red List ids for taxa. In addition, new S3 methods `synonyms.iucn` and `sci2comm.iucn` - no other methods could be made to work with IUCN Red List ids as they do not share their taxonomic classification data (#578) thanks @diogoprov
- `bold` now an option in `classification()` function (#588)
- `genbank2uid()` can give back more than 1 taxon matched to a given Genbank accession number. Now the function can return more than one match for each query, e.g., try `genbank2uid(id = "AM420293")` (#602) thanks @sariya
- `cbind()` usage to include `...` for method consistency (#612)
- `tax_rank()` used to be able to do only ncbi and itis. Can now do a lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
- `classification()` docs in a section `Lots of results` a note about how to deal with results when there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
- `tnrs()` now returns the resulting data.frame in the order of the names passed in by the user (#613) thanks @wpetry
- `gnr_resolve()` to now strip out taxonomic names submitted by user that are NA, or zero length strings, or are not of class character (#606)
- `gnr_resolve()` (#610) thanks @kamapu
- `tnrs()` docs that the service doesn't provide any information about homonyms. (#610) thanks @kamapu
- `parvorder` to the `taxize` `rank_ref` dataset - used by NCBI - if tax returned with that rank, some functions in `taxize` were failing due to that rank missing in our reference dataset `rank_ref` (#615)
- `get_colid()` via problem in parsing within `col_search()` (#585)
- `gbif_downstream` (and thus fix in `downstream()`): there were two rows with form in our `rank_ref` reference dataset of rank names, causing > 1 result in some cases, then causing `vapply` to fail as it's expecting length 1 result (#599) thanks @andzandz11
- `genbank2uid()`: was failing when getting more than 1 result back, works now (#603) and fails better now, giving back warnings/error messages that are more informative (see also #602) thanks @sariya
- `synonyms.tsn()`: in some cases a TSN has > 1 accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
- `sci2comm()` - was not dealing internally with passing the `simplify` parameter (#616)
- `worrms` package on CRAN.
  Adds functions `as.wormsid()`, `get_wormsid()`, `get_wormsid_()`, `children.wormsid()`, `classification.wormsid()`, `sci2comm.wormsid()`, `comm2sci.wormsid()`, and `synonyms.wormsid()` (#574) (#579)
- `as.natservid`, `get_natservid`, `get_natservid_`, and `classification.natservid` (#126)
- `rankagg()` with respect to `vegan` package to work with older and new version of `vegan` - thanks @jarioksa (#580) (#581)
- `get_tolid()`, `get_tolid_()`, and `as.tolid()` (#517)
- `classification()` gains new method for TOL data
- `lowest_common()` gains new method for TOL data
- `ritis` package, an external dependency for ITIS taxonomy data. Note that a large number of ITIS functions were removed, and are now available via the package `ritis`. However, there are still many high level functions for working with ITIS data (see functions prefixed with `itis_`), and `get_tsn()`, `classification.tsn()`, and similar high level functions remain unchanged. (#525)
- `eubon()` fxn is now `eubon_search()`, although either still works - though `eubon()` will be made defunct in the next version of this package. Additional new functions were added: `eubon_capabilities()`, `eubon_children()`, and `eubon_hierarchy()` (#567)
- `lowest_common()` function gains two new data source options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
- `synonyms_df()` as a slim wrapper around `data.table::rbindlist()` to make it easy to combine many outputs from `synonyms()` for a single data source - there is a lot of heterogeneity among data sources in how they report synonyms data, so we don't attempt to combine data across sources (#533)
- `https` from `http` (#571)
- `tax_name()` in which when an invalid taxon was searched for then `classification()` returned no data and caused an error. Fixed now. (#560) thanks @ljvillanueva for reporting it!
- `gnr_resolve()` in which order of input names to the function was not retained. Fixed now. (#561) thanks @bomeara for reporting it!
- `gbif_parse()` - data format changed coming back from GBIF - needed to replace `NULL` with `NA` (#568) thanks @ChrKoenig for reporting it!
- `get_*()` functions now have new attributes to further help the user:
  `multiple_matches` (logical) indicating whether there were multiple matches or not, and `pattern_match` (logical) indicating whether a pattern match was made, or not. (#550) from (#547) discussion, thanks @ahhurlbert ! see also (#551)
- `xml2::xml_find_one()` to `xml2::xml_find_first()` for new `xml2` version (#546)
- `gnr_resolve()` now retains user supplied taxa that had no matches - this could affect your code, make sure to check your existing code (#558)
- `gnr_resolve()` - stop sorting output data.frame, so order of rows in output data.frame now same as user input vector/list (#559)
- `sub_rows()` inside of most `get_*()` functions to not fail when the data.frame rows were less than that requested by the user in `rows` parameter (#556)
- `get_gbifid()`, as sometimes calls failed because we now return numeric IDs but used to return character IDs (#555)
- `get_()` functions to call the internal `sub_rows()` function later in the function flow so as not to interfere with taxonomic based filtering (e.g., user filtering by a taxonomic rank) (#555)
- `gnr_resolve()`, to not fail on parsing when no data returned when a preferred data source specified (#557)
- `iucn_summary()` (#543) thanks @mcsiple
- `ncbi_get_taxon_summary()` suggesting to break up the ids into chunks (#541) thanks @daattali
- `itis_acceptname()` to accept multiple names (#534) and now gives back same output regardless of whether match found or not (#531)
- `tax_name()` for some queries that return no classification data via internal call to `classification()` (#542) thanks @daattali
- `tax_name()` (#530) thanks @ibartomeus
- `rankagg()` function, use `requireNamespace()` in examples to make sure user has `vegan` installed (#529)
- `eol_invasive()` and `gisd_invasive()` to point to new location in the `originr` package. Also, cleaned out code in those functions as not avail. anymore (#494)
- `get_gbifid()` to use new internal code to provide two ways to search GBIF taxonomy API, either via `/species/match` or via `/species/search`, instead of `/species/suggest`, which we used previously. The suggest route was too coarse. `get_gbifid()` also gains a parameter `method` to toggle whether you search for names using `/species/match` or `/species/search`. (#528)
- `col_search()` to handle when COL can return a value of `missapplied name`, which a `switch()` statement didn't handle yet (#511) thanks @JoStaerk !
- `get_colid()` and `col_search()` (#523) thanks @zachary-foster !
- `bold`, which fixes `taxize::bold_search()`, so no actual changes in `taxize` for this, but take note (#521)
- `gnr_resolve()` where we indexed to data incorrectly. And added tests to account for this problem. Thanks @raredd ! (#519) (#520)
- `iucn_summary()` introduced in last version.
  `iucn_summary()` now uses the package `rredlist`, which requires an API key, and I didn't document how to use the key. Function now allows user to pass the key in as a parameter, and documents how to get a key and save it in either `.Renviron` or in `.Rprofile` (#522)
- `lowest_common()` for obtaining the lowest common taxon and rank for a given taxon name or ID. Methods so far for ITIS, NCBI, and GBIF (#505)
- `rredlist`
- `iucn_summary_id()` - same as `iucn_summary()`, except takes IUCN IDs as input instead of taxonomic names (#493)
- `iucn_summary()` fixes, long story short: a number of bug fixes, and uses the new IUCN API via the newish package `rredlist` when IDs are given as input, but uses the old IUCN API when taxonomic names given. Also: gains new parameter `distr_details` (#174) (#472) (#487) (#488)
- `XML` with `xml2` for XML parsing (#499)
- `httr::content` to explicitly state `encoding="UTF-8"` (#498)
- `gnr_resolve()` now outputs a column (`user_supplied_name`) for the exact input taxon name - facilitates merging data back to original data inputs (#486) thanks @Alectoria
- `eol_dataobjects()` gains new parameter `taxonomy` to toggle whether to return any taxonomy details from different data providers (#497)
- `classification()` was giving back rank values in mixed case from different data providers (e.g., `class` vs. `Class`). All rank values are now all lowercase (#504)
- `get_gbifid` to 50 from 20. Gives back more results, so more likely to get the thing searched for (#513)
- `gni_search()` to make all output columns `character` class
- `iucn_id()`, `tpl_families()`, and `tpl_get()` all gain a new parameter `...` to pass on curl options to `httr::GET()`
get_eolid()
: URI returned now always has the pageid, and goes to the
right place; API key if passed in now actually used, woopsy (#484)get_uid()
: when a taxon not found, the "match" attribute was saying
found sometimes anyway - that is now fixed; additionally, fixed docs to correctly
state that we give back 'NA due to ask=FALSE'
when ask = FALSE
(#489) Additionally,
made this doc fix in other get_*()
function docsapgOrders()
function (#490)tp_search()
which fixes `get_tpsid()`: Tropicos doesn't allow periods (`.`) in query strings, so those are URL encoded now; Tropicos doesn't like sub-specific rank names in name query strings, so we warn when those are found, but don't alter user inputs; and improved docs to be more clear about how the function fails (#491) thanks @scelmendorf!
- `classification(db = "itis")` now fails better when no taxa are found (#495) thanks @ashenkin!
- `eol_pages()` fixes: the EOL API route for this method gained a new parameter `taxonomy`, and this function gains that parameter. That change caused this function to fail. Now fixed. Also, parameter `subject` changed to `subjects` (#500)
- Fix in `col_search()` for when `misapplied name` comes back as a data slot. There was previously no parser for that type. Now there is, and it works (#512)
- Requires `R >= 3.2.1`. Good idea to update your R installation anyway (#476)
- New function `ion()` for obtaining data from Index of Organism Names (#345)
- New function `eubon()` for obtaining data from EU (European Union) BON taxonomy (#466). Note that you may only get partial results for some requests, as paging isn't implemented yet in the EU BON API (#481)
- New functions `fg_*()` for obtaining data from Index Fungorum. More work has to be done yet on this data source, but these initial functions allow some Index Fungorum data access (#471)
- New function `gbif_downstream()` for obtaining downstream names from GBIF's backbone taxonomy. Also available in `downstream()`, where you can request downstream names from GBIF, along with other data sources (#414)
- `db` parameters to warn users that if they provide the wrong `db` value for the given taxon ID, they can get data back, but it would be wrong. That is, all taxonomic data sources available in `taxize` use their own unique IDs, so a single ID value can be in multiple data sources, even though the ID refers to different taxa in each data source. There is no way we can think of to prevent this from happening, so be cautious. (#465)
- `gnr_resolve()`
now by default capitalizes the first name of a name string passed to the function. GNR is case sensitive, so case matters (#469)
- `phylomatic_tree()` and `phylomatic_format()` are defunct. They were deprecated in recent versions, but are now gone. See the new package `brranching` for Phylomatic data (#479)
- The `stripauthority` argument in `gnr_resolve()` has been renamed to `canonical` to better match what it actually does (#451)
- `gnr_resolve()` now returns a single data.frame in output, or `NULL` when no data found. The input taxa that have no match at all are returned in an attribute with name `not_known` (#448)
- `vascan_search()` changed `callopts` parameter to `...` to pass in curl options to the request.
- `ipni_search()` changed `callopts` parameter to `...` to pass in curl options to the request. In addition, better HTTP error handling, and added a test suite for this function. (#458)
- `stringsAsFactors=FALSE` now used for `gibf_parse()` (https://github.com/ropensci/taxize/commit/c0c4175d3a0b24d403f18c057258b67d3fbf17f0)
- Improved documentation for `get_uid()` to make clearer how to use the various parameters to get the desired result, and how to avoid certain pitfalls (#436)
- Removed `asdf` from the function `eol_dataobjects()`; it now returns data.frames only.
- `get_eolid()` via `tryCatch()` to fail better when names not found.
- Removed `openssl` as a package dependency. Not needed anymore because uBio was dropped.
- `gnr_resolve()` failed when no canonical form was found.
- `gnr_resolve()` when no results found when `best_match_only=TRUE` (#432)
- `itisdf()` to give back an empty data.frame when no results found, often with subspecific taxa. Helps solve errors reported in use of `downstream()`, `itis_downstream()`, and `gethierarchydownfromtsn()` (#459)
- `gnr_resolve()` gains new parameter `with_canonical_ranks` (logical) to choose whether infraspecific ranks are returned or not.
- New function `iucn_id()`
to get the IUCN ID for a taxon from its name. (#431)
- `ubio_classification()`, `ubio_classification_search()`, `ubio_id()`, `ubio_search()`, `ubio_synonyms()`, `get_ubioid()`, `ubio_ping()`. In addition, ubio has been removed as an option in the `synonyms()` function, and references for uBio have been removed from the `taxize_cite()` utility function. (#449)
- `rankagg()` doesn't depend on `data.table` anymore (fixes issue with CRAN checks)
- Replaced `RCurl::base64Decode()` with `openssl::base64_decode()`, needed for `ubio_*()` functions (#447)
- `importFrom` used across all imports now (#446). In addition, `importFrom` for all non-base R pkgs, including `graphics`, `methods`, `stats` and `utils` packages (#441)
- `query` parameter in `GET()`, but can pass `NULL` (#445)
- `gni_*()` functions, including code tidying, some DRYing out, and ability to pass in curl options (#444)
- `taxize_cite()`
- Fixed a bug in `classification()` where numeric IDs as input got converted to itis ids just because they were numeric. (#434)
- `synonyms` function to get name synonyms. (#430)
- `apgFamilies` and `apgOrders`. (#418)
- `col_search()` gains parameters `response` to get a terse or full response, and `...` to pass in curl options.
- `eol_dataobjects()` gains parameter `...` to pass in curl options, and parameter `returntype` renamed to `asdf` (for "as data.frame").
- `ncbi_get_taxon_summary()`
gains parameter `...` to pass in curl options.
- `children()` function gains the `rows` parameter passed on to `get_*()` functions, supported for data sources ITIS and Catalogue of Life, but not for NCBI.
- `upstream()` function gains the `rows` parameter passed on to `get_*()` functions, supported for both data sources ITIS and Catalogue of Life.
- `classification()` function gains the `rows` parameter passed on to `get_*()` functions, for all sources used in the function.
- `downstream()` function gains the `rows` parameter passed on to `get_*()` functions, for all sources used in the function.
- `get_*()` functions gain new parameters to help filter results (e.g., `division`, `phylum`, `class`, `family`, `parent`, `rank`, etc.). These parameters allow direct matching or regex filters (e.g., `.a` to match any character followed by an `a`). (#410) (#385)
- `get_*()` functions now give back more information (mostly higher taxonomic data) to help in the interactive decision process. (#327)
- `synonyms()` function: Catalogue of Life. (#430)
- `vegan` package, used in `class2tree()` function, moved from Imports to Suggests. (#392)
- Improved `taxize_cite()` a lot: get URLs and sometimes citation information for data sources available in taxize. (#270)
- `apg_lookup()` function. (#422)
- `apg_families()` function. (#418)
- `callopts` parameter in `eol_pages()`, `eol_search()`, `gnr_resolve()`, `tp_accnames()`, `tp_dist()`, `tp_search()`, `tp_summary()`, `tp_synonyms()`, `ubio_search()` changed to `...`
- `accepted` parameter in `get_tsn()` changed to `FALSE` by default. (#425)
- `db` parameter in `resolve()` changed to `gnr` as `tnrs` is often quite slow.
- `tpl_families()` and `tpl_get()`. (#424)
- `ncbi_getbyname()`, `ncbi_getbyid()`, `ncbi_search()`, `eol_invasive()`, `gisd_isinvasive()`. These functions are available in the `traits` package. (#382)
- `phylomatic_tree()`
is deprecated, and will be defunct in an upcoming version.
- `taxize`. E.g., `itis_ping()` pings ITIS and returns a logical, indicating if the ITIS API is working or not. You can also do a very basic test to see whether content returned matches what's expected. (#394)
- New function `status_codes()` to get a vector of HTTP status codes. (#394)
- `itis_ping()`, and all `*_ping()` functions.
- `\donttest` into `\dontrun`.
- New function `genbank2uid()` to get an NCBI taxonomic id (i.e., a uid) from either a GenBank accession number or GI number. (#375)
- New function `get_nbnid()` to get a UK National Biodiversity Network taxonomic id (i.e., a nbnid). (#332)
- New function `nbn_classification()` to get a taxonomic classification for a UK National Biodiversity Network taxonomic id. Using this new function, generic method `classification()` gains a method for `nbnid`. (#332)
- New function `nbn_synonyms()` to get taxonomic synonyms for a UK National Biodiversity Network taxonomic id. Using this new function, generic method `synonyms()` gains a method for `nbnid`. (#332)
- New function `nbn_search()` to search for taxa in the UK National Biodiversity Network. (#332)
- New function `ncbi_children()` to get direct taxonomic children for an NCBI taxonomic id. Using this new function, generic method `children()` gains a method for `ncbi`. (#348) (#351) (#354)
- New function `upstream()` to get taxa upstream of a taxon. E.g., getting families upstream from a genus gets all families within the one level higher up taxonomic class than family. (#343)
- New functions `as.*()` to coerce numeric/alphanumeric codes to taxonomic identifiers for various databases. There are methods on this function for each of itis, ncbi, tropicos, gbif, nbn, bold, col, eol, and ubio. By default `as.*()` functions make a quick check that the identifier is a real one by making a GET request against the identifier URI; this can be toggled off by setting `check=FALSE`. There are methods for returning itself, character, numeric, list, and data.frame. In addition, if the `as.*.data.frame()` function is used, a generic method exists to coerce the `data.frame` back to an identifier object. (#362)
- `get_tsn_()` (the underscore is the only difference from the previous function name). These functions don't do the normal interactive process of prompts that e.g., `get_tsn()` does, but instead return a list of all ids, or a subset via the `rows` parameter. (#237)
- New function `ncbi_get_taxon_summary()` to get taxonomic name and rank for 1 or more NCBI uid's. (#348)
- `assertthat` removed from package imports, replaced with `stopifnot()`, to reduce dependency load. (#387)
- `eol_hierarchy()` now defunct (no longer available) (#228) (#381)
- `tp_classifcation()` now defunct (no longer available) (#228) (#381)
- `col_classification()` now defunct (no longer available) (#228) (#381)
- `?fxn-name`.
- `get_*()`
functions gain a new parameter `rows` to allow selection of particular rows. For example, `rows=1` to select the first row, or `rows=1:3` to select rows 1 through 3. (#347)
- `classification()` now by default returns taxonomic identifiers for each of the names. This can be toggled off with `return_id=FALSE`. (#359) (#360)
- `switch()` on the `db` parameter, which helps give a better error message when a `db` value is not possible or spelled incorrectly. (#379)
- New function `children()`, which is a single interface to various data sources to get immediate children from a given taxonomic name. (#304)
- New function `bold_search()` that searches for taxa in the BOLD database of barcode data; `get_boldid()` to search for a BOLD taxon identifier. (#301)
- New function `get_ubioid()` to get a uBio taxon identifier. (#318)
- `taxize`: `taxize_cite()`. (#270)
- `jsonlite` instead of `RJSONIO` throughout `taxize`.
- `get_ids()` gains a new option to search for a uBio ID, in addition to the others: itis, ncbi, eol, col, tropicos, and gbif.
- `stripauthority` parameter `gnr_resolve()`. (#325)
- `iplant_resolve()` now outputs a data.frame structure instead of a list. (#306)
- `seqrange` in `ncbi_getbyname()` and `ncbi_search()` (#328)
- `synonyms()` gains a new data source, and can now get synonyms from the uBio data source (#319)
- `vascan_search()` giving back more useful results now.
- `tnrs()` function, including more meaningful error messages on failures (#323) (#331)
- Fixed a bug in `getpublicationsfromtsn()` that caused the function to fail on data.frames with no data on name assignment (#297)
- Fixed a bug in `sci2comm()` that sometimes caused the function to fail when using `db=itis` (#293)
- `scrapenames()`. Sending a text blob via the `text` parameter now works.
- `resolve()` so that the function now works for all 3 data sources. (#337)
- New function `iplant_resolve()` to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the `tnrs()` function.
- New function `ipni_search()` to search for names in the International Plant Names Index (IPNI).
- New function `resolve()` that unifies name resolution services from iPlant's name resolution service (via `iplant_resolve()`), Taxosaurus' TNRS (via `tnrs()`), and GNR's name resolution service (via `gnr_resolve()`).
- `get_*()`
functions now returning a new `uri` attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example: `browseURL(attr(result, "uri"))`.
- `get_eolid()` now returns an attribute `provider` because EOL collates taxonomic data from a lot of sources, then gives back IDs that are internal EOL ids, not those matching the id of the source they pull from. This should help with provenance, and should help if there is confusion about why the id given back by this function does not match that from the original source.
- `get_tsn()` function, now using the function `itis_terms()`, which gives back the accepted status of the taxa. This allows a new parameter in the function (`accepted`, logical) that lets the user ask for only accepted status names (`accepted=TRUE`), or for all names (`accepted=FALSE`).
- `gnr_resolve()` gains new parameters `best_match_only` (logical, to return best match only), `preferred_data_sources` (to return preferred data sources), and `callopts` to pass in curl options.
- `tnrs()`, `tp_accnames()`, `tp_refs()`, `tp_summary()`, and `tp_synonyms()` gain new parameter `callopts` to pass in curl options.
- `class2tree()` can now handle NA in classification objects.
- `classification.eolid()` and `classification.colid()` now return the submitted name along with the classification.
- `plyr` functions, see #275.
- `verbose` parameter to many more functions to allow suppression of help messages.
- `httr`, now manually parsing JSON to a list then to another data format instead of allowing internal `httr` parsing. In addition, added checks on content type and encoding in many functions.
- `match.arg` internally to `get_ids()` for the `db` parameter so that a) unique short abbreviations of possible values are possible, and b) gives a meaningful warning if unsupported values are given.
- `getexpertsfromtsn`, `getgeographicdivisionsfromtsn`) gain parameter `curlopts` to pass in curl options.
- `stringsAsFactors=FALSE` to all `data.frame` creations to eliminate factor variables.
- `classification.gbifid()` did not return the correct result when taxon not found.
- `classification()` used to fail when it was passed a subset of a vector of ids, in which case the class information was stripped off. Now works (#284)
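
The `match.arg` behaviour noted above for the `db` parameter of `get_ids()` can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper `resolve_db()` is hypothetical, and its `choices` vector is taken from the data sources named in these notes rather than from the `get_ids()` source.

```r
# Hypothetical helper (not part of taxize): resolve a possibly
# abbreviated `db` value against a fixed set of choices.
resolve_db <- function(db) {
  # Assumed set of values, taken from the data sources named in these notes
  choices <- c("itis", "ncbi", "eol", "col", "tropicos", "gbif")
  # match.arg() does partial matching, so any unique prefix is accepted;
  # if nothing matches, it signals an error that lists the valid values.
  match.arg(db, choices, several.ok = TRUE)
}

one  <- resolve_db("it")           # unique prefix of "itis"
many <- resolve_db(c("nc", "gb"))  # several abbreviations at once

# An unsupported value signals a condition that can be caught and inspected
msg <- tryCatch(resolve_db("foo"), error = function(e) conditionMessage(e))
```

In base R the unsupported-value case is signalled as an error rather than a warning, so a caller that wants to continue has to catch it, as the `tryCatch()` call does here.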