Semantically Rich I/O for the 'NeXML' Format

Provides access to phyloinformatic data in 'NeXML' format. The package adds functionality to R for manipulating 'NeXML' objects in more varied and refined ways, along with compatibility with 'ape' objects.


  • Maintainer: Carl Boettiger
  • Authors: Carl Boettiger, Scott Chamberlain, Hilmar Lapp, Kseniia Shumelchyk, Rutger Vos
  • License: BSD-3
  • Issues: Bug reports, feature requests, and development discussion.

An extensive and rapidly growing collection of richly annotated phylogenetics data is now available in the NeXML format. NeXML relies on state-of-the-art data exchange technology to provide a format that can be both validated and extended, offering data quality assurance and future adaptability that other formats lack. See Vos et al. 2012 for further details on the NeXML format.

How to cite

RNeXML has been published in the following article:

Boettiger C, Chamberlain S, Vos R and Lapp H (2016). “RNeXML: A Package for Reading and Writing Richly Annotated Phylogenetic, Character, and Trait Data in R.” Methods in Ecology and Evolution, 7, pp. 352-357. doi:10.1111/2041-210X.12469

Although the published version of the article is paywalled, the source of the manuscript, and a much better rendered PDF, are included in this package (in the manuscripts folder). You can also find it freely available on arXiv.

Getting Started

The latest stable release of RNeXML is on CRAN, and can be installed with the usual install.packages("RNeXML") command. Some of the more specialized functionality described in the Vignettes (such as RDF manipulation) requires additional packages which can be installed using:

install.packages("RNeXML", dependencies=TRUE, repos=c("https://cran.rstudio.com", "http://packages.ropensci.org"))

which will also install the development version of the RNeXML package. For most common tasks, such as those shown here, these additional packages are not required. The development version of RNeXML is also available on GitHub. With the devtools package installed on your system, RNeXML can be installed using:

library(devtools)
install_github("ropensci/RNeXML")
library(RNeXML)
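Before relying on the more specialized features, it can help to check which optional packages are present. The following is a minimal sketch using only base R; the package names are taken from the Suggests list of this package, and the variable names are illustrative.

```r
# Optional packages drawn from the Suggests list; adjust to your needs.
optional <- c("rdflib", "geiger", "taxize")

# requireNamespace() reports TRUE/FALSE without attaching the package.
available <- vapply(optional, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is missing instead of failing later.
missing_pkgs <- optional[!available]
if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
}
```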

Read a NeXML file into the ape::phylo format:

f <- system.file("examples", "comp_analysis.xml", package="RNeXML")
nexml <- nexml_read(f)
tr <- get_trees(nexml) # or: as(nexml, "phylo")
plot(tr)
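The shape of what get_trees returns depends on how many trees the file holds, so it can be worth inspecting the result before passing it on. A short sketch, assuming the bundled example file contains a single tree; get_trees_list is used here as the variant that keeps the trees grouped by block:

```r
library(RNeXML)

f <- system.file("examples", "comp_analysis.xml", package = "RNeXML")
nex <- nexml_read(f)

# get_trees() simplifies its result where it can.
phy <- get_trees(nex)
class(phy)

# get_trees_list() keeps the trees grouped by trees block.
tree_blocks <- get_trees_list(nex)
length(tree_blocks)

# Number of tips in the simplified tree (ape helper).
ape::Ntip(phy)
```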

Write an ape::phylo tree into the NeXML format:

data(bird.orders)
nexml_write(bird.orders, "test.xml")
#> [1] "test.xml"

A key feature of NeXML is the ability to formally validate the construction of the data file against the standard. The lack of such a feature in NEXUS files has led to inconsistencies across different software platforms, and to some files that cannot be read at all. While it is difficult to make an invalid NeXML file from RNeXML, it never hurts to validate just to be sure:

nexml_validate("test.xml")
#> [1] TRUE
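Validation may depend on an external validator being reachable, and the NEWS entries for this package note that nexml_validate can return NULL when validation cannot be performed. A defensive sketch that handles all three outcomes, writing to a temporary file (an illustrative choice) rather than the working directory:

```r
library(RNeXML)
data(bird.orders, package = "ape")

# Write to a temporary file so nothing is left in the working directory.
out <- tempfile(fileext = ".xml")
nexml_write(bird.orders, out)

# Treat an error the same way as "validation could not be performed".
result <- tryCatch(nexml_validate(out), error = function(e) NULL)

if (is.null(result)) {
  message("Validation could not be performed")
} else if (isTRUE(result)) {
  message("File is valid NeXML")
} else {
  warning("File failed validation")
}
```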

Extract metadata from the NeXML file:

birds <- nexml_read("test.xml")
get_taxa(birds)
#>     otu            label about xsi.type otus
#> 1   ou1 Struthioniformes  #ou1       NA  os1
#> 2   ou2     Tinamiformes  #ou2       NA  os1
#> 3   ou3      Craciformes  #ou3       NA  os1
#> 4   ou4      Galliformes  #ou4       NA  os1
#> 5   ou5     Anseriformes  #ou5       NA  os1
#> 6   ou6    Turniciformes  #ou6       NA  os1
#> 7   ou7       Piciformes  #ou7       NA  os1
#> 8   ou8    Galbuliformes  #ou8       NA  os1
#> 9   ou9   Bucerotiformes  #ou9       NA  os1
#> 10 ou10      Upupiformes #ou10       NA  os1
#> 11 ou11    Trogoniformes #ou11       NA  os1
#> 12 ou12    Coraciiformes #ou12       NA  os1
#> 13 ou13      Coliiformes #ou13       NA  os1
#> 14 ou14     Cuculiformes #ou14       NA  os1
#> 15 ou15   Psittaciformes #ou15       NA  os1
#> 16 ou16      Apodiformes #ou16       NA  os1
#> 17 ou17   Trochiliformes #ou17       NA  os1
#> 18 ou18  Musophagiformes #ou18       NA  os1
#> 19 ou19     Strigiformes #ou19       NA  os1
#> 20 ou20    Columbiformes #ou20       NA  os1
#> 21 ou21       Gruiformes #ou21       NA  os1
#> 22 ou22    Ciconiiformes #ou22       NA  os1
#> 23 ou23    Passeriformes #ou23       NA  os1
get_metadata(birds) 
#>   LiteralMeta                      property   datatype  content
#> 1          m1                    dc:creator xsd:string cboettig
#> 2        <NA>                          <NA>       <NA>     <NA>
#> 3          m3 dcterms:bibliographicCitation xsd:string     <NA>
#>       xsi.type ResourceMeta        rel
#> 1  LiteralMeta         <NA>       <NA>
#> 2 ResourceMeta           m2 cc:license
#> 3  LiteralMeta         <NA>       <NA>
#>                                                href
#> 1                                              <NA>
#> 2 http://creativecommons.org/publicdomain/zero/1.0/
#> 3                                              <NA>
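Because get_taxa and get_metadata return data.frames, individual values can be pulled out with ordinary subsetting. A small sketch, assuming the column names shown in the output above (label, property, content); these may differ in other versions of the package:

```r
library(RNeXML)
data(bird.orders, package = "ape")

# Round-trip the tree through a temporary NeXML file.
out <- tempfile(fileext = ".xml")
nexml_write(bird.orders, out)
birds <- nexml_read(out)

# Taxon labels as a plain character vector.
taxa <- get_taxa(birds)
head(taxa$label)

# Look up a single metadata value by its property name.
md <- get_metadata(birds)
md$content[which(md$property == "dc:creator")]
```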

Add basic additional metadata:

  nexml_write(bird.orders, file="meta_example.xml",
              title = "My test title",
              description = "A description of my test",
              creator = "Carl Boettiger <[email protected]>",
              publisher = "unpublished data",
              pubdate = "2012-04-01")
#> [1] "meta_example.xml"

By default, RNeXML adds certain metadata, including the NCBI taxon id numbers for all named taxa. This acts as a check on the spelling and definitions of the taxa, and provides a link to additional metadata about each taxonomic unit described in the dataset.
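The same basic metadata can also be attached to a NeXML object in memory with add_basic_meta, before anything is written to disk. A sketch with illustrative values; arguments are named explicitly because the nexml object is the last argument of this function:

```r
library(RNeXML)
data(bird.orders, package = "ape")

# Start from a NeXML object holding the tree.
nex <- add_trees(bird.orders)

# Name every argument; the nexml object comes last in the signature.
nex <- add_basic_meta(title = "My test title",
                      description = "A description of my test",
                      publisher = "unpublished data",
                      pubdate = "2012-04-01",
                      nexml = nex)

# Inspect what was attached.
md <- get_metadata(nex)
md[, c("property", "content")]
```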

Advanced annotation

We can also add arbitrary metadata to a NeXML tree by defining meta objects:

modified <- meta(property = "prism:modificationDate",
                 content = "2013-10-04")

Advanced use requires specifying the namespace used. Metadata follows the RDFa conventions. Here we indicate the modification date using the prism vocabulary. This namespace is included by default, as it is used for some of the basic metadata shown in the previous example. We can confirm this from the list of default namespaces:

RNeXML:::nexml_namespaces
#>                                              nex 
#>                      "http://www.nexml.org/2009" 
#>                                              xsi 
#>      "http://www.w3.org/2001/XMLSchema-instance" 
#>                                              xml 
#>           "http://www.w3.org/XML/1998/namespace" 
#>                                             cdao 
#>        "http://purl.obolibrary.org/obo/cdao.owl" 
#>                                              xsd 
#>              "http://www.w3.org/2001/XMLSchema#" 
#>                                               dc 
#>               "http://purl.org/dc/elements/1.1/" 
#>                                          dcterms 
#>                      "http://purl.org/dc/terms/" 
#>                                              ter 
#>                      "http://purl.org/dc/terms/" 
#>                                            prism 
#> "http://prismstandard.org/namespaces/1.2/basic/" 
#>                                               cc 
#>                 "http://creativecommons.org/ns#" 
#>                                             ncbi 
#>          "http://www.ncbi.nlm.nih.gov/taxonomy#" 
#>                                               tc 
#>  "http://rs.tdwg.org/ontology/voc/TaxonConcept#"
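Rather than reading the list by eye, one can check programmatically whether a prefix is already declared on a given NeXML object. This sketch assumes the exported get_namespaces accessor returns a named character vector of prefix to URI mappings; the prefixes tested are just examples:

```r
library(RNeXML)
data(bird.orders, package = "ape")

nex <- add_trees(bird.orders)

# Named character vector: names are prefixes, values are URIs.
ns <- get_namespaces(nex)

# Which of these prefixes would still need a namespace declaration?
wanted <- c("prism", "skos", "foaf")
declared <- setNames(wanted %in% names(ns), wanted)
declared
```

Any prefix reported as FALSE has to be supplied through the namespaces argument when the metadata is added.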

This next block defines a resource (link), described by the rel attribute as a homepage, a term in the foaf vocabulary. Because foaf is not a default namespace, we will have to provide its URL in the full definition below.

website <- meta(href = "http://carlboettiger.info", 
                rel = "foaf:homepage")

Here we create a history node using the skos namespace. We can also add id values to any metadata element to make the element easier to reference externally:

  history <- meta(property = "skos:historyNote", 
                  content = "Mapped from the bird.orders data in the ape package using RNeXML",
                  id = "meta123")

For this kind of richer annotation, it is best to build up our NeXML object sequentially. First we will add the bird.orders phylogeny to a new phylogenetic object, and then we will add the metadata elements created above to this object. Finally, we will write the object out as an XML file:

  birds <- add_trees(bird.orders)
  birds <- add_meta(meta = list(history, modified, website),
                    namespaces = c(skos = "http://www.w3.org/2004/02/skos/core#",
                                   foaf = "http://xmlns.com/foaf/0.1/"),
                    nexml=birds)
  nexml_write(birds, 
              file = "example.xml")
#> [1] "example.xml"
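To confirm that a custom annotation survives serialization, the file can be read back and the metadata table searched for the property. A self-contained sketch using a single meta element and a temporary file (both illustrative choices), and assuming the property and content columns seen in the earlier get_metadata output:

```r
library(RNeXML)
data(bird.orders, package = "ape")

# One custom annotation in the skos vocabulary.
history <- meta(property = "skos:historyNote",
                content = "Mapped from the bird.orders data in the ape package using RNeXML",
                id = "meta123")

# Build the object, declaring the non-default namespace.
nex <- add_trees(bird.orders)
nex <- add_meta(meta = list(history),
                namespaces = c(skos = "http://www.w3.org/2004/02/skos/core#"),
                nexml = nex)

# Write, read back, and look for the annotation.
out <- tempfile(fileext = ".xml")
nexml_write(nex, file = out)
md <- get_metadata(nexml_read(out))
md$content[which(md$property == "skos:historyNote")]
```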

Taxonomic identifiers

Add taxonomic identifier metadata to the OTU elements:

nex <- add_trees(bird.orders)
nex <- taxize_nexml(nex)
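Taxonomic lookups rely on the suggested taxize package and on a remote service, so a script may want to degrade gracefully when either is unavailable. A cautious sketch; the fallback behaviour (keeping the unannotated object) is an assumption about what is desirable, not something the package prescribes:

```r
library(RNeXML)
data(bird.orders, package = "ape")

nex <- add_trees(bird.orders)

# Only attempt the lookup when taxize is installed.
if (requireNamespace("taxize", quietly = TRUE)) {
  nex <- tryCatch(
    taxize_nexml(nex, type = "ncbi"),
    error = function(e) {
      # Keep the unannotated object if the remote lookup fails.
      message("Taxonomic lookup failed: ", conditionMessage(e))
      nex
    }
  )
} else {
  message("Package 'taxize' not installed; skipping taxonomic identifiers")
}
```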

Working with character data

NeXML also provides a standard exchange format for handling character data. The R platform is particularly popular in the context of phylogenetic comparative methods, which consider both a given phylogeny and a set of traits. NeXML provides an ideal tool for handling this kind of data.

Extracting character data

We can load the library, parse the NeXML file and extract both the characters and the phylogeny.

library(RNeXML)
nexml <- read.nexml(system.file("examples", "comp_analysis.xml", package="RNeXML"))
traits <- get_characters(nexml)
tree <- get_trees(nexml)

(Note that get_characters returns both discrete and continuous characters together in the same data.frame. To get separate data.frames for the continuous characters block and the discrete characters block, use get_characters_list instead.)
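The difference between the two accessors is easiest to see side by side, together with a quick check that the trait rows line up with the tree tips. A sketch assuming that the row names of the returned data.frame are the taxon labels, which is worth verifying for your own files:

```r
library(RNeXML)

f <- system.file("examples", "comp_analysis.xml", package = "RNeXML")
nex <- nexml_read(f)

# One combined data.frame across all character blocks.
traits <- get_characters(nex)
str(traits)

# One data.frame per character block.
blocks <- get_characters_list(nex)
lapply(blocks, dim)

# Tips with no matching row in the trait table (ideally none).
tree <- get_trees(nex)
unmatched <- setdiff(tree$tip.label, rownames(traits))
unmatched
```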

We can then fire up geiger and fit, say, a Brownian motion model to the continuous data and a Markov transition matrix to the discrete states:

library(geiger)
fitContinuous(tree, traits[1], ncores=1)
#> GEIGER-fitted comparative model of continuous data
#>  fitted 'BM' model parameters:
#>  sigsq = 1.166011
#>  z0 = 0.255591
#> 
#>  model summary:
#>  log-likelihood = -20.501183
#>  AIC = 45.002367
#>  AICc = 46.716652
#>  free parameters = 2
#> 
#> Convergence diagnostics:
#>  optimization iterations = 100
#>  failed iterations = 0
#>  frequency of best fit = 1.00
#> 
#>  object summary:
#>  'lik' -- likelihood function
#>  'bnd' -- bounds for likelihood search
#>  'res' -- optimization iteration summary
#>  'opt' -- maximum likelihood parameter estimates
fitDiscrete(tree, traits[2], ncores=1)
#> GEIGER-fitted comparative model of discrete data
#>  fitted Q matrix:
#>                 0           1
#>     0 -0.07308302  0.07308302
#>     1  0.07308302 -0.07308302
#> 
#>  model summary:
#>  log-likelihood = -4.574133
#>  AIC = 11.148266
#>  AICc = 11.648266
#>  free parameters = 1
#> 
#> Convergence diagnostics:
#>  optimization iterations = 100
#>  failed iterations = 0
#>  frequency of best fit = 1.00
#> 
#>  object summary:
#>  'lik' -- likelihood function
#>  'bnd' -- bounds for likelihood search
#>  'res' -- optimization iteration summary
#>  'opt' -- maximum likelihood parameter estimates
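The fitted objects can be compared rather than just printed. The sketch below fits two alternative models to the continuous trait and collects their AICc values from the 'opt' element listed in the object summary; the choice of an Ornstein-Uhlenbeck alternative is purely illustrative, and the example is skipped when geiger is not installed.

```r
library(RNeXML)

if (requireNamespace("geiger", quietly = TRUE)) {
  f <- system.file("examples", "comp_analysis.xml", package = "RNeXML")
  nex <- nexml_read(f)
  tree <- get_trees(nex)
  traits <- get_characters(nex)

  # Fit Brownian motion and Ornstein-Uhlenbeck to the first trait.
  bm <- geiger::fitContinuous(tree, traits[1], model = "BM", ncores = 1)
  ou <- geiger::fitContinuous(tree, traits[1], model = "OU", ncores = 1)

  # Lower AICc indicates the better-supported model.
  aicc <- c(BM = bm$opt$aicc, OU = ou$opt$aicc)
  print(aicc)
}
```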

News

For a more fine-grained list of changes or to report a bug, consult the issue tracker at https://github.com/ropensci/RNeXML/issues

v2.2.0

  • Fixes various (previously broken) aspects of handling polymorphic and uncertain states for discrete (non-molecular) and continuous characters, including obtaining a character matrix (#174), ensuring proper column types (#188), and serializing to NeXML (#192).
  • Adds the optional ability to, in addition to the character matrix, obtain a concordantly formatted matrix of state types (standard, polymorphic, uncertain).
  • Fixes loss of certain literal-valued metadata when serializing to NeXML. #193
  • Drops package phylobase as dependency. (Also removes circular dependency chain, because phylobase depends on RNeXML.)

v2.1.2

  • Fix failing checks on CRAN that require a network connection

v2.1.1

  • avoid rdf-based tests on solaris architecture, where suggested package rdflib is not available. (CRAN request.)

v2.1.0 2018-05-05

  • taxize as Suggests only
  • drop rrdf in favor of rdflib
  • drop Sxslt in favor of xslt

v2.0.8 2017-11-17

  • patch for compatibility with upcoming release of testthat

v2.0.7 2016-06-28

  • Bugfixes following release of new dplyr and new tidyr dependencies

v2.0.6 2016-03-07

  • Migrate Additional_repositories to new address for OmegaHat project.

v2.0.5 2015-12-31

  • get_metadata(), get_taxa() now return much richer data.frames instead of named vectors. This is potentially a non-backwards compatible change if scripts use the output of these functions as lists (#129). See updated metadata vignette. This introduces new dependencies dplyr and lazyeval.
  • more robust nexml_read() method for URLs, (#123)
  • Avoid assuming the namespace prefix nex for nexml elements (#51, #124, #126). Includes a fix server-side on the NeXML validator as well.
  • nexml_validate() points to the new validator. (#126)

v2.0.4 2015-10-14

  • Fix compatibility issue with recent phytools release.

v2.0.3 2015-05-27

  • Upgrade tests to be compatible with newest testthat (0.10.0), bumps testthat dependency version up (#119) thanks @hadley

v2.0.2 2015-05-01

  • Add four new vignettes describing the use of various advanced features in the package: the use of SPARQL queries, advanced use of metadata features, an example of how to extend NeXML with simmap data as the use case, and documentation on the central S4 data structure used in the package.
  • Implements the use of Title Case in the package title, as requested (on several occasions) by the CRAN maintainers.

v2.0.1 2014-12-26

  • Update DESCRIPTION to provide a standard install.packages() compatible repository for rrdf, as per request from the CRAN team.

v2.0.0 2014-12-06

  • add URL and BugReports to Description. #103

  • for consistency with other add_ methods, the nexml object is now the last, not the first, argument to add_basic_meta. As this changes the function API, it could break code that does not explicitly name the arguments, so we release this as 2.0.0

v1.1.3 2014-08-06

Minor bugfix

  • Fixes typo that caused validator to fail when nexml.org couldn't be reached

v1.1.2 2014-07-19

Less aggressive unit-tests

  • nexml_validate now returns NULL if the validation cannot be performed. Unit tests now consider either TRUE or NULL as acceptable.
  • Just skips the uuid unit test if uuid package is not available
  • Documented versioning practice in NEWS
  • Unit tests relying on the Figshare API are not run (without failing) if authentication to figshare server fails
  • Documentation updated to include examples for all functions

v1.1-0 2014-07-18

Initial Release

Reference manual

install.packages("RNeXML")

2.2.0 by Carl Boettiger, 3 months ago


https://github.com/ropensci/RNeXML


Report a bug at https://github.com/ropensci/RNeXML/issues


Browse source code at https://github.com/cran/RNeXML


Authors: Carl Boettiger [cre, aut] , Scott Chamberlain [aut] , Hilmar Lapp [aut] , Kseniia Shumelchyk [aut] , Rutger Vos [aut]



Task views: Phylogenetics, Especially Comparative Methods


BSD_3_clause + file LICENSE license


Imports XML, plyr, reshape2, httr, uuid, dplyr, lazyeval, tidyr, stringr, xml2

Depends on ape, methods

Suggests spelling, rdflib, geiger, phytools, knitr, rfigshare, knitcitations, testthat, rmarkdown, xslt, taxize, covr


Imported by phylobase.

Suggested by rotl.

