R Interface to the DataONE REST API

Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.


dataone: R interface to the DataONE network of data repositories


Provides read and write access to data and metadata from the DataONE network of data repositories, including the KNB Data Repository, Dryad, and the NSF Arctic Data Center. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository using the global identifier for each data object. Users can also insert and update data objects on repositories that support these methods. For more details, see the vignettes.

Installation Notes

Version 2.0 of the dataone R package removes the dependency on rJava and significantly changes the base API to correspond to the published DataONE API. Previous methods for accessing DataONE will be maintained, but new methods have been added.

The dataone R package requires the R package redland. On Ubuntu, the Redland C libraries must be installed before redland. On Mac OS X and Windows, these libraries do not need to be installed separately.
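
Before installing, it can help to check whether redland is already present. The following is a minimal sketch using only base R; the helper name isInstalled is hypothetical and not part of dataone.

```r
# Hypothetical helper (not part of dataone): TRUE if the package's namespace
# can be loaded, FALSE otherwise.
isInstalled <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Report which of the two packages still need to be installed.
needed <- c("redland", "dataone")
notInstalled <- needed[!vapply(needed, isInstalled, logical(1))]
if (length(notInstalled) > 0) {
  message("Not yet installed: ", paste(notInstalled, collapse = ", "))
} else {
  message("redland and dataone are both installed.")
}
```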

Installing on Mac OS X

On Mac OS X dataone can be installed with the following commands:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Ubuntu

On Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Installing on Windows

On Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

Install the dataone R package from the R console:

install.packages("dataone")
library(dataone)

The dataone R package should be available for use at this point.

Quick Start

See the full manual (help(package = "dataone")) for documentation.

To search for a dataset on the Knowledge Network for Biocomplexity (KNB), a Member Node of the DataONE Federation:

library(dataone)
cn <- CNode("PROD")
mn <- getMNode(cn, "urn:node:KNB")
mySearchTerms <- list(q="abstract:salmon+AND+keywords:spawn+AND+keywords:chinook",
                      fl="id,title,dateUploaded,abstract,size",
                      fq="dateUploaded:[2017-06-01T00:00:00.000Z TO 2017-07-01T00:00:00.000Z]",
                      sort="dateUploaded+desc")
result <- query(mn, solrQuery=mySearchTerms, as="data.frame")
result[1,c("id", "title")]
id <- result[1,'id']
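
The fq date-range filter in the query above is written in Solr's ISO 8601 UTC form. As a minimal sketch, a small helper could build that string from ordinary dates. The name solrDateRange is hypothetical and not part of dataone, and the helper always writes the sub-second part as .000.

```r
# Hypothetical helper (not part of dataone): build a Solr range filter such as
# "dateUploaded:[<from> TO <to>]" with both timestamps formatted in UTC.
solrDateRange <- function(field, from, to) {
  fmt <- "%Y-%m-%dT%H:%M:%S.000Z"  # sub-second precision is fixed at .000
  stamp <- function(x) format(as.POSIXct(x, tz = "UTC"), fmt, tz = "UTC")
  sprintf("%s:[%s TO %s]", field, stamp(from), stamp(to))
}

fq <- solrDateRange("dateUploaded", "2017-06-01", "2017-07-01")
print(fq)
```

The resulting string can be supplied as the fq element of the list passed to query() through solrQuery.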

The metadata file that describes the located research can be downloaded and then viewed in an XML viewer or text editor after being written to disk, or in R with the commands below:

library(XML)
metadata <- rawToChar(getObject(mn, id))
doc <- xmlRoot(xmlTreeParse(metadata, asText=TRUE, trim=TRUE, ignoreBlanks=TRUE))
tf <- tempfile()
saveXML(doc, tf)
file.show(tf)

This metadata file describes a data file (CSV) in this data collection (package). The data file can be retrieved by its listed identifier with the commands:

dataRaw <- getObject(mn, "urn:uuid:49d7a4bc-e4c9-4609-b9a7-9033faf575e0")
dataChar <- rawToChar(dataRaw)
theData <- textConnection(dataChar)
df <- read.csv(theData, stringsAsFactors=FALSE)
df[1,]
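
The conversion from raw bytes to a data frame can be tried without a network connection. In this sketch the raw vector is created with charToRaw() and stands in for the value returned by getObject(); the column names and values are invented for illustration.

```r
# Stand-in for the raw vector that getObject() returns (invented sample data).
dataRaw <- charToRaw("site,count\nA,3\nB,5\n")

# Convert the bytes to text, then parse the text as CSV.
dataChar <- rawToChar(dataRaw)
df <- read.csv(text = dataChar, stringsAsFactors = FALSE)
str(df)
```

Passing the text through the text argument of read.csv() is one way to avoid leaving an open textConnection behind; either approach yields the same data frame.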

Uploading a CSV file to a DataONE Member Node requires user authentication. DataONE user authentication is described in the vignette dataone-federation.
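
As a rough sketch of a check to run before uploading: a token is typically stored in an R option. The option names used below (dataone_token for production, dataone_test_token for test environments) are assumptions that should be verified against the dataone-federation vignette, and tokenOptionFor and hasToken are hypothetical helpers, not dataone functions.

```r
# Assumption: production tokens live in option "dataone_token" and all other
# environments use "dataone_test_token". Verify against the vignette.
tokenOptionFor <- function(env) {
  if (identical(env, "PROD")) "dataone_token" else "dataone_test_token"
}

# Hypothetical helper: TRUE if a non-empty token string is set for `env`.
hasToken <- function(env) {
  token <- getOption(tokenOptionFor(env))
  is.character(token) && length(token) == 1 && nzchar(token)
}

if (!hasToken("STAGING")) {
  message("No token set for STAGING; see the dataone-federation vignette.")
}
```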

Once the authentication steps have been followed, uploading is done with:

library(datapack)
library(uuid)
d1c <- D1Client("STAGING", "urn:node:mnStageUCSB2")
id <- paste("urn:uuid:", UUIDgenerate(), sep="")
testdf <- data.frame(x=1:10,y=11:20)
csvfile <- paste(tempfile(), ".csv", sep="")
write.csv(testdf, csvfile, row.names=FALSE)
# Build a DataObject containing the csv, and upload it to the Member Node
d1Object <- new("DataObject", id, format="text/csv", filename=csvfile)
uploadDataObject(d1c, d1Object, public=TRUE)

In addition, a collection of science metadata and data can be downloaded with one command, for example:

d1c <- D1Client("PROD", "urn:node:KNB")
pkg <- getDataPackage(d1c, id="urn:uuid:04cd34fd-25d4-447f-ab6e-73a572c5d383", quiet=FALSE)

See the R vignette "dataone R Package" for more information.

Acknowledgments

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard, and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.


News

Version 2.1.2

Bug Fixes

  • Improve error handling for services that call DataONE 'resolve' service (#232)
  • Eliminate duplicate entries for package vignettes (#232)

Version 2.1.1

Bug Fixes

  • Resolve temporary directory problem on Windows (#204)
  • Fixed broken links in the 'dataone-overview' vignette (#205)
  • Declare 'xml2' package to resolve CRAN build errors (#218)

New Features and functions

  • Added a destination file path argument to getPackage() (#211)
  • downloadObject() provides an easy way to download files from DataONE to disk (#217)

Version 2.1.0

Bug Fixes

  • Fixed bug where query() was incorrectly converting date results (#174)

  • Fixed bug where query() was returning incorrect results for multi-valued Solr fields (#179)

  • Fixed bug where createObject() was not uploading in-memory objects correctly (#198)

New Features and functions

  • Updated methods to aid in downloading, editing and updating packages in DataONE (#175)

  • Added getMetadataMember() to identify the metadata object for a DataPackage (#175)

  • Updated getPackage() to accept pids for data or metadata object (#178)

  • The resource map for a package now sets the default name (sysmeta.fileName) based on the metadata pid (#195)

Version 2.0.2

Bug Fixes

  • Fixed a problem where the unit tests were failing due to an incompatibility with testthat 1.0.2. All unit tests are now passing with testthat 1.0.2. (#171)

  • uploadDataPackage() now uses the @cn slot to set the value for the default resolveURI (#170)

  • All methods that send a PID to DataONE now properly URL-encode the PID. (#163)

Version 2.0.1

Bug Fixes

  • The unit tests were dependent on unstable development machines and would fail if these machines were not available, not configured correctly, or did not contain expected content. This dependency has been resolved.

Version 2.0.0

New features and functions

  • Complete rewrite of the package, eliminating all dependencies on Java

  • Support for the DataONE v2 API, as well as the existing v1 API

  • DataONE authentication tokens are supported for any DataONE node that has implemented the DataONE v2 API (https://purl.dataone.org/architecture). Tokens are supported in both the production and test environments.

NEW S4 CLASSES

  • Class CNode - provides methods to search, get and send data to a DataONE Coordinating Node

  • Class MNode - provides methods to search, get and send data to a DataONE Member Node

  • Class D1Client - provides higher-level methods to interact with DataONE Coordinating Nodes and Member Nodes

  • Class AuthenticationManager - provides methods to obtain information about DataONE authentication tokens or certificates


install.packages("dataone")

2.1.2 by Matthew B. Jones, 10 months ago


https://github.com/DataONEorg/rdataone


Report a bug at https://github.com/DataONEorg/rdataone/issues


Browse source code at https://github.com/cran/dataone


Authors: Matthew B. Jones [aut, cre], Peter Slaughter [aut], Rob Nahf [aut], Carl Boettiger [aut], Chris Jones [aut], Jordan Read [aut], Lauren Walker [aut], Edmund Hart [aut], Scott Chamberlain [aut], Regents of the University of California [cph]




Task views: Web Technologies and Services


License: Apache License 2.0


Imports: XML, hash, httr, methods, stringr, datapack, plyr, parsedate, uuid, base64enc, jsonlite

Suggests: knitr, rmarkdown, testthat, digest, openssl, xml2


Imported by nesRdata.

