A Flexible Container to Transport and Manipulate Data and Associated Resources

Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map, which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.


The datapack R package provides an abstraction for collating heterogeneous collections of data objects and metadata into a bundle that can be transported and loaded as a single composite file. The methods in this package provide a convenient way to load data from common repositories such as DataONE into the R environment, and to document, serialize, and save data from R to data repositories worldwide.

Installation Notes

The datapack R package requires the R package redland. If you are installing on Ubuntu, the Redland C libraries must be installed before the redland and datapack packages can be installed. If you are installing on Mac OS X or Windows, installing these libraries is not required.

The following instructions illustrate how to install datapack and its requirements.

Installing on Mac OS X

On Mac OS X datapack can be installed with the following commands:

install.packages("datapack")
library(datapack)

The datapack R package should be available for use at this point.

Note: if you wish to build the required redland package from source before installing datapack, please see the redland installation instructions.

Installing on Ubuntu

On Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

sudo apt-get update
sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

install.packages("datapack")
library(datapack)

The datapack R package should be available for use at this point.

Installing on Windows

On Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.

To install the R packages from the R console:

install.packages("datapack")
library(datapack)
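On any of the platforms above, a quick check can confirm that both packages named in the installation notes are present before moving on. The following is a minimal sketch using only base R; the helper name check_installed is hypothetical and is not part of datapack.

```r
# Hypothetical helper (not part of datapack): report which of the packages
# named in the installation notes can currently be loaded.
check_installed <- function(pkgs = c("redland", "datapack")) {
  # requireNamespace() returns TRUE or FALSE without attaching the package
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_installed()
print(status)

# Name any package that still needs to be installed
if (!all(status)) {
  message("Not installed: ", paste(names(status)[!status], collapse = ", "))
}
```

If redland is reported as missing on Ubuntu, the likely cause is that the Redland C libraries were not installed first.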

Quick Start

See the full manual for documentation, but once installed, the package can be run in R using:

library(datapack)
help("datapack")

Create a DataPackage and add metadata and data DataObjects to it:

library(datapack)
library(uuid)
dp <- new("DataPackage")

# Metadata object: an EML document shipped with the package
mdFile <- system.file("extdata/sample-eml.xml", package="datapack")
mdId <- paste("urn:uuid:", UUIDgenerate(), sep="")
md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0", filename=mdFile)
# addMember() replaces addData(), which is deprecated as of version 1.3.0
dp <- addMember(dp, md)

# Data object: a CSV file shipped with the package
csvfile <- system.file("extdata/sample-data.csv", package="datapack")
sciId <- paste("urn:uuid:", UUIDgenerate(), sep="")
sciObj <- new("DataObject", id=sciId, format="text/csv", filename=csvfile)
dp <- addMember(dp, sciObj)

# List the identifiers of all package members
ids <- getIdentifiers(dp)

Add a relationship to the DataPackage that shows that the metadata describes, or "documents", the science data:

dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId)
relations <- getRelationships(dp)
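To see what the relationship step records, the relationships can be printed after insertion. The sketch below is self-contained and rebuilds a small package from the sample files shipped with datapack. It uses addMember(), which the News section lists as the replacement for addData() from version 1.3.0 onward. It also assumes that the first three columns of the returned data frame hold the subject, predicate, and object of each relationship.

```r
library(datapack)
library(uuid)

# Rebuild a two-member package from the sample files shipped with datapack
dp <- new("DataPackage")
mdId <- paste0("urn:uuid:", UUIDgenerate())
sciId <- paste0("urn:uuid:", UUIDgenerate())
md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0",
          filename=system.file("extdata/sample-eml.xml", package="datapack"))
sciObj <- new("DataObject", id=sciId, format="text/csv",
              filename=system.file("extdata/sample-data.csv", package="datapack"))
dp <- addMember(dp, md)
dp <- addMember(dp, sciObj)

# Record that the metadata documents the science data (default predicate)
dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId)

# Inspect the stored relationships; columns 1 to 3 are assumed to be
# subject, predicate, and object
relations <- getRelationships(dp)
print(relations[, 1:3])
```

Printing the data frame is a simple way to confirm that both identifiers appear in the package relationships before serializing.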

Create a Resource Description Framework representation of the relationships in the package:

serializationId <- paste("resourceMap", UUIDgenerate(), sep="")
filePath <- file.path(sprintf("%s/%s.rdf", tempdir(), serializationId))
status <- serializePackage(dp, filePath, id=serializationId, resolveURI="")

Save the DataPackage to a file, using the BagIt packaging format:

bagitFile <- serializeToBagIt(dp) 
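The value returned by serializeToBagIt() can be inspected to see what was written. The sketch below assumes the return value is the path to a zip archive, and lists the archive contents with unzip() from base R without extracting anything. It rebuilds a small package so that it can run on its own, and uses addMember(), which the News section lists as the replacement for addData() from version 1.3.0 onward.

```r
library(datapack)
library(uuid)

# Rebuild a two-member package from the sample files shipped with datapack
dp <- new("DataPackage")
mdId <- paste0("urn:uuid:", UUIDgenerate())
sciId <- paste0("urn:uuid:", UUIDgenerate())
md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0",
          filename=system.file("extdata/sample-eml.xml", package="datapack"))
sciObj <- new("DataObject", id=sciId, format="text/csv",
              filename=system.file("extdata/sample-data.csv", package="datapack"))
dp <- addMember(dp, md)
dp <- addMember(dp, sciObj)
dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId)

# Write the package in BagIt format; the result is assumed to be a zip file path
bagitFile <- serializeToBagIt(dp)

# List the files inside the archive without extracting them
contents <- unzip(bagitFile, list=TRUE)
print(contents$Name)
```

The listing would be expected to contain the BagIt tag files alongside the package members, although the exact file names depend on the datapack version.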

Note that the dataone R package can be used to upload a DataPackage to a DataONE Member Node using the uploadDataPackage method. Please see the documentation for the dataone R package, for example:

vignette("upload-data", package="dataone")

Acknowledgements

Work on this package was supported by:

  • NSF-ABI grant #1262458 to C. Gries, M. B. Jones, and S. Collins.
  • NSF-DATANET grants #0830944 and #1430508 to W. Michener, M. B. Jones, D. Vieglais, S. Allard, and P. Cruse.
  • NSF DIBBS grant #1443062 to T. Habermann and M. B. Jones.
  • NSF-PLR grant #1546024 to M. B. Jones, S. Baker-Yeboah, J. Dozier, M. Schildhauer, and A. Budden.

Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.

News

Version 1.3.1

BUGS

  • fixed bug in updateMetadata() that would cause package relationships for the metadata object to be lost.

Version 1.3.0

NEW FEATURES

  • Added support for DataPackage download, edit, upload workflow. (#85)

  • Added new method parseRDF() that parses an RDF/XML resource map from a file. (#85)

  • Added new method removeMember() which removes a member from a Package. (#85)

  • Added new method replaceMember() which replaces the raw data or file associated with a DataObject. (#85)

  • Added new method selectMember() which selects package members based on slot values. (#85)

  • Added new method updateRelationships() which updates package relationships by replacing an old identifier with a new one. (#85)

  • Added new method updateMetadata() to update XML content of a DataObject in a DataPackage. (#85)

  • Added new method getValue() which gets values for selected DataPackage member slots. (#85)

  • Added new method setValue() which sets values for selected DataPackage member slots. (#85)

  • Added new method removeAccessRule() to SystemMetadata, DataObject, DataPackage classes. (#78)

  • Added new method hasAccessRule() to DataObject, DataPackage classes. (#78)

  • Added new method clearAccessPolicy() to DataObject, DataPackage classes. (#78)

  • Added new method addAccessRule() to the DataPackage class. (#85)

  • Added new method setPublicAccess() to the DataPackage class. (#85)

  • Access policies can now be modified for DataPackage, DataObject. (#78)

  • Resource map identifiers now include metadata object identifier. (#82)

BUGS

  • fixed bug where resource maps had invalid XML names for blank node identifiers. (#79)

  • fixed bug where resource maps did not include creator or modification time. (#80)

DEPRECATED

  • deprecated function addData(), renamed to addMember().

Version 1.2.0

BUGS

  • Fixed bug where replicationAllowed was not set correctly when parsing if it is false (#61)

  • Fixed bug where numberReplicas was not set correctly when parsing (#63)

  • Fixed bug where the mediaType argument to DataObject initialize() was not being handled correctly, which resulted in an invalid system metadata object being serialized from the DataObject. (#67)

  • Added argument 'mediaTypeProperty' to DataObject initialize() which was needed to fully support 'mediaType'. (#67)

NEW FEATURES

  • Added new function to reset access policies clearAccessPolicy() (#56)

  • Added new function describeWorkflow() to add run provenance relationships to a DataPackage (#64)

  • Added 'Show' methods for DataObject and DataPackage classes. (#71, #73)

DEPRECATED

  • The method recordDerivation is deprecated in this release and may be marked as Defunct and removed in a future release (#68)

Version 1.1.0

This version was not released publicly.

Version 1.0.1

BUGS

  • Fixed bug where Roxygen example for serializePackage() was writing to the "/tmp" directory

  • Serializing system metadata to XML with serializeSystemMetadata() now gathers all elements for a subject together so that the subject does not appear under multiple elements.

Version 1.0.0

NEW FEATURES

  • Initial version (see help topic for 'datapack', e.g. "?datapack")

  • Provides an API for building and serializing packages of data and associated metadata.

  • The package name has been changed from 'datapackage' to 'datapack'

NEW S4 CLASSES

  • Class DataPackage for building and serializing data packages.

  • Class SystemMetadata and DataObject for representing a member of a data package.

  • Class ResourceMap for building and serializing a Resource Description Framework representation of a data package.


1.3.1 by Matthew B. Jones, 2 years ago


Report a bug at https://github.com/ropensci/datapack/issues


Browse source code at https://github.com/cran/datapack


Authors: Matthew B. Jones [aut, cre] , Peter Slaughter [aut] , Regents of the University of California [cph]


License: Apache License (== 2.0)


Imports digest, methods, redland, XML, hash, uuid

Suggests testthat, knitr, httr


Imported by dataone.

