Data Package Manager for R

Create, install, and summarise data packages that follow the Open Knowledge Foundation's Data Package Protocol.



Version: 0.1.9

The R package for creating and installing data packages that follow the Open Knowledge Foundation's Data Package Protocol.

dpmr has three core functions:

  • datapackage_init: initialises a new data package from an R data frame and (optionally) a metadata list.

  • datapackage_install: installs a data package stored either locally or remotely, e.g. on GitHub.

  • datapackage_info: reads a data package's metadata (stored in its datapackage.json file), prints it to the R console and (optionally) returns it as a list.

Examples

Create Data Packages

To create a barebones data package called My_Data_Package in the current working directory, use:

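library(dpmr)

# Create fake data: three integer columns and a constant ID column
A <- B <- C <- sample(1:20, size = 20, replace = TRUE)
ID <- sort(rep('a', 20))
Data <- data.frame(ID, A, B, C)

# Write the data and a barebones datapackage.json to ./My_Data_Package
datapackage_init(df = Data, package_name = 'My_Data_Package')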

This creates a data package whose datapackage.json file contains barebones metadata, which you can then edit by hand.

Alternatively, you can create a list with the metadata in R and have it included with the data package:

meta_list <- list(name = 'My_Data_Package',
                  title = 'A fake data package',
                  last_updated = Sys.Date(),
                  version = '0.1',
                  license = data.frame(type = 'PDDL-1.0',
                                       url = 'http://opendatacommons.org/licenses/pddl/'),
                  sources = data.frame(name = 'Fake',
                                       web = "No URL, it's fake."))

# The package name is taken from meta_list$name
datapackage_init(df = Data, meta = meta_list)

Note: if your metadata list does not include the resources field, it is added automatically. This field identifies the paths of the data files and their data schema.
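For reference, the Data Package spec describes each resource by its file path and a schema listing every column's name and type. The base R sketch below builds such an entry from a data frame. The helper name resource_from_df, the data/Data.csv path, and the mapping from R classes to schema types are assumptions for illustration, not dpmr's internal code.

```r
# Hypothetical helper (not part of dpmr): build a Data Package
# "resources" entry describing one CSV file and its column schema.
resource_from_df <- function(df, path) {
  # Assumed mapping from R column classes to Table Schema types
  type_of <- function(col) {
    if (is.factor(col) || is.character(col)) return("string")
    if (inherits(col, "Date")) return("date")
    if (is.logical(col)) return("boolean")
    if (is.integer(col)) return("integer")
    if (is.numeric(col)) return("number")
    "string"  # fallback for any other class
  }
  fields <- lapply(names(df), function(nm) {
    list(name = nm, type = type_of(df[[nm]]))
  })
  # "resources" is an array, so wrap the single entry in a list
  list(list(path = path, schema = list(fields = fields)))
}

# Same fake data as in the example above
A <- B <- C <- sample(1:20, size = 20, replace = TRUE)
ID <- sort(rep('a', 20))
Data <- data.frame(ID, A, B, C)

res <- resource_from_df(Data, path = "data/Data.csv")
str(res[[1]]$schema$fields[[1]])
```

If jsonlite is installed, jsonlite::toJSON(res, auto_unbox = TRUE, pretty = TRUE) renders this list as a JSON array of resource objects.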

Installing Data Packages

Locally

To load a data package called gdp stored in the current working directory, use:

gdp_data <- datapackage_install(path = 'gdp/')
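Conceptually, loading a local data package comes down to reading its datapackage.json and then each data file listed under resources. The sketch below illustrates this with jsonlite (one of dpmr's imports) and base R. Several parts are illustrative only: the helper read_local_datapackage, the stand-in package it writes to a temporary directory (with made-up placeholder values), and the assumption that every resource is a CSV file referenced by a path field. This is not dpmr's implementation.

```r
library(jsonlite)

# Hypothetical helper (not dpmr's implementation): read every CSV
# resource listed in <dir>/datapackage.json into a named list.
read_local_datapackage <- function(dir) {
  meta <- read_json(file.path(dir, "datapackage.json"))
  out <- lapply(meta$resources, function(r) {
    read.csv(file.path(dir, r$path), stringsAsFactors = FALSE)
  })
  # Name each data frame after its file
  names(out) <- vapply(meta$resources, function(r) basename(r$path), "")
  out
}

# Build a tiny stand-in package in a temporary directory
pkg_dir <- file.path(tempdir(), "gdp")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(year = 2000:2002, value = c(10, 20, 30)),
          file.path(pkg_dir, "data", "gdp.csv"), row.names = FALSE)
write_json(list(name = "gdp",
                resources = list(list(path = "data/gdp.csv"))),
           file.path(pkg_dir, "datapackage.json"), auto_unbox = TRUE)

gdp_data <- read_local_datapackage(pkg_dir)
```

The result is a list of data frames named after their files.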

From the web

You can install a remotely stored package using its URL. In this example we download the gdp data package directly from GitHub using the URL of its zip file:

URL <- 'https://github.com/datasets/gdp/archive/master.zip'
gdp_data <- datapackage_install(path = URL)
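The gdp URL above follows GitHub's archive pattern https://github.com/<user>/<repo>/archive/<branch>.zip. This pattern relates to the to-do item about loading data from GitHub usernames and repos. The helper github_archive_url below is hypothetical and not part of dpmr. It also assumes a default branch of "master", which may not match every repository (many now use "main").

```r
# Hypothetical helper (not part of dpmr): build the URL of the zip
# archive GitHub serves for one branch of a repository.
github_archive_url <- function(user, repo, branch = "master") {
  sprintf("https://github.com/%s/%s/archive/%s.zip", user, repo, branch)
}

URL <- github_archive_url("datasets", "gdp")
URL
# [1] "https://github.com/datasets/gdp/archive/master.zip"

# The result can then be passed on as before:
# gdp_data <- datapackage_install(path = URL)
```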

Get Data Package Metadata

Use datapackage_info to read a data package's metadata into R:

# Print information when working directory is a data package
datapackage_info()
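As a rough illustration of what such a summary involves, the sketch below reads datapackage.json with jsonlite and prints a few top-level fields. It skips any field that is missing, the situation behind the 0.1.3 fix listed in the News. The helper show_dp_fields, its default field list, and the stand-in metadata file are hypothetical. The real datapackage_info output format may differ.

```r
library(jsonlite)

# Hypothetical helper (not dpmr's datapackage_info): print selected
# top-level metadata fields, skipping any that are absent.
show_dp_fields <- function(path = "datapackage.json",
                           fields = c("name", "title", "version", "license")) {
  meta <- read_json(path, simplifyVector = TRUE)
  for (f in fields) {
    if (is.null(meta[[f]])) next  # field missing: skip rather than fail
    cat(f, ": ", toString(unlist(meta[[f]])), "\n", sep = "")
  }
  invisible(meta)  # return the parsed metadata as a list
}

# Stand-in metadata file with no "version" or "license" field
tmp <- tempfile(fileext = ".json")
write_json(list(name = "My_Data_Package", title = "A fake data package"),
           tmp, auto_unbox = TRUE)
meta <- show_dp_fields(tmp)
```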

To-do for v0.2

  • datapackage_update for updating a data package's data and metadata.

  • Specify data variable descriptions in meta list.

  • Load inline data from the datapackage.json file.

  • Load data from a GitHub repo using GitHub usernames and repos.


Licensed under GPL-3

News

Changes to the package will be documented here.

Version 0.1.9

  • output_dir argument for datapackage_init allows the user to specify the directory to save the data package into. Thanks to @scls19fr for the suggestion.

  • More intelligent handling in datapackage_init when df is given a tbl_df object, or an object that is neither a data.frame nor a tbl_df.

Version 0.1.8

  • Improved compliance with the OKFN data package validator. Thank you to Yann-Aël Le Borgne.

Version 0.1.7

  • Now uses the rio package for data import and export.

Version 0.1.6

  • Uses fread from the data.table package to load data files into R; this is faster and more flexible than the previous read.csv implementation.

  • Minor documentation improvements for datapackage_info.

Version 0.1.5

  • Improved datapackage_install handling when data package contains multiple data files.

  • Improved message from datapackage_info for included data files.

  • Minor documentation improvements.

Version 0.1.4

  • datapackage_init validates that user-specified metadata lists include the required minimum fields.

  • Error handling if source_cleaner paths are incorrectly specified in datapackage_init.

  • Added a user-specified metadata example to datapackage_init.

  • Other documentation improvements.

Version 0.1.3

  • Fixed a bug in datapackage_info where it would crash if a field was missing from datapackage.json.

Version 0.1.2

  • Added source_cleaner_rename to datapackage_init allowing the user to specify whether or not to rename source_cleaner files.

Version 0.1.1

  • datapackage_info function added. This function returns a data package's metadata to the Console and as a list.


install.packages("dpmr")

0.1.9 by Christopher Gandrud


http://cran.r-project.org/package=dpmr


Report a bug at https://github.com/christophergandrud/dpmr/issues


Browse source code at https://github.com/cran/dpmr


Authors: Christopher Gandrud [aut, cre], Yann-Ael Le Borgne [ctb]






Imports digest, httr, jsonlite, magrittr, rio

Suggests devtools, testthat

