Reads the provenance collected by the 'rdt' or
'rdtLite' packages, or other tools providing compatible PROV JSON output
created by the execution of a script, and provides a human-readable
summary identifying the input and output files, the script used
(if any), errors and warnings produced, and the environment in
which it was executed. It can also optionally package all the
files into a zip file. The exact format of the JSON created by
'rdt' and 'rdtLite' is described in
<https://github.com/End-to-end-provenance/ExtendedProvJson>.
More information about 'rdtLite' and associated tools is available
at <https://github.com/End-to-end-provenance/> and Barbara Lerner,
Emery Boose, and Luis Perez (2018), Using Introspection to Collect
Provenance in R, Informatics,
Reads the provenance created by the execution of a script or from commands in the console and provides a human-readable summary identifying the input and output files, the script used (if any), and the environment in which it was executed. It can also optionally package all the files into a zip file.
provSummarizeR works with provenance collected by the rdt or rdtLite packages.
Install from GitHub:
# install.packages("devtools")
devtools::install_github("End-to-end-provenance/provSummarizeR")
Once installed, load the package:
library("provSummarizeR")
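The summarize functions can be used in one of three ways.
To collect provenance for a script and summarize it in one step:
prov.summarize.run ("script.R")
To summarize provenance collected earlier in the current R session, for example with rdtLite:
rdtLite::prov.run ("script.R")
prov.summarize ()
To summarize provenance stored in a PROV JSON file:
prov.summarize.file ("prov_script/prov.json")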
All three functions have two optional parameters, save and create.zip.
If save is true, the summary is saved to a file, in addition to being displayed in the console. The file is named prov-summary.txt and is stored in the provenance directory. The default value of save is false.
If create.zip is true, the provenance directory is packaged into a timestamped zip file and placed in the current working directory. This file contains a copy of all input and output files and scripts used, as well as prov-summary.txt if save is true. It also includes the prov.json file containing the detailed execution trace. The default value of create.zip is false.
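As a minimal sketch, both optional parameters can be passed in a single call. It assumes provSummarizeR and rdtLite are installed and that script.R is in the working directory; the guard skips the call when any of these is missing.

```r
# Sketch: summarize script.R, save the summary, and zip the provenance directory.
# Assumes provSummarizeR and rdtLite are installed and script.R exists;
# if not, the call is skipped.
ready <- requireNamespace("provSummarizeR", quietly = TRUE) &&
  requireNamespace("rdtLite", quietly = TRUE) &&
  file.exists("script.R")

if (ready) {
  # save = TRUE also writes prov-summary.txt to the provenance directory;
  # create.zip = TRUE writes a timestamped zip file to the working directory.
  provSummarizeR::prov.summarize.run("script.R", save = TRUE, create.zip = TRUE)
}
```

Leaving either parameter out keeps its default of FALSE.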
Creating the zip file depends on the use of an external zip program. It has been tested with zip for Unix/Mac OS and with 7z on Windows. It may or may not work with other zip programs. To use a program other than zip, set the R_ZIPCMD environment variable.
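Before requesting a zip file, it can help to confirm that the zip program is actually reachable. The sketch below assumes, as described above, that zip is used when R_ZIPCMD is unset; the commented-out path is hypothetical and must be adjusted for your system.

```r
# Report which zip program will be used and whether it can be found on the PATH.
zip.cmd <- Sys.getenv("R_ZIPCMD", unset = "zip")
zip.found <- nzchar(Sys.which(zip.cmd))
if (!zip.found) message("Zip program not found: ", zip.cmd)

# To switch programs, set R_ZIPCMD before calling a summarize function with
# create.zip = TRUE. This path is hypothetical; adjust it for your system.
# Sys.setenv(R_ZIPCMD = "/usr/local/bin/zip")
```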
Here is an example of what the summary looks like. It first identifies the script executed. Next it describes details of how and when the script was executed. It then lists the libraries that were used during execution, any additional scripts sourced, and the input and output files.
PROVENANCE SUMMARY for script.R
ENVIRONMENT:
Executed at 2018-11-29T16.52.34EST
Script last modified at 2018-11-29T16.34.54EST
Executed with R version 3.5.1 (2018-07-02)
Executed on x86_64 running darwin15.6.0
Provenance was collected with rdtLite 1.0.2
Provenance is stored in /Users/blerner/Documents/scripts/prov_script
Hash algorithm is md5
LIBRARIES:
base 3.5.1
datasets 3.5.1
ggplot2 3.0.0
graphics 3.5.1
grDevices 3.5.1
methods 3.5.1
provSummarizeR 1.0
rdtLite 1.0.2
stats 3.5.1
utils 3.5.1
SOURCED SCRIPTS:
None
INPUTS:
File : in.txt
2018-11-29T16.52.35EST
52dbff5d488efed73caf540c9476aa01
File : script2.R
2018-11-29T16.52.35EST
422b85c26655e3192dece05303b58c11
OUTPUTS:
File : out.txt
2018-11-29T16.52.35EST
a4a33d050511356b2108669380684498
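Because the summary records an md5 hash for each input and output file, recomputing a file's hash is a quick way to check whether it has changed since the script ran. This is a minimal sketch using base R's tools::md5sum; it assumes the recorded value is the MD5 of the file's contents, and check.hash is a hypothetical helper, not part of provSummarizeR.

```r
# Hypothetical helper: compare a file's current MD5 hash with a recorded one.
# Returns TRUE if unchanged, FALSE if modified, NA if the file is missing.
check.hash <- function(path, expected) {
  if (!file.exists(path)) return(NA)
  unname(tools::md5sum(path)) == tolower(expected)
}

# Hash recorded for in.txt in the example summary.
check.hash("in.txt", "52dbff5d488efed73caf540c9476aa01")
```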