A Data-Centered Data Flow Manager

A data manager meant to avoid manual storage/retrieval of data to/from the file system. It builds one or more centralized repositories where R objects are stored with rich annotations, including the corresponding code chunks, and can be easily searched and retrieved.





Repo is a data-centered data flow manager. It stores R data files in a central local repository, together with tags, annotations, provenance and dependency information. Any saved object can then be easily located and loaded through the repo interface.

A paper about Repo has been published in BMC Bioinformatics.

The latest news is found in the NEWS.md file of the "Untested" branch.

Repo is developed by Francesco Napolitano.

Minimal example

Repository creation in the default folder:

    library(repo)
    rp <- repo_open()

Putting some items (they are saved on permanent storage). In this case, only values and names are specified:

    rp$put(Inf, "God")
    rp$put(0, "user")

Putting items while specifying their dependencies:

    rp$put(pi, "The Pi costant", depends="God")
    rp$put(1:10, "r", depends="user")

Getting stuff from the repository on the fly:

    diam <- 2 * rp$get("r")
    circum <- 2 * rp$get("The Pi costant") * rp$get("r")
    area <- rp$get("The Pi costant") * rp$get("r") ^ 2

Putting with verbose descriptions:

    rp$put(diam, "diameters", "These are the diameters", depends = "r")
    rp$put(circum, "circumferences", "These are the circumferences",
           depends = c("The Pi costant", "r"))
    rp$put(area, "areas", "This are the areas",
           depends = c("The Pi costant", "r"))

Repository contents:

    print(rp)
    #>              ID Dims Size
    #>             God    1 42 B
    #>            user    1 40 B
    #>  The Pi costant    1 45 B
    #>               r   10 60 B
    #>       diameters   10 65 B
    #>  circumferences   10 94 B
    #>           areas   10 93 B
    rp$info()
    #> Root:            /tmp/RtmpIjPq16/kZJJAjPdwgCB
    #> Number of items: 7
    #> Total size:      439 B
    rp$info("areas")
    #> ID:           areas
    #> Description:  This are the areas
    #> Tags:
    #> Dimensions:   10
    #> Timestamp:    2017-08-04 15:40:03
    #> Size on disk: 93 B
    #> Provenance:
    #> Attached to:  -
    #> Stored in:    sk/nr/zy/sknrzyen718nms80t89timt6fyrc2zvx
    #> MD5 checksum: 65b946a5ffd6d1a63572e1ccfe3a9e08
    #> URL:          -

Visualizing dependencies:

    rp$dependencies()

[Figure: dependency graph of the repository items]

Development branches

  • Master: stable major releases, usually in sync with the latest CRAN version.

  • Dev: fairly stable minor releases.

  • Untested: unstable, in-progress versions. The latest news appears in the "NEWS.md" file of this branch.

Manuals

Besides inline help, two documents are available as introductory material:

Download and Installation

Repo is on CRAN and can be installed from within R as follows:

> install.packages("repo")

However, CRAN versions are not updated very often. The latest stable release can be downloaded from GitHub at https://github.com/franapoli/repo. Repo can then be installed from the downloaded sources as follows:

> install.packages("path-to-downloaded-source", repos=NULL)

devtools users can download and install the latest development version from GitHub in a single step, as follows:

> devtools::install_github("franapoli/repo", ref="dev")

News


May 4, 2018, v2.1.3

  • Significant speedup of all operations involving a search through the items, including find and print. These are now usable with large repositories (tested with tens of thousands of items).

Dec 6, 2017, v2.1.2.2

  • Error and warning calls now use call.=F

Nov 27, 2017, v2.1.2.1

  • stash is no longer deprecated, as it is useful to lazydo

  • fixed some stash behaviour: put-ing an item with the same name as a stash-ed item no longer throws an error.

  • lazydo with force=T will not throw an error for an existing item, which is now a stash-ed item

  • lazydo no longer takes an object of type expression, but an expression directly.
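
The idea behind lazydo (run an expression and cache its result) can be illustrated with a minimal base R sketch. This is not repo's implementation: the `lazy_eval` function and the in-memory `cache` environment are hypothetical names used only for illustration, whereas repo keeps cached results in the repository.

```r
# Hypothetical in-memory cache; repo itself keeps results in the repository.
cache <- new.env()

# Evaluate `expr` once and reuse the stored result on later calls.
# `lazy_eval` is an illustrative name, not part of repo.
lazy_eval <- function(expr, force = FALSE, env = parent.frame()) {
    code <- substitute(expr)                      # capture the unevaluated expression
    key <- paste(deparse(code), collapse = "\n")  # the code text identifies the result
    if (!force && exists(key, envir = cache, inherits = FALSE)) {
        return(get(key, envir = cache, inherits = FALSE))
    }
    value <- eval(code, envir = env)              # run the code in the caller's environment
    assign(key, value, envir = cache)
    value
}

# A function that counts how many times it is actually executed.
calls <- 0
slow_sum <- function() { calls <<- calls + 1; sum(1:10) }

first <- lazy_eval(slow_sum())    # runs the code
second <- lazy_eval(slow_sum())   # served from the cache, slow_sum is not called
```

Passing `force = TRUE` in this sketch re-runs the code and overwrites the cached value.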

11/18/17, v2.1.2

  • Fixed broken link to remote sample

  • Changed the outputs of repo_check, which can now be silenced with suppressMessages.

08/03/17, v2.1.1 - Major release submitted to CRAN

08/01/17, v2.1.1 - CRAN candidate release

  • Fixed bug in repo_build

  • Deprecated stash: now that put parameters are mostly optional, it is not necessary anymore.

07/25/17, v2.1.0.9001

  • Correcting vignette for next release

  • fixed version numbering (there is no v2.2)

04/15/17, v2.1.0.9000

  • The item name in put is now optional too. If not provided, the name of the obj variable will be used.

  • The chunk format has been simplified, the close tag is now just }, as follows:

    ## chunk "ChunkName" {
        ## ...
    ## }
  • Some documentation updates.
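
Falling back to the variable name when no item name is given can be done in R with `deparse(substitute(...))`. The sketch below shows that mechanism under stated assumptions: `put_sketch` and `store` are hypothetical stand-ins, not repo code, and repo may implement the default differently.

```r
# Hypothetical stand-in for a repository: an environment used as a key-value store.
store <- new.env()

# Store `obj` under `name`; when `name` is not given, fall back to the name of
# the variable passed as `obj`. `put_sketch` is illustrative, not repo code.
put_sketch <- function(obj, name = NULL) {
    if (is.null(name)) {
        name <- deparse(substitute(obj))
    }
    assign(name, obj, envir = store)
    invisible(name)
}

radii <- 1:10
put_sketch(radii)                    # stored under the name "radii"
put_sketch(radii * 2, "diameters")   # explicit name
sort(ls(store))
```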

04/11/17, v2.1.0

New features:

  • Descriptions and tags are no longer mandatory
  • Alternative versions of the same chunk can now be defined like this:
    ## chunk "ChunkName#fork1"{
        ## ...
    ## chunk "ChunkName#fork1"}
    ## chunk "ChunkName#fork2"{
        ## ...
    ## chunk "ChunkName#fork2"}
    ## the following sets the active chunk:
    rp$options(chunk="fork")

Outputs from different forks will be stored together in the repo but all operations will refer to the output of the active chunk. This is to be better documented.

Change log:

  • descriptions and tags are no longer mandatory in attach either
  • fixed regression in dependencies (wrong plot edges)
  • fixed bug in set, which was not working when setting the src parameter

03/05/17, v2.0.5.13

  • Added forking

03/04/17, v2.0.5.12

  • Added depends function
  • Added load function
  • Added force parameter to build
  • More documentation updates

02/24/17, v2.0.5.11

  • Runs all checks
  • Documentation updated
  • dependencies now accepts overriding of default visual igraph parameters

02/22/17

  • A paper about Repo has been published in BMC Bioinformatics.
  • Added some testing code.

10/18/16, v2.0.5.8

  • Major code refactoring. Direct calls to repo_* functions are now deprecated.

10/14/16

  • added chunk and build functions and chunk parameter to put. Repo can now associate a specific chunk of code with a resource and rebuild it upon request.
  • added support for special project items and corresponding project function and prj parameter in put. project items store session information automatically when put is called. The info command works differently on project items.
  • Improved aesthetics for the dependency graph through default visual parameters.
  • added testthat support for internal tests
  • added options function to set put default parameters.

10/08/16

  • added buildURL parameter to set. It adds a URL to all items, so that they can be used if the repository is uploaded to a website.

  • get can now be used on attachments, returns file path

  • Fixed bugs in attach not working with attachments

  • Fixed a few bugs with copy when copying multiple items. Now also accepts confirm and replace

  • Added related function to extract items directly or indirectly related to an input item.

  • added internal testthat unit testing

  • A few changes to pies and dependencies. They now support additional parameters (...) to pass to graphics::pie and to the igraph plot function respectively. pies now merges together all items with size < 5% of the total. dependencies now supports tags to filter the nodes to be shown. Also, missing documentation for pies has been added.
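
As a rough illustration of the 5% merging rule mentioned for pies, the following base R sketch collapses small entries into a single slice. The `merge_small` helper, the "Other" label and the sizes are made up for the example and are not taken from repo.

```r
# Merge all entries smaller than `threshold` (as a fraction of the total)
# into a single slice. `merge_small` is an illustrative helper, not a repo function.
merge_small <- function(sizes, threshold = 0.05, label = "Other") {
    small <- sizes / sum(sizes) < threshold
    if (!any(small)) return(sizes)
    c(sizes[!small], setNames(sum(sizes[small]), label))
}

# Made-up item sizes in bytes.
sizes <- c(areas = 930, circumferences = 940, r = 60, user = 40, God = 30)
merged <- merge_small(sizes)
merged
# graphics::pie(merged) would then draw the merged slices
```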

05/03/16

  • new attr function in

  • attach now accepts URL parameter

  • 2.0.2 is now the latest stable on CRAN

05/02/16

  • 2.0.2 contains a fix to repo$pull for OS X

  • Documentation updated

  • NOTE: previous change implies that not all items have a "source" field. This does not seem to be a problem. However, older items storing current working directory as source could be affected.

04/28/16

  • now src must be an item (meant to be an attachment containing source code). Documentation updated accordingly.

  • minor updates to vignette

04/27/2016

  • Bug fixes in lazydo and parameter check

04/24/2016

  • passes devtools::check()

  • stash simplified (now stash(x) instead of stash("x"))

  • Fixed bug when replacing entries due to new scheme for naming files.

01/04/2016

  • Bugfixes in lazydo

  • Minor additions to docs

03/20/2016

  • Fixed warnings in cpanel

  • All check() now passed

  • Merged latest fix from the dev branch

  • The NEWS file now used for main news

03/19/2016

  • Further testing

  • vignette updated

  • News moved to NEWS file

  • Minor release

03/18/2016

  • New pull feature now working

  • added lazydo (run expression and cache results)

  • Sources auto-attach suspended (had problems)

12/07/2015

  • Initial Shiny interface

11/19/2015

  • Sources are now auto-attached

  • Added safe-remove of data

  • Added auto-attach of sources

  • Added notes field

10/21/2015

  • Fixed regression in check after moving to relative paths

  • Added check points for reserved tags and warnings for existing tags

  • Improved print reaction when all matching results are hidden

10/19/2015

  • Added Bulk-edit feature.

10/17/2015

  • Added "Maybe-you-were-looking-for" feature to get.

  • Fixed bug with managing new relative paths.

10/12/2015

  • Switched from absolute to relative paths for stored objects. This makes it easy to rebase a repository, for example to a remote machine, without affecting the index.

  • Automatic update of old repository entries upon resource loading.
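
Why relative paths make rebasing easy can be seen in a small sketch: the index entry stays the same and only the root changes. The entry layout and the `item_path` helper are hypothetical; the relative path is the "Stored in" value shown by rp$info("areas") above, and the two roots are made-up examples.

```r
# An index entry that stores only the path relative to the repository root.
# The list layout is hypothetical, not repo's internal format.
entry <- list(name = "areas", dump = "sk/nr/zy/sknrzyen718nms80t89timt6fyrc2zvx")

# Resolve the entry against whatever root the repository currently has.
# `item_path` is an illustrative helper, not a repo function.
item_path <- function(root, entry) file.path(root, entry$dump)

item_path("/home/alice/.R_repo", entry)       # original location (made-up root)
item_path("/mnt/server/shared_repo", entry)   # same entry after moving the repository
```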

10/08/2015

  • Atomization of item replacement to avoid the possibility of data loss.
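
One common way to make a replacement atomic is to write the new data to a temporary file and then rename it over the old one, so that a failure during writing leaves the old file untouched. The sketch below shows this general technique in base R; `atomic_save` is a hypothetical helper and is not necessarily how repo does it.

```r
# Replace the file at `path` with a serialized copy of `obj` without ever
# leaving a partially written file in its place.
# `atomic_save` is an illustrative helper, not a repo function.
atomic_save <- function(obj, path) {
    tmp <- tempfile(tmpdir = dirname(path))  # same directory, hence same file system
    saveRDS(obj, tmp)                        # the old file is still intact here
    if (!file.rename(tmp, path)) {           # swap in the new file in one step
        unlink(tmp)
        stop("could not replace ", path, call. = FALSE)
    }
    invisible(path)
}

target <- file.path(tempdir(), "item.rds")
atomic_save(1:10, target)
atomic_save(letters, target)   # replaces the previous content
readRDS(target)
```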

Earlier...

  • Added a search field in print to match anything in items, and the corresponding "find" shortcut method.

  • When multiple tags are used, items now match by default when at least one tag matches (OR).

  • Tags can now be matched using OR, AND, NOT or any external logical function.

  • Fixed a bug in print when showing one item.
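
The OR, AND and NOT matching modes described above can be sketched in base R as follows. The `index` list and the `match_tags` helper are hypothetical and only illustrate the logic; they do not reproduce repo's internal data structures or its interface.

```r
# Hypothetical index: item name -> tags. Not repo's internal format.
index <- list(
    diameters = c("geometry", "derived"),
    r         = c("geometry", "input"),
    notes     = c("text")
)

# Return the names of the items whose tags satisfy the query.
# `tagfun` is "OR", "AND", "NOT" or any function that reduces a logical
# vector to a single TRUE/FALSE. `match_tags` is illustrative, not a repo function.
match_tags <- function(index, tags, tagfun = "OR") {
    combine <- if (is.function(tagfun)) tagfun else switch(tagfun,
        OR  = any,
        AND = all,
        NOT = function(x) !any(x),
        stop("unknown tagfun: ", tagfun, call. = FALSE)
    )
    hits <- vapply(index, function(item_tags) combine(tags %in% item_tags),
                   logical(1))
    names(index)[hits]
}

match_tags(index, c("geometry", "input"))          # OR: at least one tag matches
match_tags(index, c("geometry", "input"), "AND")   # AND: all tags must match
match_tags(index, c("geometry", "input"), "NOT")   # NOT: none of the tags match
```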


Package information

Version: 2.1.3
Author: Francesco Napolitano
License: GPL-3
Source code: https://github.com/cran/repo
Imports: digest, tools
Suggests: igraph, knitr, shiny, testthat
Suggested by: raustats