Archive and Unarchive Databases Using Flat Files

Flat text files provide a robust, compressible, and portable way to store tables from databases. This package provides convenient functions for exporting tables from relational database connections into compressed text files and streaming those text files back into a database without requiring the whole table to fit in working memory.


Travis buildstatus Coveragestatus Buildstatus CRAN_Status_Badge lifecycle CRAN RStudio mirrordownloads DOI

The goal of arkdb is to provide a convienent way to move data from large compressed text files (tsv, csv, etc) into any DBI-compliant database connection (e.g. MYSQL, Postgres, SQLite; see DBI), and move tables out of such databases into text files. The key feature of arkdb is that files are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much to large to read into memory all at once.

Links

Installation

You can install arkdb from GitHub with:

# install.packages("devtools")
devtools::install_github("cboettig/arkdb")

Basic use

library(arkdb)
 
# additional libraries just for this demo
library(dbplyr)
library(dplyr)
library(fs)

Creating an archive of a database

Consider the nycflights database in SQLite:

tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//Rtmp3oFbR8/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather

Create an archive of the database:

dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#>  ...Done! (in 0.00859499 secs)
#> Exporting airports in 50000 line chunks:
#>  ...Done! (in 0.03927302 secs)
#> Exporting flights in 50000 line chunks:
#>  ...Done! (in 27.67571 secs)
#> Exporting planes in 50000 line chunks:
#>  ...Done! (in 0.2452919 secs)
#> Exporting weather in 50000 line chunks:
#>  ...Done! (in 2.398423 secs)

Unarchive

Import a list of compressed tabular files (i.e. *.csv.bz2) into a local SQLite database:

files <- fs::dir_ls(dir)
new_db <- src_sqlite(fs::path(tmp, "local.sqlite"), create=TRUE)
 
unark(files, new_db, lines = 50000)
#> Importing airlines.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.04621601 secs)
#> Importing airports.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.146152 secs)
#> Importing flights.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 17.44235 secs)
#> Importing planes.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.06386805 secs)
#> Importing weather.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 1.224544 secs)
 
new_db
#> src:  sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/Rtmp3oFbR8/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

ropensci_footer

News

arkdb 0.0.5

  • ark()'s default keep-open method would cut off header names for Postgres connections (due to variation in the behavior of SQL queries with LIMIT 0.) The issue is now resolved by accessing the header in a more robust, general way.

arkdb 0.0.4

  • unark() will strip out non-compliant characters in table names by default.
  • unark() gains the optional argument tablenames, allowing the user to specify the corresponding table names manually, rather than enforcing they correspond with the incoming file names. #18
  • unark() gains the argument encoding, allowing users to directly set the encoding of incoming files. Previously this could only be set by setting options(encoding), which will still work as well. See FAO.R example in examples for an illustration.
  • unark() will now attempt to guess which streaming parser to use (e.g csv or tsv) based on the file extension pattern, rather than defaulting to a tsv parser. (ark() still defaults to exporting in the more portable tsv format).

arkdb 0.0.3 2018-09-11

  • Remove dependency on utils::askYesNo for backward compatibility, #17

arkdb 0.0.2 2018-08-20 (First release to CRAN)

  • Ensure the suggested dependency MonetDBLite is available before running unit test using it.

arkdb 0.0.1 2018-08-20

  • overwrite existing tables of same name (with warning and interactive proceed) in both DB and text-files to avoid appending.

arkdb 0.0.0.9000 2018-08-11

  • Added a NEWS.md file to track changes to the package.
  • Log messages improved as suggested by @richfitz
  • Improved mechanism for windowing in most DBs, from @krlmlr #8
  • Support pluggable I/O, based on @richfitz suggestions #3, #10

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("arkdb")

0.0.5 by Carl Boettiger, 9 months ago


https://github.com/ropensci/arkdb


Report a bug at https://github.com/ropensci/arkdb/issues


Browse source code at https://github.com/cran/arkdb


Authors: Carl Boettiger [aut, cre, cph] , Richard FitzJohn [ctb]


Documentation:   PDF Manual  


MIT + file LICENSE license


Imports DBI, progress, tools, utils

Suggests readr, spelling, dplyr, dbplyr, nycflights13, testthat, knitr, covr, fs, rmarkdown, RSQLite, MonetDBLite


See at CRAN