Provides an R interface for the 'VMware Data Stack' running on 'PostgreSQL' or 'Greenplum' databases, with parallel and distributed computation capabilities for big data processing. 'PivotalR' provides an R interface to various database operations on tables or views. These operations are almost the same as the corresponding native R operations, so R users do not need to learn 'SQL' when they operate on objects in the database. It also provides a wrapper for 'Apache MADlib', an open-source library for parallel and scalable in-database analytics.
PivotalR is a package that enables users of R, the most popular open-source statistical programming language and environment, to interact with the Greenplum Database as well as Apache HAWQ (incubating) and the open-source database PostgreSQL for big data analytics. It does so by providing an interface to operations on tables and views in the database. These operations are almost the same as those of data.frame, and only a minimal amount of data is transferred between R and the database system. Thus R users do not need to learn SQL when they operate on objects in the database. PivotalR also lets users run the functions of Apache MADlib (incubating), the open-source machine learning library for big data, directly from R.
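To make the idea of "minimal data transfer" concrete, here is a toy sketch in base R. It is not PivotalR's implementation: the helpers `lazy_table()`, `filter_rows()` and `avg_sql()` and the table and column names are hypothetical. The point is that data.frame-style operations can be recorded and rendered as a single SQL query, so the filtering and aggregation run inside the database and only the final result would need to come back to R.

```r
# Toy sketch, NOT PivotalR's implementation: a "lazy" table reference
# records operations instead of pulling rows into R.
lazy_table <- function(name) list(name = name, where = character(0))

# Record a row filter (hypothetical helper; the condition is raw SQL text)
filter_rows <- function(tbl, condition) {
  tbl$where <- c(tbl$where, condition)
  tbl
}

# Render one query that computes avg(column) inside the database,
# so only a single number would be transferred back to R
avg_sql <- function(tbl, column) {
  sql <- sprintf("SELECT avg(%s) FROM %s", column, tbl$name)
  if (length(tbl$where) > 0) {
    sql <- paste0(sql, " WHERE ", paste(tbl$where, collapse = " AND "))
  }
  sql
}

x <- lazy_table("abalone")          # hypothetical table name
x <- filter_rows(x, "sex = 'F'")    # hypothetical column and value
query <- avg_sql(x, "rings")
print(query)
```

Deferring work this way means the data never leaves the database until a result is actually needed; the real package handles far more operations than this single aggregate.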
An Introduction to PivotalR
vignette("pivotalr") # execute in R console to view the PDF file
To install PivotalR:
Get the latest stable version from CRAN by running:
install.packages("PivotalR")
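Or try out the latest development version from GitHub by running the following code (requires R >= 3.0.2):
install.packages("devtools") # only needed if 'devtools' is not yet installed
devtools::install_github("pivotalsoftware/PivotalR")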
Or download the source tarball directly from here and then install it:
install.packages("pivotalsoftware-PivotalR-xxxx.tar.gz", repos = NULL, type = "source")
where "pivotalsoftware-PivotalR-xxxx.tar.gz" is the name of the tarball you downloaded.
To get started: