Simple and Scalable Statistical Modelling in R
Write statistical models in R and fit them by MCMC and optimisation on CPUs and GPUs, using Google 'TensorFlow'.
greta lets you write your own model like in BUGS, JAGS and Stan, except that you write models right in R, it scales well to massive datasets, and it’s easy to extend and build on.
See the website for more information, including tutorials, examples, package documentation, and the greta forum.
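As a minimal sketch of what writing a model "right in R" can look like, the following fits a simple linear regression to the built-in iris data. The variable names and prior choices are illustrative assumptions, and running it requires a working TensorFlow installation.

```r
library(greta)

# Wrap the observed data as greta arrays
x <- as_data(iris$Petal.Length)
y <- as_data(iris$Sepal.Length)

# Priors (illustrative choices, not recommendations)
intercept <- normal(0, 5)
slope <- normal(0, 3)
sigma <- lognormal(0, 3)

# Likelihood: y is normally distributed around a linear predictor
mu <- intercept + slope * x
distribution(y) <- normal(mu, sigma)

# Define the model and draw posterior samples by MCMC
m <- model(intercept, slope, sigma)
draws <- mcmc(m, n_samples = 500, warmup = 500)
```

The returned draws can be summarised with standard tools for MCMC output, such as those in the coda package.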
greta 0.3.0 (in development)
This is a very large update which adds a number of features and major speed improvements. We now depend on the TensorFlow Probability Python package, and use functionality in that package wherever possible. Sampling a simple model now takes ~10s, rather than ~2m (>10x speedup).
- dim<-() now always rearranges elements in column-major order (R-style, not Python-style)
- removed excessive checking of TF installation by operation greta arrays (was slowing down greta array creation for complex models)
- sped up detection of sub-DAGs in model creation (was slowing down model definition for complex models)
- reduced passing between R, Python, and TensorFlow during sampling (was slowing down sampling)
- 18 new optimisers have been added
- initial values can now be passed for some or all parameters
- 2 new MCMC samplers have been added: random-walk Metropolis-Hastings (thanks to @michaelquinn32) and slice sampling
- improved tuning of MCMC during warmup (thanks to @martiningram)
- integration with the future package for execution of MCMC chains on remote machines. Note: it is not advised to use future for parallel execution of chains on the same machine, as that is now handled automatically by greta.
- one_by_one argument to MCMC can handle serious numerical errors (such as failed matrix inversions) as 'bad' samples
- extra_samples() function to continue sampling from a model
- calculate() works on the output of MCMC, to enable post-hoc posterior prediction
- multivariate distributions now accept matrices of parameter values
- joint() distribution constructors
- added functions: tapply() (thanks to @jdyen)
- we now automatically skip operations where possible, e.g. computing binomial and Poisson densities with log-, logit- or probit-transformed parameters where they exist, or skipping the Cholesky decomposition of a matrix if it was created from its Cholesky factor. This increases numerical stability as well as speed.
- ability to change the colour of the model plot (thanks to @dirmeier)
- ability to reshape greta arrays using
- mcmc now runs 4 chains (simultaneously on all available cores), 1000 warmup steps, and 1000 samples by default
- optimisation and mcmc methods are now passed to mcmc() as objects, with defined tuning parameters. The control argument to these functions is now defunct.
- column names for parameters now give the array indices for each scalar rather than a number (i.e. x[2, 3], rather than
- multivariate distributions now define each realisation as a row, and parameters must therefore have the same orientation
- plot.greta_model() now returns a DiagrammeR::grViz object (thanks to @flyaflya). This is less modifiable, but renders the plot much more consistently across different environments and notebook types. The dgr_graph object used to create the grViz object is included as an attribute of this object, named
- lots more model examples (thanks to @leehazel, @dirmeier, @jdyen)
- two analysis case studies (thanks to @ShirinG, Tiphaine Martin, @mmulvahill, @michaelquinn32, @revodavid)
- new and improved pkgdown website (thanks to @pteetor)
- added tests of the validity of posterior samples drawn by MCMC (for known distributions and with Geweke tests)
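Several of the changes listed above (samplers passed as objects, initial values for a subset of parameters, extra_samples(), and calculate() on MCMC output) can be sketched together as follows. This is a hedged illustration rather than a definitive recipe: the simulated data, the tuning values, and the derived quantity `upper` are assumptions made for the example.

```r
library(greta)

# Simulated data (purely illustrative)
set.seed(1)
y <- as_data(rnorm(50, mean = 2, sd = 1))

# A simple normal model with unknown mean and standard deviation
mu <- normal(0, 10)
sigma <- lognormal(0, 1)
distribution(y) <- normal(mu, sigma)
m <- model(mu, sigma)

# The sampler is passed as an object carrying its own tuning parameters;
# rwmh() or slice() could be supplied here instead of hmc().
# Initial values are given for mu only; sigma is initialised automatically.
draws <- mcmc(
  m,
  sampler = hmc(Lmin = 10, Lmax = 20),
  n_samples = 500,
  warmup = 500,
  chains = 2,
  initial_values = initials(mu = 0)
)

# Continue sampling from where the chains left off
draws <- extra_samples(draws, n_samples = 500)

# Post-hoc calculation of a derived quantity from the posterior draws
upper <- mu + 2 * sigma
upper_draws <- calculate(upper, values = draws)
```

Passing the sampler as an object keeps its tuning parameters next to the choice of algorithm, which is the role the now-defunct control argument used to play.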
Minor patch to handle an API change in the progress package. No changes in functionality.
- improved error checking/messages in
- switched docs and examples to always use <- for assignment
- fixed the n_cores argument to
- added a calculate() function to compute the values of greta arrays conditional on provided values for others
- added a chains argument to
- improved HMC self-tuning, including a diagonal Euclidean metric
- fixed breaking change in extraDistr API (caused test errors on CRAN builds)
- added dontrun statements to pass CRAN checks on winbuilder
- fixed breaking change in tensorflow API (1-based indexing)
- dim<-() to reshape greta arrays
sweep() now handles greta array
x is numeric
- export internal functions via .internals object to enable extension packages
- removed the deprecated define_model(), an alias for
- removed the dynamics module, to be replaced by the gretaDynamics package
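The column-major ("R-style") reshaping that dim<-() is described as following in the changes above can be illustrated with plain R vectors. This sketch uses base R only, not greta arrays; it shows the ordering convention, with a row-major fill included for contrast.

```r
# Column-major reshaping: elements fill down each column first
col_major <- 1:6
dim(col_major) <- c(2, 3)
col_major
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# For contrast, a row-major ("Python-style") fill of the same values
row_major <- matrix(1:6, nrow = 2, byrow = TRUE)
row_major
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```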