Tools for assessing and diagnosing convergence of
Markov Chain Monte Carlo simulations, as well as for graphically display
results from full MCMC analysis. The package also facilitates the graphical
interpretation of models by providing flexible functions to plot the
results against observed variables, and functions to work with
hierarchical/multilevel batches of parameters
ggmcmc is a tool for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically display results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.
To install or update, run:
New stan/rstan versions treat warmup differently and this is solved.
Solved a bug in ggs_caterpillar() when a list is passed.
par_labels argument in ggs() now allows also a tbl_df (in addition to a data frame).
Version 1.0 comes with the publication of the article at the Journal of Statistical Software.
ggmcmc() can produce reports in HTML format, when a file in HTML extension is given as argument. The figures can be in PNG or SVG formats.
'stanreg' and 'brmsfit' objects from packages 'rstanarm' and 'brms', respectively, are supported by ggs().
Improved documentation (HTML vignette) with the latest improvements (greek letters, rstararm/brms packages supported by ggs(), HTML report, acknowledgments).
Normalize plural for all figure titles.
Warning messages from joins are not shows to the user.
stanfit objects correctly treat Parameter names.
Internal improvement of the function that does the custom sort of the parameter names, as well as their labels.
References to the JSS article have been incorporated in all functions that generate plots.
Improved documentation (JSS vignette) to accommodate last editorial changes.
Introduce the possibility to get Greek letters in the plots, as parameter names.
Vignettes are added, improving the package documentation.
ggs_separation() incorporates new arguments: uncertainty_band (to avoid the uncertainty band of the predicted values), show_labels (to add the Parameter as the label of the case in the x-axis.
Solved a bug that passed thick_ci and thin_ci incorrectly in ggs_caterpillar().
Parameters are sorted by their natural order following the numerical order inside the family. This helps ggs_separation() and ggs_rocplot(), as they expect observed values in numerical order.
Solved a bug with the calculation of the expected number of events in ggs_separation().
Added some supressWarnings() to avoid annoying messages when doing some left_join() operations.
ggs_separation now plots the observed cases as proportional to the horizontal space, not as simple lines. This makes visualization clearer and more accurate to reality.
Updated examples for ggs_pairs() to work with newer version of GGally.
GGally is imported instead of depends.
Specify a certain version of tidyr (0.3.1 needed)
Add a check that "Parameter" is a factor, which is not granted with the development version of tidyr as of 04/01/2016.
Solve revdep issue with tidyr-0.3 due to unqualified imports of "%>%, and use explicit calls to dplyr and tidyr in all functions taken from those packages.
Improve the text of the package description.
Add BugReports fields and include another URL.
Added new function ggs_pairs() (thanks to Max Joseph) that produces scatterplot matrices of parameters, helping with posterior correlation. Library GGally is now a dependency.
ggs() is now able to process warmup (burnin) samples from stan, and so the argument "inc_warmup" now does something useful.
ci() does not allow lists of data frames, so change it accordingly in the help file (thanks to Alex Zvoleff).
Solve minor issues with par_labels.
Solve minor issue with ggs_crosscorrelation not showing correctly par_labels (thanks to Juste).
Add a small example to the documentation about the use of the family argument in ggs().
Better specify the origin of dplyr::* functions, to avoid collisions with other packages.
Include data and samples from models in order to facilitate examples
linear: MCMC output from a linear regression model with dummy data.
logit: MCMC output from a logit regression model with dummy data.
radon: MCMC output and other description variables for a hierarchical / multilevel model based on radon example at Gelman & Hill.
Major revision of the code moving plyr to dplyr and reshape2 to tidyr
The main object from ggs() is no more a data frame, but a tbl_df.
Deleted the possibility of parallelization. It is not (yet) implemented in dplyr, but in any case, dplyr is clearly faster than plyr.
New function ci() that calculates credible intervals for a ggs() object.
When using par_labels, not only the columns "Parameter" and "Label" are considered, but also the resulting ggs() object contains other columns present in par_labels and the old Parameter name. This allows mainly ggs_caterpillar() (but also other functions) to use facets or colors, fills, etc.
New argument "simplify" in ggs_traceplot() or "simplify_traceplot" in ggmcmc() that allows to keep only a percentage of random chains, so that the size of the plot (and the time taken) is reduced considerably. But use it with care, because then the chain is no more complete.
Homogenize function names, using underscores "_" everywhere, instead of dots ".", in calc_bin(), gl_unq() and roc_calc().
When selecting a family of parameters, maintain parameter ordering for multipage outputs.
par_labels had a bug that produced wrong reordering of labels, and levels of parameters were incorrectly sorted. Although it is a minor change, the mistake was severe.
Labels in x-axis in crosscorrelation matrix are better aligned and more legible.
Solve scaling issues in Rhat. Introduce a meaningful default.
Better examples for ggs_separation() and ggs_rocplot().
Documentation improved for internal functions.
When the chain is stucked in the same value, the computation of the ar() for the spectral density needed by ggs_geweke() failed. It has been solved
When the range of a parameter is huge the calculation of the bins is approximate, and gave different number of bins. Now ggs_histogram takes care of different number of bins by each variable due to rounding.
ggmcmc() now allows the argument "plot" to select only some plots.
ggs_Rhat() now has scaling between 1 and 1.5 by default, so that perspective on real convergence is present.
ggs_geweke() includes anargument shadow_limit to highlight the -2+2 range
rugs in ggs_density() and ggs_compare_partial() are FALSE by default: rug it is only a visual improvement that really does not add too much information to the plot, but it takes a lot of resources.
Old arguments "param.page" from ggmcmc() and "fully.bayesian" from ggs_rocplot() now use underscore to be systematic with the rest of the package.
Change default label for ggs_histogram().
Rhat label now has a real hat.
Bug in export on ggs_rocplot.
Stan samples are imported without relying on coda.
Add the possibility to use different parameter names than the default ones provided by the MCMC software, using the argument "par_labels" in ggs().
Move from reshape() to reshape2().
geom_histogram() allows to specify a desired number of bins.
ROC plot (ggs_rocplot()) with code originally from Zachary M. Jones.
Separation plot (ggs_separation()) with code originally from Zachary M. Jones.
New plots ggs_ppmean() and ggs_ppsd() for posterior predictive checks.
Make parallel=FALSE the default in ggs().
Consistency in names of the arguments achieved by moving all "." into "_".
rstan is no more a suggested package, because it is not in CRAN, and it is not expected to be there in the near future.
Fix bugs in ggs_crosscorrelation when the number of parameters was 1 or 2.
In ggmcmc, do not plot Potential Scale Reduction Factor when there is a single chain.
Allow NULL as filename for the pdf file in ggmcmc.
argument "labels" in ggs_caterpillar() is renamed to "model_labols" avoid confusion with the parameter names
Adjust the defaults for the caterpillar plots, to remark the thick and thin segments
ggs() can import MCMCpack objects, stanfit objects and csv files from Stan running from the command line, in addition to the previous import of samples from mcmc.list() objects (from the coda package).
New functions ggs_Rhat() and ggs_geweke() that show graphically the results of the Rhat (potential scale reduction factor, by Gelman & Rubin), and the Geweke diagnostic (z-score).
ggs_caterpillar is able to plot against a continuous variable due to the addition of the 'X' argument
ggs_caterpillar() has the ability to plot two models, so that model comparison is easier. Thanks to Zachary M. Jones.
New parameter "family" that allows to select only certain parameters from the ggs object based on a regular expression (i.e., select all "beta" parameters, or all "theta", ones, or "alpha\[1,.\]", etc...)
Traceplot now shows by default the burnin period and takes care of the thinning, used by ggs_traceplot() and ggs_running().
New argument to have absolute colour scale in crosscorrelations.
ggs_caterpillar() is horizontal by default.
Documentation has been improved.
opts() is replaced by theme() as it is deprecated in ggplot-0.9.2.
theme_text() is replaced by element_text() as it is deprecated in ggplot-0.9.2.
opts(title="") is replaced by labs(title="") as it is deprecated in ggplot-0.9.2.