An extension of 'ggplot2', 'ggstatsplot' creates graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and provides one-line code to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports only the most common types of statistical tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, and regression analyses.
ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the plots themselves. It is targeted primarily at the behavioral sciences community and provides one-line code to produce information-rich plots.
In a typical exploratory data analysis workflow, data visualization and
statistical modeling are two different phases: visualization informs
modeling, and modeling in its turn can suggest a different visualization
method, and so on and so forth. The central idea of ggstatsplot is
simple: combine these two phases into one in the form of graphics with
statistical details, which makes data exploration simpler and faster.
Currently, it supports only the most common types of statistical tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, and regression analyses.
It therefore produces a limited set of plot types for the supported analyses.
In addition to these basic plots, ggstatsplot also provides grouped_ versions of most functions, which make it easy to repeat the same analysis for any grouping variable.
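Conceptually, a grouped_ function repeats one analysis for every level of a grouping variable. The base R sketch below illustrates only that idea; it is not how ggstatsplot implements its grouped_ functions, and the choice of mtcars, cyl, and a Pearson correlation is an assumption made for illustration.

```r
# split the data by the grouping variable (number of cylinders)
by_cyl <- split(x = mtcars, f = mtcars$cyl)

# run the same analysis (correlation between weight and mileage) in each subset
results <- lapply(
  X = by_cyl,
  FUN = function(df) stats::cor.test(x = df$wt, y = df$mpg)
)

# one correlation estimate per level of the grouping variable
sapply(X = results, FUN = function(res) unname(res$estimate))
```

The grouped_ functions do the equivalent for plots: one plot per level, arranged in a single grid.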
Future versions will include other types of statistical analyses and plots as well.
For all statistical tests reported in the plots, the default template abides by the APA gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):
The table below summarizes all the different types of analyses currently supported in this package:
Functions | Description | Parametric | Non-parametric | Robust | Bayes Factor |
---|---|---|---|---|---|
ggbetweenstats | Between group/condition comparisons | Yes | Yes | Yes | Yes |
gghistostats, ggdotplotstats | Distribution of a numeric variable | Yes | Yes | Yes | Yes |
ggcorrmat | Correlation matrix | Yes | Yes | Yes | No |
ggscatterstats | Correlation between two variables | Yes | Yes | Yes | Yes |
ggpiestats, ggbarstats | Association between categorical variables | Yes | No | No | Yes |
ggpiestats | Proportion test | No | No | No | No |
ggcoefstats | Regression model coefficients | Yes | No | Yes | Yes |
ggstatsplot provides a wide range of effect sizes and their confidence intervals.
Test | Parametric | Non-parametric | Robust | Bayes |
---|---|---|---|---|
one-sample t-test | Yes | Yes | Yes | No |
two-sample t-test (between) | Yes | Yes | Yes | No |
two-sample t-test (within) | Yes | Yes | Yes | No |
One-way ANOVA (between) | Yes | Yes | Yes | No |
One-way ANOVA (within) | Yes | No | No | No |
correlations | Yes | Yes | Yes | No |
contingency table | Yes | NA | NA | No |
goodness of fit | Yes | NA | NA | No |
regression | Yes | Yes | Yes | No |
To get the latest, stable CRAN release (0.0.10):
utils::install.packages(pkgs = "ggstatsplot")
Note: If you are on a Linux machine, you will need to have OpenGL libraries installed (specifically, libx11, mesa, and the Mesa OpenGL Utility library, glu) for the dependency package rgl to work.
You can get the development version of the package from GitHub (0.0.10.9000). To see what new changes (and bug fixes) have been made to the package since the last release on CRAN, you can check the detailed log of changes here: https://indrajeetpatil.github.io/ggstatsplot/news/index.html
If you are in a hurry and want to reduce the installation time, use:
# needed package to download from GitHub repo
utils::install.packages(pkgs = "devtools")

# downloading the package from GitHub
devtools::install_github(
  repo = "IndrajeetPatil/ggstatsplot", # package path on GitHub
  dependencies = FALSE, # assumes you have already installed needed packages
  quick = TRUE # skips docs, demos, and vignettes
)
If time is not a constraint:
devtools::install_github(
  repo = "IndrajeetPatil/ggstatsplot", # package path on GitHub
  dependencies = TRUE, # installs packages which ggstatsplot depends on
  upgrade_dependencies = TRUE # updates any out of date dependencies
)
If you are not using the RStudio IDE and you get an error related to “pandoc”, you will either need to remove the argument build_vignettes = TRUE (to avoid building the vignettes) or install pandoc. If you have the rmarkdown R package installed, you can check whether pandoc is available by running the following in R:
rmarkdown::pandoc_available()
#> [1] TRUE
If you want to cite this package in a scientific journal or in any other context, run the following code in your R console:
utils::citation(package = "ggstatsplot")
There is currently a publication in preparation corresponding to this package and the citation will be updated once it’s published.
To see the detailed documentation for each function in the stable CRAN version of the package, see:
To see the documentation relevant for the development version of the package, see the dedicated website for ggstatsplot, which is updated after every new commit: https://indrajeetpatil.github.io/ggstatsplot/.
In R, documentation for any function can be accessed with the standard help command (e.g., ?ggbetweenstats).
Another handy tool to see the arguments of any function is args. For example:
args(name = ggstatsplot::specify_decimal_p)
#> Registered S3 method overwritten by 'broom.mixed':
#>   method      from
#>   tidy.gamlss broom
#> Registered S3 methods overwritten by 'lme4':
#>   method                          from
#>   cooks.distance.influence.merMod car
#>   influence.merMod                car
#>   dfbeta.influence.merMod         car
#>   dfbetas.influence.merMod        car
#> function (x, k = 3, p.value = FALSE)
#> NULL
In case you want to look at the function body for any of the functions, just type the name of the function without the parentheses:
# function to convert class of any object to `ggplot` class
ggstatsplot::ggplot_converter
#> function(plot) {
#>   # convert the saved plot
#>   p <- cowplot::ggdraw() +
#>     cowplot::draw_grob(grid::grobTree(plot))
#>
#>   # returning the converted plot
#>   return(p)
#> }
#> <bytecode: 0x000000002df3e8d8>
#> <environment: namespace:ggstatsplot>
If you are not familiar with either what the namespace operator :: does or how to use the pipe operator %>%, both of which this package and its documentation rely on heavily, you can check out these links:
ggstatsplot relies on non-standard evaluation (NSE), i.e., rather than looking at the values of arguments (x, y), it instead looks at their expressions. This means that you shouldn’t pass columns with the $ operator while setting data = NULL (e.g., data = NULL, x = data$x, y = data$y). You must always specify the data argument for all functions. On the plus side, you can enter arguments either as a string (x = "x", y = "y") or as a bare expression (x = x, y = y) and it won’t matter. To read more about NSE, see:
http://adv-r.had.co.nz/Computing-on-the-language.html
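To make the NSE behavior described here concrete, the following base R sketch shows how a function can capture the expression of an argument instead of its value. The helper pull_column is hypothetical and is not part of ggstatsplot; it only illustrates why a bare name and a string can be treated alike, while data$x cannot.

```r
# hypothetical helper (not part of ggstatsplot): extracts a column that is
# named either as a bare expression or as a string
pull_column <- function(data, x) {
  expr <- substitute(x) # capture the expression, not its value
  name <- if (is.character(expr)) expr else deparse(expr)
  data[[name]]
}

# a bare name and a string refer to the same column
same_column <- identical(
  pull_column(iris, Sepal.Length),
  pull_column(iris, "Sepal.Length")
)

# `iris$Sepal.Length` is captured as the text "iris$Sepal.Length", which is
# not a column name, so nothing is found
dollar_fails <- is.null(pull_column(iris, iris$Sepal.Length))

c(same_column = same_column, dollar_fails = dollar_fails)
```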
ggstatsplot is a very chatty package and will by default print helpful notes on assumptions about linear models, warnings, etc. If you don’t want your console to be cluttered with such messages, they can be turned off by setting the argument messages = FALSE in the function call.
Here are examples of the main functions currently supported in ggstatsplot.
Note: If you are reading this in the GitHub repository, the documentation below is for the development version of the package. So you may see some features available here that are not currently present in the stable version of this package on CRAN. For documentation relevant for the CRAN version, see:
ggbetweenstats
This function creates either a violin plot, a box plot, or a mix of the two for between-group or between-condition comparisons, with results from statistical tests in the subtitle. The simplest function call looks like this:
# loading needed libraries
library(ggstatsplot)

# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  messages = FALSE
) + # further modification outside of ggstatsplot
  ggplot2::coord_cartesian(ylim = c(3, 8)) +
  ggplot2::scale_y_continuous(breaks = seq(3, 8, by = 1))
Note that this function returns a ggplot2 object and thus any of the graphics layers can be further modified.
The type (of test) argument also accepts the following abbreviations: "p" (for parametric), "np" (for nonparametric), "r" (for robust), or "bf" (for Bayes Factor). Additionally, the type of plot to be displayed can also be modified ("box", "violin", or "boxviolin").
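For orientation, the parametric and non-parametric options correspond to familiar families of tests that can also be run directly in base R. The sketch below does so for the iris data; which exact functions ggstatsplot calls internally is not stated in this document, so treat this mapping as an assumption rather than a description of the package internals.

```r
# parametric: one-way ANOVA without assuming equal variances (Welch)
param <- stats::oneway.test(Sepal.Length ~ Species, data = iris)

# non-parametric: Kruskal-Wallis rank sum test
nonparam <- stats::kruskal.test(Sepal.Length ~ Species, data = iris)

# with only two groups, the parametric comparison reduces to a t-test
iris_two <- droplevels(subset(iris, Species != "setosa"))
two_groups <- stats::t.test(Sepal.Length ~ Species, data = iris_two)

# p-values from the three tests
c(anova = param$p.value, kruskal = nonparam$p.value, t = two_groups$p.value)
```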
A number of other arguments can be specified to make this plot even more informative or change some of the default options.
library(ggplot2)

# for reproducibility
set.seed(123)

# let's leave out one of the factor levels and see if instead of anova, a t-test will be run
iris2 <- dplyr::filter(.data = iris, Species != "setosa")

# let's change the levels of our factors, a common routine in data analysis
# pipeline, to see if this function respects the new factor levels
iris2$Species <-
  base::factor(
    x = iris2$Species,
    levels = c("virginica", "versicolor")
  )

# plot
ggstatsplot::ggbetweenstats(
  data = iris2,
  x = Species,
  y = Sepal.Length,
  notch = TRUE, # show notched box plot
  mean.plotting = TRUE, # whether mean for each group is to be displayed
  mean.ci = TRUE, # whether to display confidence interval for means
  mean.label.size = 2.5, # size of the label for mean
  type = "p", # which type of test is to be run
  bf.message = TRUE, # add a message with bayes factor favoring null
  k = 3, # number of decimal places for statistical results
  outlier.tagging = TRUE, # whether outliers need to be tagged
  outlier.label = Sepal.Width, # variable to be used for the outlier tag
  outlier.label.color = "darkgreen", # changing the color for the text label
  xlab = "Type of Species", # label for the x-axis variable
  ylab = "Attribute: Sepal Length", # label for the y-axis variable
  title = "Dataset: Iris flower data set", # title text for the plot
  ggtheme = ggthemes::theme_fivethirtyeight(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  package = "wesanderson", # package from which color palette is to be taken
  palette = "Darjeeling1", # choosing a different color palette
  messages = FALSE
)
In case of a parametric t-test, setting bf.message = TRUE will also attach results from the Bayesian Student’s t-test. That way, if the null hypothesis can’t be rejected with the NHST approach, the Bayesian approach can help index evidence in favor of the null hypothesis (i.e., BF01).
By default, the Bayes Factor quantifies the support for the alternative hypothesis (H1) over the null hypothesis (H0) (i.e., BF10 is displayed). Natural logarithms are shown because BF values can be pretty large. This also makes it easy to compare evidence in favor of the alternative (BF10) versus the null (BF01) hypothesis (since log(BF10) = - log(BF01)).
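The relationship between the two Bayes Factors on the log scale can be checked with a few lines of base R. The value of bf10 below is arbitrary and chosen only for illustration.

```r
# an arbitrary Bayes Factor in favor of the alternative (illustrative value)
bf10 <- 20

# the Bayes Factor in favor of the null is its reciprocal
bf01 <- 1 / bf10

# on the natural log scale the two differ only in sign
log_bf10 <- log(bf10)
log_bf01 <- log(bf01)
all.equal(log_bf10, -log_bf01)
```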
Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = mpaa,
  y = length,
  grouping.var = genre, # grouping variable
  pairwise.comparisons = TRUE, # display significant pairwise comparisons
  pairwise.annotation = "p.value", # how do you want to annotate the pairwise comparisons
  p.adjust.method = "bonferroni", # method for adjusting p-values for multiple comparisons
  bf.message = TRUE, # display Bayes Factor in favor of the null hypothesis
  conf.level = 0.99, # changing confidence level to 99%
  ggplot.component = list( # adding new components to `ggstatsplot` default
    ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())
  ),
  k = 3,
  title.prefix = "Movie genre",
  caption = substitute(paste(italic("Source"), ":IMDb (Internet Movie Database)")),
  palette = "default_jama",
  package = "ggsci",
  messages = FALSE,
  nrow = 2,
  title.text = "Differences in movie length by mpaa ratings for different genres"
)
Here is a summary of pairwise comparison tests supported in ggbetweenstats:
Type | Design | Equal variance? | Test | p-value adjustment? |
---|---|---|---|---|
Parametric | between | No | Games-Howell test | Yes |
Parametric | between | Yes | Student’s t-test | Yes |
Parametric | within | NA | Student’s t-test | Yes |
Non-parametric | between | No | Dwass-Steel-Critchlow-Fligner test | Yes |
Non-parametric | within | No | Durbin-Conover test | Yes |
Robust | between | No | Yuen’s trimmed means test | Yes |
Robust | within | NA | Yuen’s trimmed means test | Yes |
Bayes Factor | between | No | No | No |
Bayes Factor | between | Yes | No | No |
Bayes Factor | within | NA | No | No |
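As a point of reference for the "Student’s t-test" row with p-value adjustment, the same kind of analysis can be run in base R. This is only a sketch using stats functions on the iris data; it does not claim to reproduce the exact computation ggbetweenstats performs.

```r
# pairwise Student's t-tests (pooled SD, i.e. equal variances assumed)
# with Bonferroni-adjusted p-values
pairwise <- stats::pairwise.t.test(
  x = iris$Sepal.Length,
  g = iris$Species,
  p.adjust.method = "bonferroni",
  pool.sd = TRUE
)
pairwise$p.value

# the adjustment itself, applied to a vector of illustrative raw p-values:
# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
adjusted <- stats::p.adjust(p = c(0.01, 0.02, 0.04), method = "bonferroni")
adjusted
```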
For more, see the ggbetweenstats vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html
This function is not appropriate for within-subjects designs. A variant of this function, ggwithinstats, is currently in the works. You can still use this function just to prepare the plot for exploratory data analysis, but the statistical details displayed in the subtitle will be incorrect. You can remove them by adding + ggplot2::labs(subtitle = NULL) to your function call.
As a temporary solution, you can use the helper function from ggstatsplot to display results from the within-subjects version of the test in question. Here is an example:
# for reproducibility
set.seed(123)

# getting text results using a helper function
custom_subtitle <-
  ggstatsplot::subtitle_t_parametric(
    data = ggstatsplot::iris_long,
    x = attribute,
    y = value,
    paired = TRUE
  )

# displaying the subtitle on the plot
ggstatsplot::ggbetweenstats(
  data = ggstatsplot::iris_long,
  x = attribute,
  y = value,
  title = "repeated measures design",
  results.subtitle = FALSE, # turn off the default subtitle
  subtitle = custom_subtitle, # add the custom subtitle prepared using helper function
  messages = FALSE
)
ggscatterstats
This function creates a scatterplot with marginal histograms/boxplots/density/violin/densigram plots from ggExtra::ggMarginal and results from statistical tests in the subtitle:
ggstatsplot::ggscatterstats(
  data = ggplot2::msleep,
  x = sleep_rem,
  y = awake,
  xlab = "REM sleep (in hours)",
  ylab = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep",
  bf.message = TRUE,
  messages = FALSE
)
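The statistics reported in the subtitle are correlation tests. As a rough point of comparison, the base R sketch below runs a parametric and a non-parametric correlation test on mtcars; the dataset and the mapping to ggscatterstats test types are assumptions made for illustration, not a description of the package internals.

```r
# parametric: Pearson's product-moment correlation (with confidence interval)
pearson <- stats::cor.test(x = mtcars$wt, y = mtcars$mpg, method = "pearson")

# non-parametric: Spearman's rank correlation
# (exact = FALSE avoids a warning about ties in the data)
spearman <- stats::cor.test(
  x = mtcars$wt, y = mtcars$mpg, method = "spearman", exact = FALSE
)

# correlation coefficients from the two approaches
c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
```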
A number of other arguments can be specified to modify this basic plot:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggscatterstats(
  data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
  x = budget,
  y = rating,
  type = "robust", # type of test that needs to be run
  conf.level = 0.99, # confidence level
  xlab = "Movie budget (in million/ US$)", # label for x axis
  ylab = "IMDB rating", # label for y axis
  label.var = "title", # variable for labeling data points
  label.expression = "rating < 5 & budget > 150", # expression that decides which points to label
  line.color = "yellow", # changing regression line color
  title = "Movie budget and IMDB rating (action)", # title text for the plot
  caption = expression( # caption text for the plot
    paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")
  ),
  ggtheme = hrbrthemes::theme_ipsum_ps(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off ggstatsplot theme layer
  marginal.type = "density", # type of marginal distribution to be displayed
  xfill = "#0072B2", # color fill for x-axis marginal distribution
  yfill = "#009E73", # color fill for y-axis marginal distribution
  xalpha = 0.6, # transparency for x-axis marginal distribution
  yalpha = 0.6, # transparency for y-axis marginal distribution
  centrality.para = "median", # central tendency lines to be displayed
  point.width.jitter = 0.2, # amount of horizontal jitter for data points
  point.height.jitter = 0.4, # amount of vertical jitter for data points
  messages = FALSE # turn off messages and notes
)
Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::grouped_ggscatterstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = rating,
  y = length,
  bf.message = TRUE, # display bayes factor message
  conf.level = 0.99,
  k = 3, # no. of decimal places in the results
  xfill = "#E69F00",
  yfill = "#8b3058",
  xlab = "IMDB rating",
  grouping.var = genre, # grouping variable
  title.prefix = "Movie genre",
  ggtheme = ggplot2::theme_grey(),
  ggplot.component = list(
    ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = (c(2, 9)))
  ),
  messages = FALSE,
  nrow = 2,
  ncol = 2,
  title.text = "Relationship between movie length by IMDB ratings for different genres"
)
For more, see the ggscatterstats vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html
ggpiestats
This function creates a pie chart for categorical or nominal variables with results from contingency table analysis (Pearson’s chi-squared test for between-subjects design and McNemar’s test for within-subjects design) included in the subtitle of the plot. If only one categorical variable is entered, results from one-sample proportion test will be displayed as a subtitle.
# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggpiestats(
  data = ggplot2::msleep,
  main = vore,
  title = "Composition of vore types among mammals",
  messages = FALSE
)
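For readers who want to see the underlying tests on their own, the base R sketch below runs a goodness-of-fit test for one categorical variable and a Pearson’s chi-squared test for two. The choice of mtcars columns is an assumption for illustration, and the sketch does not claim to match the exact settings ggpiestats uses.

```r
# one categorical variable: goodness-of-fit test against equal proportions
gof <- stats::chisq.test(x = table(mtcars$cyl))

# two categorical variables: Pearson's chi-squared test of independence
# (correct = FALSE turns off Yates' continuity correction for the 2 x 2 table)
indep <- stats::chisq.test(x = table(mtcars$am, mtcars$vs), correct = FALSE)

# test statistics from the two analyses
c(gof = unname(gof$statistic), independence = unname(indep$statistic))
```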
This function can also be used to study an interaction between two categorical variables. Additionally, this basic plot can further be modified with additional arguments, and the function returns a ggplot2 object that can further be modified with ggplot2 syntax:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggpiestats(
  data = mtcars,
  main = am,
  condition = cyl,
  conf.level = 0.99, # confidence interval for effect size measure
  title = "Dataset: Motor Trend Car Road Tests", # title for the plot
  stat.title = "interaction: ", # title for the results
  bf.message = TRUE, # display bayes factor in favor of null
  legend.title = "Transmission", # title for the legend
  factor.levels = c("1 = manual", "0 = automatic"), # renaming the factor level names (`main`)
  facet.wrap.name = "No. of cylinders", # name for the facetting variable
  slice.label = "counts", # show counts data instead of percentages
  package = "ggsci", # package from which color palette is to be taken
  palette = "default_jama", # choosing a different color palette
  caption = substitute( # text for the caption
    paste(italic("Source"), ": 1974 Motor Trend US magazine")
  ),
  messages = FALSE # turn off messages and notes
)
In case of within-subjects designs, setting paired = TRUE will produce results from McNemar’s test:
# for reproducibility
set.seed(123)

# data
survey.data <- data.frame(
  `1st survey` = c('Approve', 'Approve', 'Disapprove', 'Disapprove'),
  `2nd survey` = c('Approve', 'Disapprove', 'Approve', 'Disapprove'),
  `Counts` = c(794, 150, 86, 570),
  check.names = FALSE
)

# plot
ggstatsplot::ggpiestats(
  data = survey.data,
  main = `1st survey`,
  condition = `2nd survey`,
  counts = Counts,
  paired = TRUE, # within-subjects design
  conf.level = 0.99, # confidence interval for effect size measure
  stat.title = "McNemar Test: ",
  package = "wesanderson",
  palette = "Royal1"
)
#> Note: Results from one-sample proportion tests for each
#>  level of the variable 2nd survey testing for equal
#>  proportions of the variable 1st survey.
#> # A tibble: 2 x 7
#>   condition  Approve Disapprove `Chi-squared`    df `p-value` significance
#>   <fct>      <chr>   <chr>              <dbl> <dbl>     <dbl> <chr>
#> 1 Approve    90.23%  9.77%               570.     1         0 ***
#> 2 Disapprove 20.83%  79.17%              245      1         0 ***
#> Note: 99% CI for effect size estimate was computed with 100 bootstrap samples.
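The same survey counts can be passed directly to base R’s McNemar test, which is a convenient way to cross-check the subtitle. This sketch uses stats::mcnemar.test with its default settings; whether ggpiestats uses identical settings (for example, regarding the continuity correction) is not stated here and is left as an assumption.

```r
# the same survey counts as a 2 x 2 table
# (rows: 1st survey, columns: 2nd survey)
survey <- matrix(
  data = c(794, 86, 150, 570),
  nrow = 2,
  dimnames = list(
    "1st survey" = c("Approve", "Disapprove"),
    "2nd survey" = c("Approve", "Disapprove")
  )
)

# McNemar's chi-squared test for paired nominal data
mcnemar <- stats::mcnemar.test(x = survey)
mcnemar
```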
Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::grouped_ggpiestats(
  data = ggstatsplot::movies_long,
  main = mpaa,
  grouping.var = genre, # grouping variable
  title.prefix = "Movie genre", # prefix for the facetted title
  label.text.size = 3, # text size for slice labels
  slice.label = "both", # show both counts and percentage data
  perc.k = 1, # no. of decimal places for percentages
  palette = "brightPastel",
  package = "quickpalette",
  messages = FALSE,
  nrow = 2,
  ncol = 2,
  title.text = "Composition of MPAA ratings for different genres"
)
For more, including information about the grouped_ggpiestats variant of this function, see the ggpiestats vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggpiestats.html
ggbarstats
In case you are not a fan of pie charts (for very good reasons), you can alternatively use the ggbarstats function:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggbarstats(
  data = ggstatsplot::movies_long,
  main = mpaa,
  condition = genre,
  bf.message = TRUE,
  sampling.plan = "jointMulti",
  title = "MPAA Ratings by Genre",
  xlab = "movie genre",
  perc.k = 1,
  x.axis.orientation = "slant",
  ggtheme = hrbrthemes::theme_modern_rc(),
  ggstatsplot.layer = FALSE,
  ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
  palette = "Set2",
  messages = FALSE
)
And, needless to say, there is also a grouped_ variant of this function:
# setup
library(ggstatsplot)
set.seed(123)

# let's create a smaller dataframe
diamonds_short <- ggplot2::diamonds %>%
  dplyr::filter(.data = ., cut %in% c("Very Good", "Ideal")) %>%
  dplyr::filter(.data = ., clarity %in% c("SI1", "SI2", "VS1", "VS2", "VVS1")) %>%
  dplyr::sample_frac(tbl = ., size = 0.05)

# plot
ggstatsplot::grouped_ggbarstats(
  data = diamonds_short,
  main = color,
  condition = clarity,
  grouping.var = cut,
  bf.message = TRUE,
  sampling.plan = "poisson",
  title.prefix = "Quality",
  data.label = "both",
  label.text.size = 3,
  perc.k = 1,
  package = "palettetown",
  palette = "charizard",
  ggtheme = ggthemes::theme_tufte(base_size = 12),
  ggstatsplot.layer = FALSE,
  messages = FALSE,
  title.text = "Diamond quality and color combination",
  nrow = 2
)
gghistostats
In case you would like to see the distribution of one variable and check if it is significantly different from a specified value with a one sample test, this function will let you do that.
The type (of test) argument also accepts the following abbreviations: "p" (for parametric), "np" (for nonparametric), "r" (for robust), or "bf" (for Bayes Factor).
ggstatsplot::gghistostats(
  data = ToothGrowth, # dataframe from which variable is to be taken
  x = len, # numeric variable whose distribution is of interest
  title = "Distribution of tooth length", # title for the plot
  fill.gradient = TRUE, # use color gradient
  test.value = 10, # the comparison value for t-test
  test.value.line = TRUE, # display a vertical line at test value
  type = "bf", # bayes factor for one sample t-test
  bf.prior = 0.8, # prior width for calculating the bayes factor
  messages = FALSE # turn off the messages
)
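To see what a one-sample comparison against a test value amounts to outside of the plot, here is a base R sketch using the same variable and test value. It covers only the parametric and non-parametric cases and is an illustration, not a reproduction of the computations in gghistostats.

```r
# parametric: one-sample t-test of tooth length against a test value of 10
t_res <- stats::t.test(x = ToothGrowth$len, mu = 10)

# non-parametric counterpart: one-sample Wilcoxon signed-rank test
# (exact = FALSE requests the normal approximation, since the data contain ties)
w_res <- stats::wilcox.test(x = ToothGrowth$len, mu = 10, exact = FALSE)

# p-values from the two tests
c(t.test = t_res$p.value, wilcoxon = w_res$p.value)
```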
The aesthetic defaults can be easily modified:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::gghistostats(
  data = iris, # dataframe from which variable is to be taken
  x = Sepal.Length, # numeric variable whose distribution is of interest
  title = "Distribution of Iris sepal length", # title for the plot
  caption = substitute(paste(italic("Source:", "Ronald Fisher's Iris data set"))),
  type = "parametric", # one sample t-test
  conf.level = 0.99, # changing confidence level for effect size
  bar.measure = "mix", # what does the bar length denote
  test.value = 5, # default value is 0
  test.value.line = TRUE, # display a vertical line at test value
  test.value.color = "#0072B2", # color for the line for test value
  centrality.para = "mean", # which measure of central tendency is to be plotted
  centrality.color = "darkred", # decides color for central tendency line
  binwidth = 0.10, # binwidth value (experiment)
  bf.message = TRUE, # display bayes factor for null over alternative
  bf.prior = 0.8, # prior width for computing bayes factor
  messages = FALSE, # turn off the messages
  ggtheme = hrbrthemes::theme_ipsum_tw(), # choosing a different theme
  ggstatsplot.layer = FALSE # turn off ggstatsplot theme layer
)
As can be seen from the plot, a Bayes Factor can be attached (bf.message = TRUE) to assess evidence in favor of the null hypothesis.
Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:
# for reproducibility
set.seed(123)

# plot
ggstatsplot::grouped_gghistostats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = budget,
  xlab = "Movies budget (in million US$)",
  type = "robust", # use robust location measure
  grouping.var = genre, # grouping variable
  normal.curve = TRUE, # superimpose a normal distribution curve
  normal.curve.color = "red",
  title.prefix = "Movie genre",
  ggtheme = ggthemes::theme_tufte(),
  ggplot.component = list( # modify the defaults from `ggstatsplot` for each plot
    ggplot2::scale_x_continuous(breaks = seq(0, 200, 50), limits = (c(0, 200)))
  ),
  messages = FALSE,
  nrow = 2,
  title.text = "Movies budgets for different genres"
)
For more, including information about the grouped_gghistostats variant of this function, see the gghistostats vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/gghistostats.html
ggdotplotstats
This function is similar to gghistostats, but is intended to be used when the numeric variable also has a label.
# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggdotplotstats(
  data = dplyr::filter(.data = gapminder::gapminder, continent == "Asia"),
  y = country,
  x = lifeExp,
  test.value = 55,
  test.value.line = TRUE,
  test.line.labeller = TRUE,
  test.value.color = "red",
  centrality.para = "median",
  centrality.k = 0,
  title = "Distribution of life expectancy in Asian continent",
  xlab = "Life expectancy",
  bf.message = TRUE,
  messages = FALSE,
  caption = substitute(
    paste(italic("Source"), ": Gapminder dataset from https://www.gapminder.org/")
  )
)
As with the rest of the functions in this package, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a grouping variable.
# for reproducibility
set.seed(123)

# removing factor level with very few no. of observations
df <- dplyr::filter(.data = ggplot2::mpg, cyl %in% c("4", "6"))

# plot
ggstatsplot::grouped_ggdotplotstats(
  data = df,
  x = cty,
  y = manufacturer,
  xlab = "city miles per gallon",
  ylab = "car manufacturer",
  type = "np", # non-parametric test
  grouping.var = cyl, # grouping variable
  test.value = 15.5,
  title.prefix = "cylinder count",
  point.color = "red",
  point.size = 5,
  point.shape = 13,
  test.value.line = TRUE,
  ggtheme = ggthemes::theme_par(),
  messages = FALSE,
  title.text = "Fuel economy data"
)
ggcorrmat
ggcorrmat makes a correlogram (a matrix of correlation coefficients) with a minimal amount of code. The defaults alone produce publication-ready correlation matrices. But, for the sake of exploring the available options, let’s change some of the defaults. For example, multiple aesthetics-related arguments can be modified to change the appearance of the correlation matrix.
# for reproducibility
set.seed(123)

# as a default this function outputs a correlogram plot
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  corr.method = "robust", # correlation method
  sig.level = 0.001, # threshold of significance
  p.adjust.method = "holm", # p-value adjustment method for multiple comparisons
  cor.vars = c(sleep_rem, awake:bodywt), # a range of variables can be selected
  cor.vars.names = c(
    "REM sleep", # variable names
    "time awake",
    "brain weight",
    "body weight"
  ),
  matrix.type = "upper", # type of visualization matrix
  colors = c("#B2182B", "white", "#4D4D4D"),
  title = "Correlogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)
Note that if there are NAs present in the selected dataframe, the legend will display the minimum, median, and maximum number of pairs used for correlation matrices.
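The number of pairs differs across cells because each correlation can only use rows where both variables are observed. The base R sketch below shows that bookkeeping on the built-in airquality dataset; the dataset choice is an assumption for illustration, and this is not the code ggcorrmat itself runs.

```r
# a built-in dataset with missing values in some columns
df <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# correlations computed from pairwise complete observations
r <- stats::cor(x = df, use = "pairwise.complete.obs")

# number of complete pairs behind each correlation coefficient:
# the cross-product of the 0/1 "observed" indicator matrix
observed <- !is.na(as.matrix(df))
n_pairs <- crossprod(observed + 0)
n_pairs

# minimum, median, and maximum number of pairs across variable pairs
pairs <- n_pairs[upper.tri(n_pairs)]
c(min = min(pairs), median = stats::median(pairs), max = max(pairs))
```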
Alternatively, you can use it just to get the correlation matrices and their corresponding p-values (in a tibble format). Also, note that if cor.vars are not specified, all numeric variables will be used.
# for reproducibility
set.seed(123)

# show four digits in a tibble
options(pillar.sigfig = 4)

# getting the correlation coefficient matrix
ggstatsplot::ggcorrmat(
  data = iris, # all numeric variables from data will be used
  corr.method = "robust",
  output = "correlations", # specifying the needed output ("r" or "corr" will also work)
  digits = 3 # number of digits to be displayed for correlation coefficient
)
#> # A tibble: 4 x 5
#>   variable     Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <chr>               <dbl>       <dbl>        <dbl>       <dbl>
#> 1 Sepal.Length        1          -0.143        0.878       0.837
#> 2 Sepal.Width        -0.143       1           -0.426      -0.373
#> 3 Petal.Length        0.878      -0.426        1           0.966
#> 4 Petal.Width         0.837      -0.373        0.966       1

# getting the p-value matrix
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  corr.method = "robust",
  output = "p.values", # "p" or "p-values" will also work
  p.adjust.method = "holm"
)
#> # A tibble: 6 x 7
#>   variable  sleep_total sleep_rem sleep_cycle     awake   brainwt    bodywt
#>   <chr>           <dbl>     <dbl>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 sleep_to~   0.        5.291e-12   9.138e- 3 0.        3.170e- 5 2.568e- 6
#> 2 sleep_rem   4.070e-13 0.          1.978e- 2 5.291e-12 9.698e- 3 3.762e- 3
#> 3 sleep_cy~   2.285e- 3 1.978e- 2   0.        9.138e- 3 1.637e- 9 1.696e- 5
#> 4 awake       0.        4.070e-13   2.285e- 3 0.        3.170e- 5 2.568e- 6
#> 5 brainwt     4.528e- 6 4.849e- 3   1.488e-10 4.528e- 6 0.        4.509e-17
#> 6 bodywt      2.568e- 7 7.524e- 4   2.120e- 6 2.568e- 7 3.221e-18 0.

# getting the confidence intervals for correlations
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  corr.method = "kendall",
  output = "ci",
  p.adjust.method = "holm"
)
#> Note: In the correlation matrix,
#> the upper triangle: p-values adjusted for multiple comparisons
#> the lower triangle: unadjusted p-values.
#> # A tibble: 15 x 7
#>    pair                 r      lower     upper         p lower.adj upper.adj
#>    <chr>            <dbl>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
#>  1 sleep_total-s~  0.5922  4.000e-1   7.345e-1 4.981e- 7   0.3027    0.7817
#>  2 sleep_total-s~ -0.3481 -6.214e-1   6.818e-4 5.090e- 2  -0.6789    0.1002
#>  3 sleep_total-a~ -1      -1.000e+0  -1.000e+0 0.         -1        -1
#>  4 sleep_total-b~ -0.4293 -6.220e-1  -1.875e-1 9.621e- 4  -0.6858   -0.07796
#>  5 sleep_total-b~ -0.3851 -5.547e-1  -1.847e-1 3.247e- 4  -0.6050   -0.1106
#>  6 sleep_rem-sle~ -0.2066 -5.180e-1   1.531e-1 2.566e- 1  -0.5180    0.1531
#>  7 sleep_rem-awa~ -0.5922 -7.345e-1  -4.000e-1 4.981e- 7  -0.7832   -0.2990
#>  8 sleep_rem-bra~ -0.2636 -5.096e-1   2.217e-2 7.022e- 2  -0.5400    0.06404
#>  9 sleep_rem-bod~ -0.3163 -5.262e-1  -7.004e-2 1.302e- 2  -0.5662   -0.01317
#> 10 sleep_cycle-a~  0.3481 -6.818e-4   6.214e-1 5.090e- 2  -0.1145    0.6867
#> 11 sleep_cycle-b~  0.7125  4.739e-1   8.536e-1 1.001e- 5   0.3239    0.8954
#> 12 sleep_cycle-b~  0.6545  3.962e-1   8.168e-1 4.834e- 5   0.2459    0.8656
#> 13 awake-brainwt   0.4293  1.875e-1   6.220e-1 9.621e- 4   0.08322   0.6829
#> 14 awake-bodywt    0.3851  1.847e-1   5.547e-1 3.247e- 4   0.1049    0.6087
#> 15 brainwt-bodywt  0.8378  7.373e-1   9.020e-1 8.181e-16   0.6716    0.9238

# getting the sample sizes for all pairs
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_total:bodywt,
  corr.method = "robust",
  output = "n" # note that n is different due to NAs
)
#> # A tibble: 6 x 7
#>   variable    sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#>   <chr>             <dbl>     <dbl>       <dbl> <dbl>   <dbl>  <dbl>
#> 1 sleep_total          83        61          32    83      56     83
#> 2 sleep_rem            61        61          32    61      48     61
#> 3 sleep_cycle          32        32          32    32      30     32
#> 4 awake                83        61          32    83      56     83
#> 5 brainwt              56        48          30    56      56     56
#> 6 bodywt               83        61          32    83      56     83
Additionally, there is also a grouped_
variant of this function that
makes it easy to repeat the same operation across a single grouping
variable:
# for reproducibilityset.seed(123)# plot# let's use only 50% of the data to speed up the processggstatsplot::grouped_ggcorrmat(data = dplyr::sample_frac(ggstatsplot::movies_long, size = 0.5),cor.vars = length:votes,corr.method = "np",colors = c("#cbac43", "white", "#550000"),grouping.var = genre, # grouping variabletitle.prefix = "Movie genre",messages = FALSE,nrow = 2,ncol = 2)
For examples and more information, see the `ggcorrmat` vignette:
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcorrmat.html
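The `output = "n"` call above returns a different sample size for each pair because the variables have different missing values, and `p.adjust.method = "holm"` corrects the p-values for multiple comparisons. Both ideas can be illustrated roughly in base R. This is only a sketch and not how `ggcorrmat` computes its results; the `airquality` dataset and all object names are assumptions chosen for the example.

```r
# Base-R sketch (an illustration, not ggcorrmat's internals):
# pairwise sample sizes and Holm-adjusted p-values when data contain NAs
vars <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# 1 where a value is observed, 0 where it is missing
observed <- (!is.na(as.matrix(vars))) * 1

# entry [i, j] counts the rows in which both variable i and j are observed
n_pairs <- crossprod(observed)

# all unordered pairs of variables
pairs <- combn(names(vars), 2, simplify = FALSE)

# unadjusted p-values from Kendall correlation tests
# (cor.test drops incomplete pairs itself)
p_raw <- vapply(
  pairs,
  function(pr) {
    stats::cor.test(
      vars[[pr[1]]], vars[[pr[2]]],
      method = "kendall", exact = FALSE
    )$p.value
  },
  numeric(1)
)

results <- data.frame(
  pair = vapply(pairs, paste, character(1), collapse = "-"),
  p = p_raw,
  p.adj = stats::p.adjust(p_raw, method = "holm"),
  stringsAsFactors = FALSE
)

n_pairs
results
```

The diagonal of `n_pairs` holds the number of non-missing values per variable, and the off-diagonal entries can only be equal or smaller, which is why the sample sizes differ across pairs.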
ggcoefstats

`ggcoefstats` creates a plot with the regression coefficients’ point
estimates as dots with confidence interval whiskers.

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggcoefstats(x = stats::lm(formula = mpg ~ am * cyl, data = mtcars))
The basic plot can be further modified to one’s liking with additional arguments (also, let’s use a robust linear model instead of a simple linear model now):

    # for reproducibility
    set.seed(123)

    # plot
    ggstatsplot::ggcoefstats(
      x = MASS::rlm(formula = mpg ~ am * cyl, data = mtcars),
      point.color = "red",
      point.shape = 15,
      vline.color = "#CC79A7",
      vline.linetype = "dotdash",
      stats.label.size = 3.5,
      stats.label.color = c("#0072B2", "#D55E00", "darkgreen"),
      title = "Car performance predicted by transmission & cylinder count",
      subtitle = "Source: 1974 Motor Trend US magazine",
      ggtheme = hrbrthemes::theme_ipsum_ps(),
      ggstatsplot.layer = FALSE
    ) + # further modification with the ggplot2 commands
      # note the order in which the labels are entered
      ggplot2::scale_y_discrete(labels = c("transmission", "cylinders", "interaction")) +
      ggplot2::labs(x = "regression coefficient", y = NULL)
Most of the regression models that are supported in the `broom` and
`broom.mixed` packages with `tidy` and `glance` methods are also
supported by `ggcoefstats`. For example: `aareg`, `anova`, `aov`,
`aovlist`, `Arima`, `biglm`, `brmsfit`, `btergm`, `cch`, `clm`, `clmm`,
`confusionMatrix`, `coxph`, `ergm`, `felm`, `fitdistr`, `glmerMod`,
`glmmTMB`, `gls`, `gam`, `Gam`, `gamlss`, `garch`, `glm`, `glmmadmb`,
`glmrob`, `gmm`, `ivreg`, `lm`, `lm.beta`, `lmerMod`, `lmodel2`, `lmrob`,
`mcmc`, `MCMCglmm`, `mediate`, `mjoint`, `mle2`, `multinom`, `nlmerMod`,
`nlrq`, `nls`, `orcutt`, `plm`, `polr`, `ridgelm`, `rjags`, `rlm`,
`rlmerMod`, `rq`, `speedglm`, `speedlm`, `stanreg`, `survreg`, `svyglm`,
`svyolr`, etc.
For an exhaustive list of all regression models supported by
`ggcoefstats` and what to do in case the regression model you are
interested in is not supported, see the associated vignette:
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcoefstats.html
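To make concrete what a dot-and-whisker plot needs from a model, the sketch below builds a tidy coefficient table (one row per term, with an estimate and confidence limits) using only base R. The column names follow the `broom::tidy` convention, but the object `tidy_like` and the way it is assembled are assumptions for illustration; this is not the code `ggcoefstats` runs internally.

```r
# Base-R sketch of a tidy coefficient table for the model plotted above
model <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

# coefficient matrix: estimate, standard error, t statistic, p-value
coefs <- stats::coef(summary(model))

# 95% confidence intervals for each coefficient
ci <- stats::confint(model, level = 0.95)

tidy_like <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value = coefs[, "Pr(>|t|)"],
  conf.low = ci[, 1],
  conf.high = ci[, 2],
  row.names = NULL,
  stringsAsFactors = FALSE
)

tidy_like
```

Each row corresponds to one dot (`estimate`) and one whisker (`conf.low` to `conf.high`) in the plot.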
combine_plots

The full power of `ggstatsplot` can be leveraged with a functional
programming package like `purrr`, which replaces `for` loops with code
that is both more succinct and easier to read; `purrr` should therefore
be preferred 😻. (Another old-school option is the `plyr` package.)

In such cases, `ggstatsplot` contains a helper function `combine_plots`
to combine multiple plots, which can be useful for combining a list of
plots produced with `purrr`. This is a wrapper around
`cowplot::plot_grid` and lets you combine multiple plots and add a
combination of title, caption, and annotation texts with suitable
defaults.

For examples (both with `plyr` and `purrr`), see the associated
vignette:
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/combine_plots.html
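As a minimal sketch of this workflow, the code below creates one histogram per species with `purrr::imap` and then passes the resulting list to `combine_plots`. It assumes that `combine_plots` accepts a `plotlist` argument (as `cowplot::plot_grid` does); the choice of the `iris` dataset and the object name `plots` are also just for illustration. See the vignette above for the author's own examples.

```r
library(ggstatsplot)

# for reproducibility
set.seed(123)

# one plot per level of the grouping variable:
# `split()` returns a named list of dataframes, and `imap()` passes
# each dataframe together with its name to the plotting function
plots <- purrr::imap(
  .x = split(iris, iris$Species),
  .f = function(df, species) {
    ggstatsplot::gghistostats(
      data = df,
      x = Sepal.Width,
      title = species
    )
  }
)

# combining the list of plots into a single figure
# (`plotlist` is assumed to be accepted, as in `cowplot::plot_grid`)
ggstatsplot::combine_plots(
  plotlist = plots,
  nrow = 1
)
```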
theme_ggstatsplot

All plots from `ggstatsplot` have a default theme: `theme_ggstatsplot`.
You can change this theme by using the argument `ggtheme` for all
functions.

It is important to note that irrespective of which `ggplot` theme you
choose, `ggstatsplot` adds a new layer in the background with its
idiosyncratic theme settings, chosen to make the graphs more readable or
aesthetically pleasing. Let’s see an example with `gghistostats` and see
how a certain theme from the `hrbrthemes` package looks with and without
the `ggstatsplot` layer.

    # to use hrbrthemes themes, first make sure you have all the necessary fonts
    library(hrbrthemes)
    # extrafont::ttf_import()
    # extrafont::font_import()

    # try this yourself
    ggstatsplot::combine_plots(
      # with the ggstatsplot layer
      ggstatsplot::gghistostats(
        data = iris,
        x = Sepal.Width,
        messages = FALSE,
        title = "Distribution of Sepal Width",
        test.value = 5,
        ggtheme = hrbrthemes::theme_ipsum(),
        ggstatsplot.layer = TRUE
      ),
      # without the ggstatsplot layer
      ggstatsplot::gghistostats(
        data = iris,
        x = Sepal.Width,
        messages = FALSE,
        title = "Distribution of Sepal Width",
        test.value = 5,
        ggtheme = hrbrthemes::theme_ipsum_ps(),
        ggstatsplot.layer = FALSE
      ),
      nrow = 1,
      labels = c("(a)", "(b)"),
      title.text = "Behavior of ggstatsplot theme layer with chosen ggtheme"
    )
For more on how to modify it, see the associated vignette- https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/theme_ggstatsplot.html
ggstatsplot helpers to display text results

Sometimes you may not like the default plot produced by `ggstatsplot`.
In such cases, you can use other custom plots (from `ggplot2` or other
plotting packages) and still use the `ggstatsplot` (subtitle) helper
functions to display results from the relevant statistical test. For
example, in the following chunk, we will use `pirateplot` from the
`yarrr` package and a `ggstatsplot` helper function to display the
results.

    # for reproducibility
    set.seed(123)

    # loading the needed libraries
    library(yarrr)
    library(ggstatsplot)

    # using `ggstatsplot` to prepare text with statistical results
    stats_results <-
      ggstatsplot::subtitle_anova_parametric(
        data = ChickWeight,
        x = Time,
        y = weight,
        messages = FALSE
      )

    # using `yarrr` to create plot
    yarrr::pirateplot(
      formula = weight ~ Time,
      data = ChickWeight,
      theme = 1,
      main = stats_results
    )
As the code stands right now, here is the code coverage for all primary functions involved: https://codecov.io/gh/IndrajeetPatil/ggstatsplot/tree/master/R
I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I prefer the GitHub issues system over other channels (personal e-mail, Twitter, etc.). Pull requests for contributions are encouraged.
Here are some simple ways in which you can contribute:

- Read and correct any inconsistencies in the documentation
- Raise issues about bugs or wanted features
- Review code
- Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
For details about the session information in which this `README` file
was rendered, see:
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/session_info.html
NEW FEATURES

- `ggcoefstats` can support the following new model objects: `rjags`.
- `VR_dilemma` dataset for toying around with within-subjects design.
- `subtitle_t_onesample` supports both Cohen's d and Hedges' g as effect
  sizes and also produces their confidence intervals. Additionally,
  non-central variants of these effect sizes are also supported. Thus,
  `gghistostats` and its `grouped_` variant get two new arguments:
  `effsize.type`, `effsize.noncentral`.
- `ggpiestats` used to display odds ratio as effect size for paired designs
  (McNemar test). But this was only working when the analysis was a 2 x 2
  contingency table. It now instead displays Cohen's G as effect size, which
  generalizes to any kind of design.

MINOR CHANGES
- `outlier_df` to add a column specifying outlier status
  of any given data point is now exported.
- `ggstatsplot` previously relied on an internal function `chisq_v_ci` to
  compute confidence intervals for Cramer's V using bootstrapping but it was
  pretty slow. It now instead relies on the `rcompanion` package to compute
  confidence intervals for V. `ggstatsplot`, therefore, gains a new
  dependency.
- `subtitle_mann_nonparametric` and `subtitle_t_onesample` now compute effect
  size r and its confidence intervals as $Z/\sqrt{N}$ (with the help of the
  `rcompanion` package), instead of using Spearman correlation.

BREAKING CHANGES

- `subtitle_t_onesample` no longer has `data` as an optional argument. This
  was done to be consistent with other subtitle helper functions.

NEW FEATURES
- `ggbarstats` (and its `grouped_` variant) introduced for making
  bar charts (thanks to #78).
- `ggcoefstats` also displays a caption with model summary when meta-analysis
  is required.
- `gghistostats` and its `grouped_` variant have a new argument `normal.curve`
  to superpose a normal distribution curve on top of the histogram (#138).
- `ggcoefstats` can support the following new regression model objects:
  `brmsfit`, `gam`, `Gam`, `gamlss`, `mcmc`, `mjoint`, `stanreg`.
- `gg`/`ggplot` class to `ggplot` class objects.
- Instead of using `effsize` to compute Cohen's d and Hedges' g,
  `ggstatsplot` now relies on a new (#159) internal function
  `effect_t_parametric` to compute them. This removes `effsize` from
  dependencies.
- `ggbarstats` and `ggpiestats` gain `results.subtitle` which can be set to
  `FALSE` if statistical analysis is not required, in which case the
  `subtitle` argument can be used to provide an alternative subtitle.

MAJOR CHANGES

- `ggbetweenstats` now defaults to using noncentral-t distribution for
  computing Cohen's d and Hedges' g. To get variants with central-t
  distribution, use `effsize.noncentral = FALSE`.

MINOR CHANGES
- `grouped_` functions had argument `title.prefix` that defaulted to
  `"Group"`. It now instead defaults to `NULL`, in which case the prefix will
  be the variable name for the `grouping.var` argument.
- `subtitle_template` function can now work with `parameter = NULL`.
- For `ggbetweenstats`, details contained in the subtitle for non-parametric
  test are modified. It now uses Spearman's rho-based effect size estimates.
  This removes `coin` from dependencies.
- `ggbetweenstats` and its `grouped_` variant gain a new argument
  `axes.range.restrict` (which defaults to `FALSE`). This restricts `y`-axes
  limits to minimum and maximum of `y` variable. This is what these functions
  were doing by default in the past versions, which created issues for
  additional ggplot components using the `ggplot.component` argument.
- `prior.width` with `r_{Cauchy}`.
- `ggcoefstats` passes dots (`...`) to `augment` method from `broom`.

BUG FIXES

- `bf_extractor` no longer provides an option to extract
  information about posterior distribution because these details were
  incorrect. The `posterior = TRUE` details were not used anywhere in the
  package so nothing about the results changes.
- `ggcorrmat` didn't output pair names when `output == "ci"` was used. This is
  fixed.

NEW FEATURES
- `ggcoefstats` gains `meta.analytic.effect` that can be used to carry out
  meta-analysis on regression estimates. This is especially useful when a
  dataframe with regression estimates and standard error is available from
  prior analyses. The `subtitle` is prepared with the new function
  `subtitle_meta_ggcoefstats` which is also exported.
- `ggbetweenstats`, `ggscatterstats`, `gghistostats`, and `ggdotplotstats`
  (and their `grouped_` variants) all gain a new `ggplot.component` argument.
  This argument will primarily be helpful to change the individual plots in a
  `grouped_` plot.
- `ggcoefstats` can support the following new regression model objects:
  `polr`, `survreg`, `cch`, `Arima`, `biglm`, `glmmTMB`, `coxph`, `ridgelm`,
  `aareg`, `plm`, `nlrq`, `ivreg`, `ergm`, `btergm`, `garch`, `gmm`,
  `lmodel2`, `svyolr`, `confusionMatrix`, `multinom`, `nlmerMod`, `svyglm`,
  `MCMCglmm`, `lm.beta`, `speedlm`, `fitdistr`, `mle2`, `orcutt`, `glmmadmb`.

BUG FIXES

- `ggcoefstats` didn't work when `statistic` argument was set to `NULL`. This
  was not expected behavior. This has been fixed. Now, if `statistic` is not
  specified, only the dot-and-whiskers will be shown without any labels.
- `subtitle_t_parametric` was producing incorrect sample size information when
  `paired = TRUE` and the data contained `NA`s. This has been fixed.

MAJOR CHANGES
- `ggscatterstats` and its `grouped_` variant accept both character and bare
  expressions as input to arguments `label.var` and `label.expression` (#110).
- `ggscatterstats`, by default, showed jittered data points (because it relied
  on `position_jitter` defaults). This could be visually inaccurate and,
  therefore, `ggscatterstats` now displays points without any jitter. The user
  can introduce jitter if they wish to using `point.width.jitter` and
  `point.height.jitter` arguments. For similar reasons, for `ggbetweenstats`
  and its `grouped_` variant, `point.jitter.height` default has been changed
  from `0.1` to `0` (i.e., no vertical jitter).

MINOR CHANGES

- `stats::kruskal.test`. As a result, `PMCMRplus` is removed from
  dependencies.
- `ggcoefstats` gains a `caption` argument. If `caption.summary` is set to
  `TRUE`, the specified caption will be added on top of the `caption.summary`.

BUG FIXES

- `ggcoefstats` was showing wrong confidence intervals for `merMod` class
  objects due to a bug in the `broom.mixed` package
  (https://github.com/bbolker/broom.mixed/issues/30#issuecomment-428385005).
  This was fixed in `broom.mixed` and so `ggcoefstats` should no longer have
  any issues.
- `specify_decimal_p` has been modified because it produced incorrect results
  when `k < 3` and `p.value = TRUE` (e.g., `0.002` was printed as `< 0.001`).
- `ggpiestats` produced incorrect results if some levels of the factor had
  been filtered out prior to using this function. It now drops unused levels
  and produces correct results.
- `gghistostats` wasn't filtering out `NA`s properly. This has been fixed.

MAJOR CHANGES
- `ggdotplotstats` for creating a dot plot/chart for labelled
  numeric data.
- `conf.level` argument to control confidence level
  for effect size measures.
- `k` argument for all functions has been changed from `3` to `2`.
- `ggbetweenstats` subtitles have been renamed to
  remove `_ggbetween_` from their names as this was becoming confusing for the
  user. Some of these functions work both with the between- and
  within-subjects designs, so having `_ggbetween_` in their names made users
  suspect if they could use these functions for within-subjects designs.
- `ggstatsplot` now depends on `R 3.5.0`. This is because some of its
  dependencies require 3.5.0 to work (e.g., `broom.mixed`).
- `theme_` functions are now exported (`theme_pie()`, `theme_corrmat()`).
- `ggbetweenstats` now supports multiple pairwise comparison tests
  (parametric, nonparametric, and robust variants). It gains a new dependency
  `ggsignif`.
- `ggbetweenstats` now supports eta-squared and omega-squared effect sizes for
  anova models. This function gains a new argument `partial`.
- `groupedstats` package to
  avoid repeating the same code in two packages: `specify_decimal_p`,
  `signif_column`, `lm_effsize_ci`, and `set_cwd`. Therefore, `groupedstats`
  is now added as a dependency.
- `gghistostats` can now show both counts and proportions information on the
  same plot when `bar.measure` argument is set to `"mix"`.
- `ggcoefstats` works with tidy dataframes.
- `untable` has been deprecated in light of
  `tidyr::uncount`, which does exactly what `untable` was doing. The author
  wasn't aware of this function when `untable` was written.
- `CRAN` to reduce the size of the
  package. They are now available on the package website:
  https://indrajeetpatil.github.io/ggstatsplot/articles/.
- `subtitle_t_robust` function can now handle dependent samples and
  gains `paired` argument.
- `ggstatsplot`: `%>%`, `%<>%`, `%$%`.

MINOR CHANGES
- `ggscatterstats`, `ggpiestats`, and their `grouped_` variants support Bayes
  factor tests and gain new arguments relevant to this test.
- `ggbetweenstats` supports Bayes factor tests for anova designs.
- `ggpiestats` (and its `grouped_` version) gain `slice.label` argument that
  decides what information needs to be displayed as a label on the slices of
  the pie chart: `"percentage"` (which has been the default thus far),
  `"counts"`, or `"both"`.
- `ggcorrmat` can work with `cor.vars = NULL`. In such case, all numeric
  variables from the provided dataframe will be used for computing the
  correlation matrix.
- `stable` to `maturing`.
- `palette_message()`).
- `ggscatterstats` and
  `gghistostats` with the argument `results.subtitle`), so `ggbetweenstats`
  also gains two new arguments to do this: `results.subtitle` and `subtitle`.
- `iris_long`.

MAJOR CHANGES
- `subtitle`, the current
  default for `ggstatsplot`).
- `ggcorrmat` gains `p.adjust.method` argument which allows p-values for
  correlations to be corrected for multiple comparisons.
- `ggscatterstats` gains `label.var` and `label.expression` arguments to
  attach labels to points.
- `gghistostats` now defaults to not showing (redundant) color gradient
  (`fill.gradient = FALSE`) and shows both `"count"` and `"proportion"` data.
  It also gains a new argument `bar.fill` that can be used to fill bars with a
  uniform color.
- `ggbetweenstats`, `ggcoefstats`, `ggcorrmat`, `ggscatterstats`, and
  `ggpiestats` now support all palettes contained in the `paletteer` package.
  This helps avoid situations where people had large number of groups (> 12)
  and there were not enough colors in any of the `RColorBrewer` palettes.
- `ggbetweenstats` gains `bf.message` argument to display Bayes factors in
  favor of the null (currently works only for parametric t-test).
- `gghistostats` function no longer has `line.labeller.y` argument; this
  position is automatically determined now.

BREAKING CHANGES
- `legend.title.margin` function has been deprecated since `ggplot2 3.0.0`
  has improved on the margin issues from previous versions. All functions that
  wrapped around this function now lose the relevant arguments
  (`legend.title.margin`, `t.margin`, `b.margin`).
- `ggstatsplot.theme` has been changed to `ggstatsplot.layer` for
  `ggcorrmat` function to be consistent across functions.
- `conf.level` and `conf.type` arguments for `ggbetweenstats`
  have been deprecated. No other function in the package allowed changing
  confidence interval or their type for effect size estimation. These
  arguments were relevant only for `robust` tests anyway.
- `ggcorrmat` argument `type` has been changed to `matrix.type` because for
  all other functions `type` argument specifies the type of the test, while
  for this function it specified the display of the visualization matrix.
  This will make the syntax more consistent across functions.
- `ggscatterstats` gains new arguments to specify aesthetics for geom point
  (`point.color`, `point.size`, `point.alpha`). To be consistent with this
  naming schema, the `width.jitter` and `height.jitter` arguments have been
  renamed to `point.width.jitter` and `point.height.jitter`, respectively.

MINOR CHANGES
- `gghistostats`: To be compatible with `JASP`, natural logarithm of Bayes
  Factors is displayed, and not base 10 logarithm.
- `ggscatterstats` gains `method` and `formula` arguments to modify smoothing
  functions.
- `ggcorrmat` can now show `robust` correlation coefficients in the matrix
  plot.
- For `gghistostats`, `binwidth` value, if not specified, is computed with
  `(max-min)/sqrt(n)`. This is basically to get rid of the warnings `ggplot2`
  produces. Thanks to Chuck Powell's PR (#43).
- `ggcoefstats` gains a new argument `partial` and can display eta-squared and
  omega-squared effect sizes for anovas, in addition to the prior partial
  variants of these effect sizes.
- `ggpiestats` gains `perc.k` argument to show desired number of decimal
  places in percentage labels.

BUG FIXES

- `grouped_ggpiestats` wasn't working when only `main` variable was provided
  with `counts` data. Fixed that.

MAJOR CHANGES
- `theme_mprl` is now called `theme_ggstatsplot`.
  The `theme_mprl` function will still be around and will not be deprecated,
  so feel free to use either or both of them since they are identical.
- `ggcoefstats` no longer has arguments `effects` and `ran_params` because
  only fixed effects are shown for mixed-effects models.
- `ggpiestats` can now handle within-subjects designs (McNemar test results
  will be displayed).

BUG FIXES

- `ggbetweenstats` was producing wrong axes labels when `sample.size.label`
  was set to `TRUE` and user had reordered factor levels before using this
  function. The new version fixes this.
- `ggcoefstats` wasn't producing partial omega-squared for `aovlist` objects.
  Fixed that with new version of `sjstats`.

MINOR CHANGES
- `gghistostats` has a new argument to remove color fill gradient.
- `ggbetweenstats` takes new argument `mean.ci` to show confidence intervals
  for the mean values.
- For `lmer` models, p-values are now computed using `sjstats::p_value`. This
  removes `lmerTest` package from dependencies.
- `sjstats` no longer suggests `apaTables` package to compute confidence
  intervals for partial eta- and omega-squared. Therefore, `apaTables` and
  `MBESS` are removed from dependencies.
- `ggscatterstats` supports `densigram` with the development version of
  `ggExtra`. It additionally gains a few extra arguments to change aesthetics
  of marginals (alpha, size, etc.).

MAJOR CHANGES
- `ggcoefstats` for displaying model coefficients.
- `ggtheme` argument that can be used to change the
  default theme, which has now been changed from `theme_grey()` to
  `theme_bw()`.
- `MASS::rlm`, but percentage bend
  correlation, as implemented in `WRS2::pbcor`. This was done to be consistent
  across different functions. `ggcorrmat` also uses percentage bend
  correlation as the robust correlation measure. This also means that
  `ggstatsplot` no longer imports `MASS` and `sfsmisc`.
- `data` argument is no longer `NULL` for all functions, except
  `gghistostats`. In other words, the user must provide a dataframe from
  which variables or formulas should be selected.
- `12` to `11`.

MINOR CHANGES
- `nortest` from
  imports.
- `ggpiestats` can now handle dataframes with
- `ggbetweenstats` and `ggpiestats` now display sample sizes for each level of
  the grouping factor by default. This behavior can be turned off by setting
  `sample.size.label` to `FALSE`.
- `Titanic_full`, `movies_wide`, `movies_long`.
- `boot::boot`. Therefore,
  the package no longer imports `DescTools`.
- `legend.title.margin` arguments for `gghistostats` and `ggcorrmat` now
  default to `FALSE`, since `ggplot2 3.0.0` has better legend title margins.
- `ggpiestats` now sorts the summary dataframes not by percentages but by the
  levels of `main` variable. This was done to have the same legends across
  different levels of a grouping variable in `grouped_ggpiestats`.
- `ggpiestats` no
  longer shows titles for the tests run (these were "Proportion test" and
  "Chi-Square test"). From the pie charts, it should be obvious to the user or
  reader what test was run.
- `gghistostats` also allows running robust version of one-sample test now
  (One-sample percentile bootstrap).

NEW FEATURES
- `ggbetweenstats` function can now show notched box plots. Two new
  arguments `notch` and `notchwidth` control its behavior. The defaults are
  still standard box plots.
- `outlier.label` argument was of
  `character` type.
- `gghistostats` supports `proportion` and `density` as a value measure for
  bar heights to show proportions and density. New argument `bar.measure`
  controls this behavior.
- `grouped_` variants of functions `ggcorrmat`, `ggscatterstats`,
  `ggbetweenstats`, and `ggpiestats` introduced to create multiple plots for
  different levels of a grouping variable.

MAJOR CHANGES

- `ggstatsplot` use the spelling
  `color`, rather than `colour` in some functions, while `color` in others.
- `binwidth.adjust` from `gghistostats`
  function. This argument was relevant for the first avatar of this function,
  but is no longer playing any role.
- `lab_col` and `lab_size` in
  `ggcorrmat` have been changed to `lab.col` and `lab.size`, respectively.

MINOR CHANGES
- `ggstatsplot.theme` function to control if
  `ggstatsplot::theme_mprl` is to be overlaid on top of the selected `ggtheme`
  (i.e., the ggplot2 theme).
- `gghistostats` to allow user to change colorbar
  gradient. Defaults are colorblind friendly.
- `gghistostats` and `ggcorrmat` have a new argument
  `legend.title.margin` to control margin adjustment between the title and the
  colorbar.
- `line.labeller` in
  `gghistostats` function.

BUG FIXES

- `centrality.para` argument for `ggscatterstats` was not working
  properly. Choosing `"median"` didn't show median, but the mean. This is
  fixed now.

NEW FEATURES
- `gghistostats` and two new arguments to also display
  a vertical line for `test.value` argument.
- `gghistostats`.
- `grouped_gghistostats` to facilitate applying
  `gghistostats` for multiple levels of a grouping factor.
- `ggbetweenstats` has a new argument `outlier.coef` to adjust threshold used
  to detect outliers. Removed bug from the same function when `outlier.label`
  argument is of factor/character type.

MAJOR CHANGES

- `signif_column` and `grouped_proptest` are now deprecated. They
  were exported in the first release by mistake.
- `gghistostats` no longer displays both density and count since the
  density information was redundant. The `density.plot` argument has also been
  deprecated.
- `ggscatterstats` argument `intercept` has now been changed to
  `centrality.para`. This was due to possible confusion about interpretation of
  these lines; they show central tendency measures and not intercept for the
  linear model. Thus the change.
- `effsize.type = "biased"` effect size for `ggbetweenstats`
  in case of ANOVA is partial omega-squared, and not omega-squared.
  Additionally, both partial eta- and omega-squared are now computed using
  bootstrapping with (default) 100 bootstrap samples.

MINOR CHANGES
- `README` document.
- `broom` package. `RVAideMemoire` package is thus removed from dependencies.
- `ggbetweenstats` function are now computed using `sjstats` package, which
  allows bootstrapping. `apaTables` and `userfriendlyscience` packages are
  thus removed from dependencies.
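
The bootstrapped confidence intervals mentioned in the entries above can be illustrated with a percentile bootstrap for eta-squared in a one-way design (where eta-squared and partial eta-squared coincide). This is a base-R sketch under stated assumptions: the `PlantGrowth` dataset, the helper `eta_squared()`, and the percentile method are choices made for illustration and are not taken from `sjstats` or `ggstatsplot`.

```r
# for reproducibility
set.seed(123)

# hypothetical helper: eta-squared = SS_between / SS_total for one factor
eta_squared <- function(y, g) {
  grand_mean <- mean(y)
  ss_total <- sum((y - grand_mean)^2)
  # empty groups (possible in a resample) yield NA and contribute nothing
  ss_between <- sum(
    tapply(y, g, function(v) length(v) * (mean(v) - grand_mean)^2),
    na.rm = TRUE
  )
  ss_between / ss_total
}

# point estimate on the observed data
estimate <- eta_squared(PlantGrowth$weight, PlantGrowth$group)

# 100 bootstrap samples: resample rows with replacement and recompute
n_boot <- 100
boot_eta <- replicate(n_boot, {
  idx <- sample(nrow(PlantGrowth), replace = TRUE)
  eta_squared(PlantGrowth$weight[idx], PlantGrowth$group[idx])
})

# 95% percentile confidence interval
ci <- stats::quantile(boot_eta, probs = c(0.025, 0.975))

estimate
ci
```

With only 100 resamples the interval limits are fairly noisy; increasing `n_boot` trades computation time for stability.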