Collection of convenient functions for common statistical computations that are not directly provided by R's base or stats packages. This package aims to provide, first, shortcuts for statistical measures that could otherwise only be calculated with additional effort (like Cramer's V, Phi, or effect size statistics like Eta or Omega squared), or for which no functions are currently available. Second, another focus lies on weighted variants of common statistical measures and tests, like weighted standard error, mean, t-test, correlation, and more.

Collection of convenient functions for common statistical computations, which are not directly provided by R's base or stats packages.

This package aims to provide, **first**, shortcuts for statistical measures that could otherwise only be calculated with additional effort (like standard errors, Cronbach's Alpha or root mean squared errors), or for which no functions are currently available.

**Second**, these shortcut functions are generic (if appropriate), and can be applied not only to vectors but also to other objects (e.g., the Coefficient of Variation can be computed for vectors, linear models, or linear mixed models; the `r2()` function returns the r-squared value for *lm*, *glm*, *merMod*, *glmmTMB*, *lme* and other objects).

Most functions of this package are designed as *summary functions*, i.e. they do not transform the input vector; rather, they return a summary, which is sometimes a vector and sometimes a tidy data frame (where column names follow a common convention). The focus of most functions lies on summary statistics or fit measures for regression models, including generalized linear models, mixed effects models or Bayesian models. However, some of the functions deal with other statistical measures, like Cronbach's Alpha, Cramer's V, Phi etc.

The comprised tools include:

- For regression and mixed models: Coefficient of Variation, Root Mean Squared Error, Residual Standard Error, Coefficient of Discrimination, R-squared and pseudo-R-squared values, standardized beta values, p-values
- Especially for mixed models: Design effect, ICC, sample size calculation and convergence tests
- Especially for Bayesian models: Highest Density Interval, region of practical equivalence (rope), Monte Carlo Standard Errors, ratio of number of effective samples, mediation analysis, Test for Practical Equivalence
- Fit and accuracy measures for regression models: Overdispersion tests, accuracy of predictions, test/training-error comparisons, error rate and binned residual plots for logistic regression models
- For anova-tables: Eta-squared, Partial Eta-squared, Omega-squared and Partial Omega-squared statistics
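
To make the ANOVA effect sizes in the list above concrete, here is a minimal base-R sketch of eta-squared and partial eta-squared computed from the sums of squares of an `aov()` table. It illustrates the textbook formulas only and is not the *sjstats* implementation; the object names (`fit`, `effect_sizes`) and the use of the built-in `mtcars` data are assumptions made for this example.

```r
# Base-R sketch: eta-squared and partial eta-squared from an ANOVA table.
# Illustrative only; uses the built-in `mtcars` data.
fit <- aov(mpg ~ factor(cyl) + factor(am), data = mtcars)
tab <- summary(fit)[[1]]

ss <- tab[["Sum Sq"]]
terms <- trimws(rownames(tab))   # row names are padded with spaces
is_resid <- terms == "Residuals"

# Eta-squared: SS_effect / SS_total
eta_sq <- ss[!is_resid] / sum(ss)

# Partial eta-squared: SS_effect / (SS_effect + SS_residual)
partial_eta_sq <- ss[!is_resid] / (ss[!is_resid] + ss[is_resid])

effect_sizes <- data.frame(
  term = terms[!is_resid],
  etasq = eta_sq,
  partial.etasq = partial_eta_sq
)
print(effect_sizes)
```

Because the denominator of partial eta-squared leaves out the sums of squares of the other terms, it is never smaller than eta-squared for the same term.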

Furthermore, *sjstats* has functions to access information from model objects, which either support more model objects than their *stats* counterparts, or provide easy access to model attributes, like:

- `model_frame()` to get the model frame,
- `model_family()` to get information about the model family, link functions etc.,
- `link_inverse()` to get the link-inverse function,
- `pred_vars()` and `resp_var()` to get the names of the independent or dependent variables, respectively, and
- `var_names()` to get the "cleaned" variable names from a model object ("cleaned" means that things like `s()` or `log()` are removed from the returned character vector of variable names).
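
For orientation, the kind of information these accessors return can be sketched with plain *stats* functions for a model class that *stats* already supports. This is a hedged illustration of the concepts (model frame, family, inverse link, response and predictor names), not a demonstration of the *sjstats* functions themselves; the model and the object names are made up for the example.

```r
# Base-R counterparts for a model class that *stats* supports directly.
# Illustrative only; uses the built-in `mtcars` data.
fit <- glm(carb ~ wt + log(hp), data = mtcars, family = poisson())

mf <- model.frame(fit)      # model frame with columns carb, wt, log(hp)
fam <- family(fit)          # family object with $family and $link
inv_link <- fam$linkinv     # inverse link function (exp() for the log link)

# Variable names without transformations such as log()
resp <- all.vars(formula(fit)[[2]])    # name of the response
preds <- all.vars(formula(fit)[[3]])   # names of the predictors

print(colnames(mf))
print(c(fam$family, fam$link))
print(resp)
print(preds)
```

Many model classes from other packages lack some of these methods, which is the gap the functions described above are meant to close.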

Other statistics:

- Cramer's V, Cronbach's Alpha, Mean Inter-Item-Correlation, Mann-Whitney-U-Test, Item-scale reliability tests

Please visit https://strengejacke.github.io/sjstats/ for documentation and vignettes.

To install the latest development snapshot (see latest changes below), type the following commands into the R console:

`library(devtools); devtools::install_github("strengejacke/sjstats")`

Please note the package dependencies when installing from GitHub. The GitHub version of this package may depend on latest GitHub versions of my other packages, so you may need to install those first, if you encounter any problems. Here's the order for installing packages from GitHub:

sjlabelled → sjmisc → sjstats → ggeffects → sjPlot

To install the latest stable release from CRAN, type the following command into the R console:

`install.packages("sjstats")`

In case you want / have to cite my package, please use `citation('sjstats')` for citation information.

- The following models/objects are now supported by model-information functions like `model_family()`, `link_inverse()` or `model_frame()`: `MixMod` (package **GLMMadaptive**), **MCMCglmm**, `mlogit` and `gmnl`.
- Reduce package dependencies.

- `cred_int()`, to compute uncertainty intervals of Bayesian models. Mimics the behaviour and style of `hdi()` and is thus a convenient complement to functions like `posterior_interval()`.

- `equi_test()` now finds better defaults for models with binomial outcome (like logistic regression models).
- `r2()` for mixed models should now also work properly for mixed models fitted with **rstanarm**.
- `anova_stats()` and alike (e.g. `eta_sq()`) now all preserve original term names.
- `model_family()` now returns `$is_count = TRUE` when the model is a count model, and `$is_beta = TRUE` for models with beta family.
- `pred_vars()` checks that the return value has only unique values.
- `pred_vars()` gets a `zi`-argument to return the variables from a model's zero-inflation-formula.

- Fix minor issues in `wtd_sd()` and `wtd_mean()` when weight was `NULL` (which usually shouldn't be the case anyway).
- Fix potential issue with `deparse()`, cutting off very long formulas in various functions.
- Fix encoding issues in help-files.

- Export `dplyr::n()`, to meet forthcoming changes in dplyr 0.8.0.

- `boot_ci()` gets a `ci.lvl`-argument.
- The `rotation`-argument in `pca_rotate()` now supports all rotations from `psych::principal()`.
- `pred_vars()` gets a `fe.only`-argument to return only fixed effects terms from mixed models, and a `disp`-argument to return the variables from a model's dispersion-formula.
- `icc()` for Bayesian models gets an `adjusted`-argument, to calculate adjusted and conditional ICC (however, only for Gaussian models).
- For `icc()` for non-Gaussian Bayes-models, a message is printed that recommends setting argument `ppd` to `TRUE`.
- `resp_val()` and `resp_var()` now also work for **brms**-models with additional response information (like `trial()` in formula).
- `resp_var()` gets a `combine`-argument, to return either the name of the matrix-column or the original variable names for matrix-columns.
- `model_frame()` now also returns the original variables for matrix-column-variables.
- `model_frame()` now also returns the variable from the dispersion-formula of **glmmTMB**-models.
- `model_family()` and `link_inverse()` now support **glmmPQL**, **felm** and **lm_robust**-models.
- `anova_stats()` and alike (`omega_sq()` etc.) now support gam-models from package **gam**.
- `p_value()` now supports objects of class `svyolr`.

- Fix issue with `se()` and `get_re_var()` for objects returned by `icc()`.
- Fix issue with `icc()` for Stan-models.
- `var_names()` did not clear terms with log-log transformation, e.g. `log(log(y))`.
- Fix issue in `model_frame()` for models with splines with only one column.

- Revised help-files for `r2()` and `icc()`, also by adding more references.

- `re_grp_var()` to find group factors of random effects in mixed models.

- `omega_sq()` and `eta_sq()` give more informative messages when using non-supported objects.
- `r2()` and `icc()` give more informative warnings and messages.
- `tidy_stan()` supports printing simplex parameters of monotonic effects of **brms** models.
- `grpmean()` and `mwu()` get a `file` and `encoding` argument, to save the HTML output as file.

- `model_frame()` now correctly names the offset-columns for terms provided as `offset`-argument (i.e. for models where the offset was not specified inside the formula).
- Fixed issue with `weights`-argument in `grpmean()` when variable name was passed as character vector.
- Fixed issue with `r2()` for **glmmTMB** models with `ar1` random effects structure.

- `wtd_chisqtest()` to compute a weighted Chi-squared test.
- `wtd_median()` to compute the weighted median of variables.
- `wtd_cor()` to compute weighted correlation coefficients of variables.
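
As a rough illustration of what weighted variants of a median and a correlation compute, the following base-R sketch may help. It is not the code behind `wtd_median()` or `wtd_cor()`: the helper names `weighted_median()` and `weighted_cor()` are hypothetical, the weighted median uses one common definition (the smallest value whose cumulative weight share reaches 50%), and the correlation relies on `stats::cov.wt()`.

```r
# Hypothetical helpers, shown only to illustrate the idea of weighting.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 6)
w <- c(1, 1, 1, 1, 5)   # the last observation carries most of the weight

# Weighted median: smallest x whose cumulative weight share reaches 50%.
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

# Weighted Pearson correlation via stats::cov.wt(), with normalized weights.
weighted_cor <- function(x, y, w) {
  stats::cov.wt(cbind(x, y), wt = w / sum(w), cor = TRUE)$cor[1, 2]
}

print(weighted_median(x, w))
print(weighted_cor(x, y, w))
```

With equal weights, both helpers reduce to their unweighted counterparts, `median()` (for an odd number of values) and `cor()`.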

- `mediation()` can now cope with models from different families, e.g. if the moderator or outcome is binary, while the treatment-effect is continuous.
- `model_frame()`, `link_inverse()`, `pred_vars()`, `resp_var()`, `resp_val()`, `r2()` and `model_family()` now support `clm2`-objects from package **ordinal**.
- `anova_stats()` gives a more informative message for non-supported models or ANOVA-options.

- Fixed issue with `model_family()` and `link_inverse()` for models fitted with `pscl::hurdle()` or `pscl::zeroinfl()`.
- Fixed issue with wrong title in `grpmean()` for grouped data frames, when grouping variable was an unlabelled factor.
- Fix issue with `model_frame()` for **coxph**-models with polynomial or spline-terms.
- Fix issue with `mediation()` for logical variables.

- Reduce package dependencies.

- `wtd_ttest()` to compute a weighted t-test.
- `wtd_mwu()` to compute a weighted Mann-Whitney-U or Kruskal-Wallis test.

- `robust()` was revised, getting more arguments to specify different types of covariance-matrix estimation, and handling these more flexibly.
- Improved `print()`-method for `tidy_stan()` for *brmsfit*-objects with categorical-families.
- `se()` now also computes standard errors for relative frequencies (proportions) of a vector.
- `r2()` now also computes r-squared values for *glmmTMB*-models from `genpois`-families.
- `r2()` gives more precise warnings for non-supported model-families.
- `xtab_statistics()` gets a `weights`-argument, to compute measures of association for contingency tables for weighted data.
- The `statistics`-argument in `xtab_statistics()` gets a `"fisher"`-option, to force Fisher's Exact Test to be used.
- Improved variance calculation in `icc()` for generalized linear mixed models with Poisson or negative binomial families.
- `icc()` gets an `adjusted`-argument, to calculate the adjusted and conditional ICC for mixed models.
- To get consistent argument names across functions, argument `weight.by` is now deprecated and renamed into `weights`.

- Fix issues with effect size computation for repeated-measure Anova when using bootstrapping to compute confidence intervals.
- `grpmean()` now also adjusts the `n`-column for weighted data.
- `icc()`, `re_var()` and `get_re_var()` now correctly compute the random-effect-variances for models with multiple random slopes per random effect term (e.g., `(1 + rs1 + rs2 | grp)`).
- Fix issues in `tidy_stan()`, `mcse()`, `hdi()` and `n_eff()` for `stan_polr()`-models.
- Plotting `equi_test()` did not work for intercept-only models.

- The S3-generics for functions like `hdi()`, `rope()`, `equi_test()` etc. are now more generic, and function usage for each supported object is now included in the documentation.
- Following functions are now S3-generic: `icc()`, `r2()`, `p_value()`, `se()`, and `std_beta()`.
- Added `print()`-methods for some more functions, for a clearer output.
- Revised `r2()` for mixed models (packages **lme4**, **glmmTMB**). The r-squared value should be much more precise now, and reports the marginal and conditional r-squared values.
- Reduced package dependencies and removed *apaTables* and *MBESS* from suggested packages.
- `stanmvreg`-models are now supported by many functions.

- `binned_resid()` to plot binned residuals for logistic regression models.
- `error_rate()` to compute model quality for logistic regression models.
- `auto_prior()` to quickly create automatically adjusted priors for brms-models.
- `difficulty()` to compute the item difficulty.

- `icc()` gets a `ppd`-argument for Stan-models (*brmsfit* and *stanreg*), which performs a variance decomposition based on the posterior predictive distribution. This is the recommended way for non-Gaussian models.
- For Stan-models (*brmsfit* and *stanreg*), `icc()` now also computes the HDI for the ICC and random-effect variances. Use the `prob`-argument to specify the limits of this interval.
- `link_inverse()` and `model_family()` now support *clmm*-models (package *ordinal*) and *glmRob* and *lmRob*-models (package *robust*).
- `model_family()` gets a `multi.resp`-argument, to return a list of family information for multivariate-response models (of class `brmsfit` or `stanmvreg`).
- `link_inverse()` gets a `multi.resp`-argument, to return a list of link-inverse-functions for multivariate-response models (of class `brmsfit` or `stanmvreg`).
- `p_value()` now supports *rlm*-models (package *MASS*).
- `check_assumptions()` for single models with `as.logical = FALSE` now has a nice print-method.
- `eta_sq()` and `omega_sq()` now also work for repeated-measure Anovas, i.e. Anova with error term (requires broom > 0.4.5).

- `model_frame()` and `var_names()` now correctly clean nested patterns like `offset(log(x + 10))` from column names.
- `model_frame()` now returns proper column names from *gamm4* models.
- `model_frame()` did not work when the model frame had spline-terms and weights.
- Fix issue in `robust()` when `exponentiate = TRUE` and `conf.int = FALSE`.
- `reliab_test()` returned an error when the provided data frame had fewer than three columns, instead of returning `NULL`.

- Added new Vignette *Statistics for Bayesian Models*.

- `equi_test()` to test if parameter values in Bayesian estimation should be accepted or rejected.
- `mediation()` to print a summary of a mediation analysis from multivariate response models fitted with *brms*.

- `link_inverse()` now also returns the link-inverse function for cumulative-family *brms*-models.
- `model_family()` now also returns an `is_ordinal`-element with information on whether the model is an ordinal or cumulative link model.
- Functions that access model information (like `model_family()`) now better support `vglm`-models (package *VGAM*).
- `r2()` now also calculates the standard error for *brms* or *stanreg* models.
- `r2()` gets a `loo`-argument to calculate LOO-adjusted r-squared values for *brms* or *stanreg* models. This measure comes conceptually closer to an adjusted r-squared measure.
- Effect sizes (`anova_stats()`, `eta_sq()` etc.) are now also computed for mixed models.
- To avoid confusion, `n_eff()` now computes the number of effective samples, and no longer its ratio in relation to the total number of samples.
- The column for the ratio of the number of effective samples in `tidy_stan()` is now named *neff_ratio*, to avoid confusion.

- Fixed issue in `se()` for `icc()`-objects, where random effect term could not be found.
- Fixed issue in `se()` for `merMod`-objects.
- Fixed issue in `p_value()` for mixed models with KR-approximation, which is now more accurate.

- Remove *tidyverse* from suggested packages, as requested by maintainers.

- `mwu()` now requires a data frame as first argument, followed by the names of the two variables to perform the Mann-Whitney-U-Test on.

- `tidy_stan()` was improved especially for more complex multilevel models.
- Make `tidy_stan()` for large `brmsfit`-objects (esp. with random effects) more efficient.
- Better `print()`-method for `tidy_stan()`, `hdi()`, `rope()`, `icc()` and some other functions.
- `link_inverse()` now also should return the link-inverse function for most (or some or all?) custom families of *brms*-models.
- The `weight.by`-arguments in `grpmean()` and `mwu()` now should be a variable name from a variable in `x`, and no longer a separate vector.

- `model_family()` to get model-information about family and link-functions. This function is intended to be "generic" and work with many different model objects, because not all packages provide a `family()` function.

- Fix issue with `omega_sq()`, `eta_sq()` etc. when confidence intervals were computed with bootstrapping and the model-formula contained function calls like `scale()` or `as.factor()`.
- Fix issue with `p_value()` for unconditional mixed models.
- Fix typo in `xtab_statistics()`.
- Fix issue with wrong calculation of Nagelkerke's r-squared value in `r2()`.
- Fix issue for factors with character levels in `typical_value()`, when argument `fun` for factors was set to `mode`.
- Don't show prior-samples in `hdi()`, `tidy_stan()` etc. for *brmsfit*-objects.
- Fixed issues in `model_frame()` with spline-terms when missing values were removed due to casewise deletion.

- Revise examples, vignettes and package description to make sure all used packages are available for CRAN checks on operating systems.

- `residuals.svyglm.nb()` as S3-generic `residuals()` method for objects fitted with `svyglm.nb()`.

- `icc()` gets a `posterior`-argument, to compute ICC-values from `brmsfit`-objects, for the whole posterior distribution.
- `icc()` now gives a warning when computed for random-slope-intercept models, to warn user about probably inappropriate inference.
- `r2()` now computes Bayesian version of R-squared for `stanreg` and `brmsfit` objects.
- Argument `prob` in `hdi()` now accepts a vector of scalars to compute HDIs for multiple probability thresholds at once.
- Argument `probs` in `tidy_stan()` was renamed into `prob`, to be consistent with `hdi()`.
- `mwu()` gets an `out`-argument, to print output to console, or as HTML table in the viewer or web browser.
- `scale_weights()` now also works if weights have missing values.
- `hdi()` and `rope()` get `data.frame`-methods.
- `omega_sq()` and `eta_sq()` get a `ci.lvl`-argument to compute confidence intervals for the effect size statistics.
- `omega_sq()`, `eta_sq()` and `cohens_f()` now always return a data frame with at least two columns: term name and effect size. Confidence intervals are added as additional columns, if the `ci.lvl`-argument is `TRUE`.
- `omega_sq()` gets a `partial`-argument to compute partial omega-squared.
- `omega_sq()`, `eta_sq()`, `cohens_f()` and `anova_stats()` now support `anova.rms`-objects from the *rms*-package.

- Fix unnecessary warning for tibbles in `mic()`.
- Make sure that `model_frame()` does not return duplicated column names.
- Fix issue in `tidy_stan()` with incorrect *n_eff* statistics for *sigma* parameter in mixed models.
- Fix issue in `tidy_stan()`, which did not work when `probs` was of length greater than 2.
- Fix issue in `icc()` with *brmsfit*-models, which was broken probably due to internal changes in *brms*.

- Remove unused imports.
- Cross references from `dplyr::select_helpers` were updated to `tidyselect::select_helpers`.

- `var_names()` now also cleans variable names from variables modeled with the `mi()` function (multiple imputation on the fly in *brms*).
- `reliab_test()` gets an `out`-argument, to print output to console, or as HTML table in the viewer or web browser.

- Fix issues with `mcse()`, `n_eff()` and `tidy_stan()` with more complex *brmsfit*-models.
- Fix issue in `typical_value()` to prevent error for R-oldrel-Windows.
- `model_frame()` now returns response values from models, which are in matrix form (bound with `cbind()`), as is.
- Fixed issues in `grpmean()`, where values instead of value labels were printed if some categories were not present in the data.

- Beautiful colored output for `grpmean()` and `mwu()`.

- `mcse()` to compute the Monte Carlo standard error for `stanreg`- and `brmsfit`-models.
- `n_eff()` to compute the effective sample size for `stanreg`- and `brmsfit`-models.

- `grpmean()` now uses `contrasts()` from package *emmeans* to compute p-values, which correctly indicate whether the sub-group mean is significantly different from the total mean.
- `grpmean()` gets an `out`-argument, to print output to console, or as HTML table in the viewer or web browser.
- `tidy_stan()` now includes information on the Monte Carlo standard error.
- `model_frame()`, `p_value()` and `link_inverse()` now support Zelig-relogit-models.
- `typical_value()` gets an explicit `weight.by`-argument.

- `model_frame()` did not work properly for variables that were standardized with `scale()`.
- In certain cases, `weight.by`-argument did not work in `grpmean()`.

- Remove deprecated `get_model_pval()`.
- Revised documentation for `overdisp()`.

- `scale_weights()` to rescale design weights for multilevel models.
- `pca()` and `pca_rotate()` to create tidy summaries of principal component analyses or rotated loadings matrices from PCA.
- `gmd()` to compute Gini's mean difference.
- `is_prime()` to check whether a number is a prime number or not.
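
For readers unfamiliar with the two smaller measures in this list, a minimal base-R sketch of the underlying definitions follows. The helpers `gini_mean_diff()` and `is_prime_sketch()` are hypothetical names chosen for this example; they show the textbook definitions and are not the code behind `gmd()` or `is_prime()`.

```r
# Gini's mean difference: mean absolute difference over all pairs of
# distinct observations (each unordered pair is counted twice in the sum,
# which the denominator n * (n - 1) accounts for).
gini_mean_diff <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# Trial division: n is prime if no integer in 2..floor(sqrt(n)) divides it.
is_prime_sketch <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)
}

print(gini_mean_diff(c(1, 2, 4)))
print(vapply(c(1, 2, 7, 9, 97), is_prime_sketch, logical(1)))
```

For `c(1, 2, 4)` the pairwise absolute differences are 1, 3 and 2, so the mean difference is 2.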

- `link_inverse()` now supports `brmsfit`, `multinom` and `clm`-models.
- `p_value()` now supports `polr` and `multinom`-models.
- `zero_count()` gets a `tolerance`-argument, to accept models with a ratio within a certain range of 1.
- `var_names()` now also cleans variable names from variables modelled with the `offset()`, `lag()` or `diff()` function.
- `icc()`, `re_var()` and `get_re_var()` now support `brmsfit`-objects (models fitted with the *brms*-package).
- For `fun = "weighted.mean"`, `typical_value()` now checks if vector of weights is of same length as `x`.
- The print-method for `grpmean()` now also prints the overall p-value from the model.

- `resp_val()`, `cv_error()` and `pred_accuracy()` did not work for formulas with transforming function for response terms, e.g. `log(response)`.

- Fixed examples, to resolve issues with CRAN package checks.
- More model objects supported in `p_value()`.

- `model_frame()` to get the model frame from model objects, also of those models that don't have a S3-generic model.frame-function.
- `var_names()` to get cleaned variable names from model objects.
- `link_inverse()` to get the inverse link function from model objects.

- The `fun`-argument in `typical_value()` can now also be a named vector, to apply different functions for numeric and categorical variables.

- Fixed issue with specific model formulas in `pred_vars()`.
- Fixed issue with specific model objects in `resp_val()`.
- Fixed issue with nested models in `re_var()`.

- `tidy_stan()` to return a tidy summary of Stan-models.

- `hdi()` and `rope()` now also work for `brmsfit`-models, from package *brms*.
- `hdi()` and `rope()` now have a `type`-argument, to return fixed, random or all effects for mixed effects models.

- `typical_value()` gets a "zero"-option for the `fun`-argument.
- Changes to `icc()`, which used `stats::sigma()` and thus required R-version 3.3 or higher. Now should depend on R 3.2 again.
- `se()` now also supports `stanreg` and `stanfit` objects.
- `hdi()` now also supports `stanfit`-objects.
- `std_beta()` gets a `ci.lvl`-argument, to specify the level of the calculated confidence interval for standardized coefficients.
- `get_model_pval()` is now deprecated. Please use `p_value()` instead.

- `rope()` to calculate the region of practical equivalence for MCMC samples.

- Added vignettes for various functions.
- Fixed issue with latest tidyr-update on CRAN.

- `grpmean()` to compute mean values by groups (One-way Anova).
- `hdi()` to compute highest density intervals (HDI) for MCMC samples.
- `find_beta()` and `find_beta2()` to find the shape parameters of a Beta distribution.
- `find_normal()` and `find_cauchy()` to find the parameters of a normal or Cauchy distribution.
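
The idea behind a highest density interval for MCMC samples can be sketched in a few lines of base R: among all intervals that contain a given share of the sorted samples, pick the narrowest. The helper `hdi_sketch()` below is a hypothetical name used for illustration and is not the implementation of `hdi()`; the simulated `draws` merely stand in for posterior samples.

```r
# Narrowest interval that contains a share `prob` of the samples.
hdi_sketch <- function(samples, prob = 0.9) {
  sorted <- sort(samples)
  n <- length(sorted)
  window <- ceiling(prob * n)        # number of samples inside the interval
  starts <- seq_len(n - window + 1)  # every possible left end of the window
  widths <- sorted[starts + window - 1] - sorted[starts]
  best <- which.min(widths)
  c(lower = sorted[best], upper = sorted[best + window - 1])
}

set.seed(123)
draws <- rnorm(4000)   # stand-in for posterior samples
print(hdi_sketch(draws, prob = 0.9))
```

For a symmetric, unimodal distribution such an interval is close to the equal-tailed quantile interval; for skewed posteriors the two can differ noticeably.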