This is a collection of tools that the author (Jacob) has written to more efficiently understand and share the results of (primarily) regression analyses. There are also a number of miscellaneous functions for statistical and programming purposes. Just about everything supports models from the survey package: support for svyglm objects, as well as weighted regressions, is a common theme throughout.
Notice: As of jtools version 2.0.0, all functions dealing with interactions (e.g., interact_plot(), sim_slopes(), johnson_neyman()) have been moved to a new package, aptly named interactions.
For the most stable version, simply install from CRAN. If you want the latest features and bug fixes, you can download from GitHub. To do that, you will need to have devtools installed, if you don't already. Then install the package from GitHub.
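A minimal sketch of both install routes (the GitHub path assumes the repository lives at jacob-long/jtools):

```r
# Stable release from CRAN
install.packages("jtools")

# Development version from GitHub (requires devtools)
# install.packages("devtools")
devtools::install_github("jacob-long/jtools")
```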
You should also check out the
dev branch of this
repository for the latest and greatest changes, but also the latest and
greatest bugs. To see what features are on the roadmap, check the issues
section of the repository, especially the “enhancement” tag.
Here’s a synopsis of the current functions in the package:
summ() is a replacement for summary() that provides the user several options for formatting regression summaries. It supports glm, svyglm, and merMod objects as input as well. It supports calculation and reporting of robust standard errors via the sandwich package.
```r
fit <- lm(mpg ~ hp + wt, data = mtcars)
summ(fit)
```

```
#> MODEL INFO:
#> Observations: 32
#> Dependent Variable: mpg
#> Type: OLS linear regression
#>
#> MODEL FIT:
#> F(2,29) = 69.21, p = 0.00
#> R² = 0.83
#> Adj. R² = 0.81
#>
#> Standard errors: OLS
#> ------------------------------------------------
#>                     Est.   S.E.   t val.      p
#> ----------------- ------- ------ -------- ------
#> (Intercept)         37.23   1.60    23.28   0.00
#> hp                  -0.03   0.01    -3.52   0.00
#> wt                  -3.88   0.63    -6.13   0.00
#> ------------------------------------------------
```
It has several conveniences, like re-fitting your model with scaled predictors (scale = TRUE). You have the option to transform the outcome variable as well (transform.response = TRUE); leaving it in its original scale is the default for scaled models. I'm a fan of Andrew Gelman's 2 SD standardization method, so you can specify by how many standard deviations you would like to rescale (n.sd = 2).
You can also get variance inflation factors (VIFs) and partial/semipartial (AKA part) correlations. Partial correlations are only available for OLS models. You may also substitute confidence intervals in place of standard errors and you can choose whether to show p values.
```r
summ(fit, scale = TRUE, vifs = TRUE, part.corr = TRUE, confint = TRUE, pvals = FALSE)
```

```
#> MODEL INFO:
#> Observations: 32
#> Dependent Variable: mpg
#> Type: OLS linear regression
#>
#> MODEL FIT:
#> F(2,29) = 69.21, p = 0.00
#> R² = 0.83
#> Adj. R² = 0.81
#>
#> Standard errors: OLS
#> ------------------------------------------------------------------------------
#>                     Est.    2.5%   97.5%   t val.    VIF   partial.r   part.r
#> ----------------- ------- ------- ------- -------- ------ ----------- --------
#> (Intercept)         20.09   19.15   21.03    43.82
#> hp                  -2.18   -3.44   -0.91    -3.52   1.77       -0.55    -0.27
#> wt                  -3.79   -5.06   -2.53    -6.13   1.77       -0.75    -0.47
#> ------------------------------------------------------------------------------
#>
#> Continuous predictors are mean-centered and scaled by 1 s.d.
```
Cluster-robust standard errors:
```r
data("PetersenCL", package = "sandwich")
fit2 <- lm(y ~ x, data = PetersenCL)
summ(fit2, robust = "HC3", cluster = "firm")
```

```
#> MODEL INFO:
#> Observations: 5000
#> Dependent Variable: y
#> Type: OLS linear regression
#>
#> MODEL FIT:
#> F(1,4998) = 1310.74, p = 0.00
#> R² = 0.21
#> Adj. R² = 0.21
#>
#> Standard errors: Cluster-robust, type = HC3
#> -----------------------------------------------
#>                     Est.   S.E.   t val.      p
#> ----------------- ------ ------ -------- ------
#> (Intercept)         0.03   0.07     0.44   0.66
#> x                   1.03   0.05    20.36   0.00
#> -----------------------------------------------
```
summ() is best suited for interactive use. When it comes to sharing results with others, you want sharper output and probably graphics. jtools has some options for that, too.
For tabular output, export_summs() is an interface to the huxtable package's huxreg() function that preserves the niceties of summ(), particularly its facilities for robust standard errors and standardization. It also concatenates multiple models into a single table.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)
fit_b <- lm(mpg ~ hp + wt + disp, data = mtcars)
fit_c <- lm(mpg ~ hp + wt + disp + drat, data = mtcars)
coef_names <- c("Horsepower" = "hp", "Weight (tons)" = "wt",
                "Displacement" = "disp", "Rear axle ratio" = "drat",
                "Constant" = "(Intercept)")
export_summs(fit, fit_b, fit_c, scale = TRUE,
             transform.response = TRUE, coefs = coef_names)
```
The resulting table shows each model's coefficients side by side, with the note: *** p < 0.001; ** p < 0.01; * p < 0.05.
In RMarkdown documents, using
export_summs() and the chunk option
results = 'asis' will give you nice-looking tables in HTML and PDF
output. Using the
to.word = TRUE argument will create a Microsoft Word
document with the table in it.
Another way to get a quick gist of your regression analysis is to plot the values of the coefficients and their corresponding uncertainties with plot_summs() (or the closely related plot_coefs()). Like with export_summs(), you can still get your scaled models and robust standard errors.

```r
coef_names <- coef_names[1:4]  # Dropping intercept for plots
plot_summs(fit, fit_b, fit_c, scale = TRUE, robust = "HC3", coefs = coef_names)
```
And since you get a
ggplot object in return, you can tweak and theme
as you wish.
Another way to visualize the uncertainty of your coefficients is via the plot.distributions argument.

```r
plot_summs(fit_c, scale = TRUE, robust = "HC3", coefs = coef_names,
           plot.distributions = TRUE)
```
These show the 95% interval width of a normal distribution for each estimate.
plot_coefs() works much the same way, but without support for summ() arguments like scale and robust. This enables a wider range of models that have support from the broom package but not from summ().
Sometimes the best way to understand your model is to look at the
predictions it generates. Rather than look at coefficients,
effect_plot() lets you plot predictions across values of a predictor
variable alongside the observed data.
```r
effect_plot(fit_c, pred = hp, interval = TRUE, plot.points = TRUE)
```
And a new feature in version
2.0.0 lets you plot partial residuals
instead of the raw observed data, allowing you to assess model quality
after accounting for effects of control variables.
```r
effect_plot(fit_c, pred = hp, interval = TRUE, partial.residuals = TRUE)
```
Categorical predictors, polynomial terms, (G)LM(M)s, weighted data, and much more are supported.
There are several other things that might interest you.
- gscale(): Scale and/or mean-center data, including survey-weighted data.
- scale_mod() and center_mod(): Re-fit models with scaled and/or mean-centered data.
- wgttest() and pf_sv_test(), which are combined in weights_tests(): Test the ignorability of sample weights in regression models.
- svycor(): Generate correlation matrices from survey design objects.
- theme_apa(): A mostly APA-compliant ggplot2 theme.
- theme_nice(): A nice ggplot2 theme.
- add_gridlines() and drop_gridlines(): ggplot2 theme-changing convenience functions.
- make_predictions(): An easy way to generate hypothetical predicted data from your regression model for plotting or other purposes.
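As a quick sketch of gscale() (the argument names follow its documented interface; the choice of variables is illustrative):

```r
library(jtools)

# Mean-center and scale mpg and hp by 2 SDs (Gelman-style)
mtcars_scaled <- gscale(data = mtcars, vars = c("mpg", "hp"), n.sd = 2)

# Mean-center only, without scaling
mtcars_centered <- gscale(data = mtcars, vars = c("mpg", "hp"), center.only = TRUE)
```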
Details on the arguments can be accessed via the R documentation (e.g., ?functionname). There are now vignettes documenting just about
everything you can do as well.
I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I prefer you use the Github issues system over trying to reach out to me in other ways. Pull requests for contributions are encouraged. If you are considering writing up a bug fix or new feature, please check out the contributing guidelines.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
The source code of this package is licensed under the MIT License.
- When summarizing linear mixed (lmerMod) models with summ(), the p values reported were one-tailed --- half their actual value. t statistics and standard errors were correct.
- When the odds.ratio argument was given to summ(), users were correctly warned that it is a deprecated argument, but the exponentiated coefficients were not returned as they should have been.
- Fixed errors in effect_plot() when offsets are specified in a formula or a variable is included more than once in a formula.
- partialize() now handles missing data more gracefully.
- %just% now sorts the matches on the left-hand side in the order they occur on the right-hand side.
- The table-printing functions (e.g., md_table()) now rely on pander to produce plain-text tables and use the "multiline" format by default. Check out "grid" for another option. You can change the default via an option (see the documentation).
- stars (i.e., significance stars) are no longer available from summ(). This is partially due to the change to printing tables via pander, but also in keeping with statistical best practices.
- predict_merMod(), which is used for generating confidence intervals for merMod model predictions in effect_plot(), is now a user-accessible function.
- msg_wrap(), warn_wrap(), and stop_wrap() now interface with the rlang package equivalents rather than the base message(), warning(), stop(), and so on. End users may also take advantage of rlang's sub-classing abilities through these functions.
- summ() now passes extra arguments to scale_mod() and center_mod(), allowing you to use those functions' more advanced options.
To reduce the complexity of this package and help people understand what they
are getting, I have removed all functions that directly analyze
interaction/moderation effects and put them into a new package, interactions. There are still some functions in jtools that support moderation analyses, and some users may find that everything they ever used jtools for has now moved to interactions (e.g., interact_plot(), sim_slopes(), cat_plot(), johnson_neyman()). Hopefully, moving these items to a separate package called interactions will help more people discover those functions and reduce confusion about what both packages are for.
Overhaul of make_predictions() and removal of plot_predictions()

In the jtools 1.0.0 release, I introduced make_predictions() as a lower-level way to emulate the functionality of effect_plot(), interact_plot(), and cat_plot(). This would return a list object with predicted data, the original
data, and a bunch of attributes containing information about how to plot it.
One could then take this object, with class
predictions, and use it as the
main argument to
plot_predictions(), which was another new function that
creates the plots you would see in
effect_plot() et al.
I have simplified make_predictions() to be less specific to those plotting functions and eliminated plot_predictions(), which was ultimately too complex to maintain and caused problems for separating the interaction tools into a separate package. Now, make_predictions() by default simply creates a new data frame of predicted values along the values of a pred variable. It no longer accepts modx or mod2 arguments. Instead, it accepts an argument called at, where a user can specify any number of variables and values to generate predictions at. This syntax is designed to be similar to the margins package's. See the documentation for more info on this revised syntax.
make_new_data() is a new function that supports
make_predictions() by creating
the data frame of hypothetical values to which the predictions will be added.
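A hedged sketch of the revised interface, reusing the mtcars model from the README examples (the specific at values are illustrative):

```r
library(jtools)

fit_c <- lm(mpg ~ hp + wt + disp + drat, data = mtcars)

# Predicted values along hp, holding wt at two chosen values
preds <- make_predictions(fit_c, pred = "hp", at = list(wt = c(2.5, 3.5)))
head(preds)
```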
I have added a new function,
partialize(), that creates partial residuals for
the purposes of plotting (e.g., with
effect_plot()). One negative when
visualizing predictions alongside original data with
effect_plot() or similar
tools is that the observed data may be too spread out to pick up on any
patterns. However, sometimes your model is controlling for the causes of this
scattering, especially with multilevel models that have random intercepts.
Partial residuals remove the estimated effects of all the controlled-for variables from the observed data, letting you see how well your model performs with all of those things accounted for. You can plot partial residuals instead of the observed data in effect_plot() via the argument partial.residuals = TRUE, or get the data yourself by calling partialize() directly.
In keeping with the "tools" focus of this package, I am making available some
of the programming tools that previously had only been used internally inside jtools.
Many are familiar with how handy the
%in% operator is, but sometimes we want
everything except the values in some object. In other words, we might want
!(x %in% y) instead of
x %in% y. This is where
%nin% ("not in") acts as a
useful shortcut. Now, instead of
!(x %in% y), you can just use
x %nin% y.
Note that the actual implementation of
%nin% is slightly different to produce
the same results but more quickly for large data. You may run into some other
packages that also have a
%nin% function and they are, to my knowledge,
functionally the same.
One of my most common uses of both %in% and %nin% is when I want to subset
an object. For instance, assume x is 1 through 5, y is 3 through 7, and I want only the instances of x that are not in y. Using %nin%, I would write x[x %nin% y], which leaves you with 1 and 2.
I really don't like having to write the object's name twice
in a row like that, so I created something to simplify further:
You can now subset
x to only the parts that are not in
y like this:
x %not% y. Conversely, you can do the equivalent of x[x %in% y] using the %just% operator: x %just% y.
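A quick sketch with the vectors just described:

```r
library(jtools)

x <- 1:5
y <- 3:7

x %nin% y   # TRUE TRUE FALSE FALSE FALSE
x %not% y   # same as x[x %nin% y]: 1 2
x %just% y  # same as x[x %in% y]: 3 4 5
```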
As special cases for %just%, if the left-hand side is a matrix or data frame, it is assumed that the right-hand side refers to column indices (if numeric) or column names (if character). For example, if I do
mtcars %just% c("mpg", "qsec"), I get a data frame that is just the "mpg" and
"qsec" columns of
mtcars. It is an S3 method so support can be added for
additional object types by other developers.
An irritation when writing messages/warnings/errors to users is breaking up the
long strings without unwanted line breaks in the output. One problem is not
knowing how wide the user's console is.
wrap_str() takes any string and inserts
line breaks at whatever the "width" option is set to, which automatically
changes according to the actual width in RStudio and in some other setups.
This means you can write the error message in a single string across multiple,
perhaps indented, lines without those line breaks and indentations being part
of the console output.
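A small sketch of the idea (the wrap point follows your "width" option, so the exact line breaks will vary):

```r
library(jtools)

# Written across indented source lines, but printed as cleanly
# wrapped console output
msg <- "This long message would normally run past the edge of a
        narrow console, but wrap_str() re-wraps it to fit."
message(wrap_str(msg))
```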
msg_wrap(), warn_wrap(), and stop_wrap() are wrap_str() wrappers (pun not intended) around message(), warning(), and stop(), respectively.
- summ() no longer prints coefficient tables as data frames, because this caused issues for RStudio notebook users: the output was not printed to the console, and the notebook formatted the tables in less-than-ideal ways. The tables now have a markdown format that might remind you of Stata's coefficient tables.
- The function that creates these tables is called md_table() and can be used by others if they want.
- summ() no longer prints significance stars by default. This can be enabled with the stars = TRUE argument or by setting the corresponding option to TRUE (also available via set_summ_defaults()).
- A deprecated feature of summ() has been removed.
- get_colors() is now available to users. It retrieves the color palettes used in jtools plotting functions.
- Plots in jtools now have a new default theme, which you can use yourself, called theme_nice(). The previous default, theme_apa(), is still available, but I don't like it as a default since I don't think the APA has defined the nicest-looking design guidelines for general use.
- effect_plot() can now plot categorical predictors, picking up a functionality previously provided by cat_plot().
- effect_plot() now uses tidy evaluation for the pred argument (#37). This means you can pass a variable that contains the name of pred, which is most useful if you are creating a function, for loop, etc. If using a variable, put a !! from the rlang package before it (e.g., pred = !! variable). For most users, these changes will not affect their usage.
- make_predictions() (and by extension effect_plot() and the plotting functions in the interactions package) now understands dependent variable transformations better. For instance, there shouldn't be issues if your response variable is log(y) rather than y. When returning the original data frame, these functions will append a transformed (e.g., log(y)) column as needed.
- lme4 has a bug when generating predictions in models with offsets --- it ignores them when the offset is specified via the offset = argument. I have created a workaround for this.
This is a minor release.
- plot_predictions() had an incorrect default value for interval, causing an error if you used the default arguments with make_predictions(). The default has now been fixed.
- effect_plot() would error when the model included covariates (not involved in the interaction, if any) that were non-numeric. That has been corrected. (#41)
- Logical variables (TRUE/FALSE) were not handled appropriately by the plotting functions, causing them to be treated as numeric. They are now preserved as logical. (#40)
- sim_slopes() gave inaccurate results when factor moderators did not have treatment coding ("contr.treatment"); such moderators are now recoded to treatment coding internally.
- summ() output in RMarkdown documents is now powered by kableExtra, which (in my opinion) offers more attractive HTML output and seems to have better luck with float placement in PDF documents. Your mileage may vary.
- The HTML documentation is now styled with rmdformats rather than the base template.
- Methods for suggested packages (e.g., huxtable) now use conditional namespace registration for users of R 3.6. This shouldn't have much effect on end users.
This release was initially intended to be a bugfix release, but enough other things came up to make it a minor release.
- Fixed errors in plot_coefs() arising from the latest update to broom.
- interact_plot() no longer errors if there are missing observations in the original data and quantiles are requested.
- For summ.merMod, the default p-value calculation is now via the Satterthwaite method if you have lmerTest installed. The old default, Kenward-Roger, is used by request or when pbkrtest is installed but not lmerTest. Kenward-Roger calculates a different degrees of freedom for each predictor and also calculates a corrected variance-covariance matrix for the model, meaning the standard errors are adjusted as well. It is not the default largely because the computation takes too long for too many models.
- johnson_neyman() now allows you to specify your own critical t value if you are using some alternate method to calculate it.
- johnson_neyman() now allows you to specify the range of moderator values you want to plot as well as setting a title.
- You can now choose moderator values in sim_slopes() in a way similar to the plotting functions (e.g., modx.values = "plus-minus"). [#31]
- plot_summs() now supports faceting the coefficients based on user-specified groupings. See the documentation for details.
- summ() and its variants now have pretty output in RMarkdown documents if you have the huxtable package installed. This can be disabled with the chunk option render = 'normal_print'.
- You can now use modx.values in place of modxvals and pred.values in place of predvals. Don't go running to change your code, though; those old argument names will still work, but these new ones are clearer and preferred in new code.
- There is now a plot() method for sim_slopes objects. Just save your sim_slopes() call to an object and call the plot() function on that object to see what happens. Basically, it's a coefficient plot of the simple slopes.
- If you have huxtable installed, you can now convert a sim_slopes() object into a publication-style table. The interface is comparable to export_summs().
This release has several big changes embedded within, side projects that needed a lot of work to implement and required some user-facing changes. Overall these are improvements, but in some edge cases they could break old code. The following sections are divided by the affected functions. Some of the functions are discussed in more than one section.
These functions no longer re-fit the inputted model to center covariates, impose labels on factors, and so on. This generally has several key positives, including
- Speed: because the model is not re-fit, these functions are substantially faster (roughly 60% faster for svyglm and 80% for merMod in my testing, with gains for lm models as well). The speed gains increase as the models become more complicated and the source data become larger.
- Transformed predictors: if you used a function (e.g., log) in the formula, these functions would previously have a lot of trouble and usually error. Now this is supported, provided you input the data used to fit the model via the data argument. You'll receive a warning if the function thinks this is needed to work right.
As noted, there is a new
data argument for these functions. You do not
normally need to use this if your model is fit with a
y ~ x + z type of
formula. But if you start doing things like
y ~ factor(x) + z, then
you need to provide the source data frame. Another benefit is that this
allows for fitting polynomials with
effect_plot() or even interactions with
interact_plot(). For instance, if my model was fit using
this kind of formula ---
y ~ poly(x, 2) + z --- I could then plot the
predicted curve with
effect_plot(fit, pred = x, data = data) substituting
fit with whatever my model is called and
data with whatever data frame
I used is called.
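For instance, a sketch with mtcars (the model and values are illustrative):

```r
library(jtools)

fit_poly <- lm(mpg ~ poly(hp, 2) + wt, data = mtcars)

# The data argument lets effect_plot() rebuild the poly() term
effect_plot(fit_poly, pred = hp, data = mtcars, interval = TRUE)
```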
There are some possible drawbacks to these changes. One is that factor predictors and moderators are no longer supported, even two-level ones. This worked before by coercing them to 0/1 continuous variables and re-fitting the model. Since the model is no longer re-fit, this can't be done. To work around it, either convert the predictor to numeric before fitting the model or use cat_plot(). Relatedly, two-level factor covariates are no longer centered and are simply set to their reference value.
- Robust confidence intervals: plotting robust standard errors is now supported for compatible models (tested on glm). Just use the robust argument like you would for summ().
- Preliminary support for confidence intervals for merMod models: you may now get confidence intervals when using merMod objects as input to the plotting functions. Of importance, though, is that the uncertainty is only for the fixed effects. For now, a warning is printed. See the next section for another option for merMod confidence intervals.
- Rug plots in the margins: so-called "rug" plots can be included in the margins of the plots for any of these functions. These show tick marks for each of the observed data points, giving a non-obtrusive impression of the distribution of the pred variable and (optionally) the dependent variable. See the documentation for effect_plot() and the rug-related arguments.
- Facet by the modx variable: some prefer to visualize the predicted lines on separate panes, so that is now an option available via the facet.modx argument. You can also use plot.points with this, though the division into groups is not straightforward if the moderator isn't a factor. See the documentation for more on how that is done.
plot_predictions(): New tools for advanced plotting
To let users have some more flexibility,
jtools now lets users directly
access the (previously internal) functions that make effect_plot() and interact_plot() work. This should make it easier to tailor the
outputs for specific needs. Some features may be implemented for these functions
only to keep the _plot functions from getting any more complicated than they already are.

The simplest use of the two functions is to use make_predictions() just like you would effect_plot() or cat_plot(). The difference is, of course, that make_predictions() only makes the data that would be used for plotting. The resulting
predictions object has both the predicted and original
data as well as some attributes describing the arguments used. If you pass
this object to
plot_predictions() with no further arguments, it should do
exactly what the corresponding
_plot function would do. However, you might
want to do something entirely different using the predicted data which is part
of the reason these functions are separate.
One such feature specific to make_predictions() is bootstrap confidence intervals for merMod models.
You may no longer use these tools to scale the models. Use scale_mod() to create a scaled version of the model, save the resulting object, and use that as your input to the functions if you want scaling.
All these tools have a new default
centered argument. They are now set to
centered = "all", but
"all" no longer means what it used to. Now it refers
to all variables not included in the interaction, including the dependent
variable. This means that in effect, the default option does the same thing
that previous versions did. But instead of having that occur when
centered = NULL, that's what
centered = "all" means. There is no
NULL option any longer. Note that with
sim_slopes(), the focal predictor (pred) will now be centered --- this only affects the conditional intercept.
This function now supports categorical (factor) moderators, though there is no option for Johnson-Neyman intervals in these cases. You can use the significance of the interaction term(s) for inference about whether the slopes differ at each level of the factor when the moderator is a factor.
You may now also pass arguments to summ(), which is used internally to calculate standard errors, p values, etc. This is particularly useful if you are using a merMod model for which the pbkrtest-based p value calculation is too time-consuming.
The interface has been changed slightly, with the actual numbers always provided via the data argument. There is no x argument; instead, there is a vars argument to which you can provide variable names. The upshot is that it now fits much better into a piping workflow.
The entire function has gotten an extensive reworking, which in some cases should result in significant speed gains. And if that's not enough, just know that the code was an absolute monstrosity before and now it's not.
There are two new functions that are wrappers around gscale(): standardize() and center(), which call gscale() but with n.sd = 1 in the first case and center.only = TRUE in the latter case.
Tired of specifying your preferred configuration every time you use summ()? Now, many arguments will by default check your options, so you can set your own defaults. See ?set_summ_defaults for more info.
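A sketch (argument names follow ?set_summ_defaults; the particular values are just one preference):

```r
library(jtools)

# Always show 3 digits and confidence intervals, suppress p values
set_summ_defaults(digits = 3, confint = TRUE, pvals = FALSE)

fit <- lm(mpg ~ hp + wt, data = mtcars)
summ(fit)  # picks up the defaults set above
```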
Rather than having separate arguments for centering and scaling the response, the summ() function now uses transform.response to collectively cover those bases. Whether the response is centered or scaled depends on whether you are using the center or scale argument.
The robust.type argument is deprecated. Now, provide the type of robust estimator directly to robust. For now, if robust = TRUE, it defaults to "HC3" with a warning. Better is to provide the argument directly, e.g., robust = "HC3". robust = FALSE is still fine for using OLS/MLE standard errors.
summ.merMod previously offered an odds.ratio argument; that has been renamed to exp (short for exponentiate) to better express the quantity.
vifs now works when there are factor variables in the model.
One of the first bugs summ() ever had occurred when the function was given a rank-deficient model. These models are not straightforward to detect, especially since I need to make space for an almost-empty row in the outputted table. At long last, this release can handle such models gracefully.
Like the rest of R, when summ() rounded your output, items rounded exactly to zero were printed as, well, zero. But this can be misleading if the original value was actually negative. For instance, if digits = 2 and a value was -0.003, the value printed to the console was 0.00, suggesting a zero or slightly positive value when in fact it was the opposite. This is a limitation of R's rounding and truncation (e.g., trunc()) functions. I've now changed it so the zero-rounded value retains its sign.
summ.merMod now calculates pseudo-R^2 much, much faster. For only modestly
complex models, the speed-up is roughly 50x faster. Because of how much faster
it now is and how much less frequently it throws errors or prints cryptic
messages, it is now calculated by default. The confidence interval calculation
is now "Wald" for these models (see
confint.merMod for details) rather than
"profile", which for many models can take a very long time and sometimes does
not work at all. This can be toggled with the conf.method argument.
summ.svyglm now will calculate pseudo-R^2 for quasibinomial and
quasipoisson families using the value obtained from refitting them as
binomial/poisson. For now, I'm not touching AIC/BIC for such models
because the underlying theory is a bit different and the implementation is more complicated.
summ.lm now uses the t-distribution for finding critical values for
confidence intervals. Previously, a normal approximation was used.
The summ.default method has been removed. It was becoming an absolute terror
to maintain and I doubted anyone found it useful. It's hard to provide the
value added for models of a type that I do not know (robust errors don't
always apply, scaling doesn't always work, model fit statistics may not make
sense, etc.). Bug me if this has really upset things for you.
One new model type is now supported: rq models from the quantreg package. Please feel free to provide feedback on the output and support of these models.
To better reflect the capabilities of these functions (they are not restricted to lm objects), they have been renamed. The old names will continue to work to preserve old code. scale.response and center.response now default to FALSE to reflect the fact that only OLS models can support transformations of the dependent variable in that way.
There is a new
vars = argument for
scale_mod() that allows you to only apply
scaling to whichever variables are included in that character vector.
I've also implemented a neat technical fix that allows the updated model to itself be updated while not also including the actual raw data in the model call.
A variety of fixes and optimizations have been added to these functions.
Now, by default, there are two confidence intervals plotted: a thick line representing (with default settings) the 90% interval and a thinner line for the 95% interval. You can set inner_ci_level to NULL to get rid of the thicker line.
For plot_summs(), you can also set per-model
summ() arguments by providing
the argument as a vector (e.g.,
robust = c(TRUE, FALSE)). Length 1 arguments
are applied to all models.
plot_summs() now also supports models not supported by summ(): those models are simply passed through as-is rather than trying to run summ() on them.
Another new option is
point.shape, similar to the model plotting functions.
This is most useful for when you are planning to distribute your output in
grayscale or to colorblind audiences (although the new default color scheme is
meant to be colorblind friendly, it is always best to use another visual cue).
The coolest is the new
plot.distributions argument, which if TRUE will plot
normal distributions to even better convey the uncertainty. Of course, you
should use this judiciously if your modeling or estimation approach doesn't
produce coefficient estimates that are asymptotically normally distributed.
Inspiration comes from https://twitter.com/BenJamesEdwards/status/979751070254747650.
broom's interface for Bayesian methods is inconsistent, so I've hacked together a few tweaks to make stanreg models work with plot_coefs().
You'll also notice vertical gridlines on the plots, which I think/hope will
be useful. They are easily removable (see
drop_x_gridlines()) with ggplot2's
built-in theming options.
Changes here are not too major. Like
plot_summs(), you can now provide
unsupported model types to
export_summs() and they are just passed through to huxreg. You can also provide different arguments to
summ() on a per-model
basis in the way described under the
plot_summs() heading above.
There are some tweaks to the model info (provided by
glance). Most prominent is for merMod models, for which there is now a separate N for each grouping factor.
Updates to theme_apa(), plus new functions
New arguments have been added to theme_apa(): remove.x.gridlines and remove.y.gridlines, both of which are TRUE by default. APA hates giving
hard and fast rules, but the norm is that gridlines should be omitted unless
they are crucial for interpretation.
theme_apa() is also now a "complete"
theme, which means specifying further options via
theme will not revert
theme_apa()'s changes to the base theme.
Behind the scenes, the helper functions add_gridlines() and drop_gridlines() are used, which do what they sound like they do. To avoid using the arguments to those functions, you can also use add_x_gridlines()/add_y_gridlines() and drop_x_gridlines()/drop_y_gridlines(), which are wrappers around the more general functions.
The sample weight tests --- wgttest() and pf_sv_test() --- now handle missing data
in a more sensible and consistent way.
There is a new default qualitative palette, based on Color Universal Design
(designed to be readable by the colorblind) that looks great to all. There are
several other new palette choices as well. These are all documented at ?jtools_colors.
Using the crayon package as a backend, console output from jtools functions is now formatted for better readability on supported systems. Feedback on this is welcome, since it might look better or worse in some environments.
This release is limited to dealing with the
huxtable package's temporary
removal from CRAN, which in turn makes this package out of compliance with
CRAN policies regarding dependencies on non-CRAN packages.
Look out for
jtools 1.0.0 coming very soon!
interact_plot() and sim_slopes() were both encountering errors with
merMod input. Thanks to Seongho Bae for reporting these issues and testing out development versions.
Default model names in export_summs() had an extra space (e.g.,
( 1)) due to changes in
huxtable. The defaults are now just single numbers.
The alpha level shown in johnson_neyman() plots was wrong when control.fdr = TRUE. It was reporting
alpha * 2 in the legend, but now it is accurate again.
johnson_neyman() now handles multilevel models from lme4.
Jonas Kunst helpfully pointed out some odd behavior of interact_plot() with
factor moderators. No longer should there be occasions in which you have two
different legends appear. The linetype and colors also should now be consistent
whether there is a second moderator or not. For continuous moderators, the
darkest line should also be a solid line and it is by default the highest
value of the moderator.
There was also a bug affecting export_summs(), but that has been fixed.
You can now manually define colors in cat_plot() by providing a vector of colors (in any format that
ggplot2 accepts) for the relevant color argument.
There is a new output option for summ() that formats the output in a way that lines up the decimal points. It looks great.
This may be the single biggest update yet. If you downloaded from CRAN, be sure to check the 0.8.1 update as well.
New features are organized by function.
A control.fdr option has been added to control the false discovery rate, building on new research. This makes the test more conservative but less likely to produce a Type 1 error.
A line.thickness argument has been added after Heidi Jacobs pointed out that the line thickness could not be changed after the fact.
The output of sim_slopes() for 3-way interactions is much improved.
Previously, with alpha = .05, the critical test statistic was always 1.96. Now, the residual degrees of freedom are used with the t distribution. You can get the old behavior by setting
df = "normal" or df to any arbitrary number.
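A quick sketch of the change (assuming the df argument belongs to sim_slopes() as this entry suggests; the pred and modx argument names and the interaction model are only for illustration):

```r
library(jtools)
fit <- lm(mpg ~ hp * wt, data = mtcars)

# New default: critical t-statistic based on residual degrees of freedom
sim_slopes(fit, pred = hp, modx = wt)

# Old behavior: normal distribution, so the critical value is 1.96 at alpha = .05
sim_slopes(fit, pred = hp, modx = wt, df = "normal")
```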
There are more improvements to plot.points (see 0.8.1 for more). You can now plot observed data with 3-way interactions.
A new mod2vals specification has been added:
"terciles". This splits the observed data into 3 equally sized groups and chooses as values the mean of each of those groups. This is especially good for skewed data and for second moderators.
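For instance (a hypothetical three-way interaction; the pred, modx, and mod2 argument names are assumptions based on the package's interface):

```r
library(jtools)
fit <- lm(mpg ~ hp * wt * disp, data = mtcars)

# Use the mean of each of three equally sized groups of the
# second moderator as its plotted values
interact_plot(fit, pred = hp, modx = wt, mod2 = disp, mod2vals = "terciles")
```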
A new linearity.check option for two-way interactions has been added. This facets by each level of the moderator and lets you compare the fitted line with a loess smoothed line to ensure that the interaction effect is roughly linear at each level of the (continuous) moderator. It pairs naturally with
plot.points = TRUE.
A jitter argument has been added for those using
plot.points. If you don't want the points jittered, you can set
jitter = 0; if you want more or less, you can play with the value until it looks right. This applies to effect_plot() as well.
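For example (the pred and modx argument names and the model are only for illustration):

```r
library(jtools)
fit <- lm(mpg ~ hp * wt, data = mtcars)

# Observed data with the default amount of jitter
interact_plot(fit, pred = hp, modx = wt, plot.points = TRUE)

# No jitter at all
interact_plot(fit, pred = hp, modx = wt, plot.points = TRUE, jitter = 0)
```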
Calculations via pbkrtest are slowing things down, and
r.squared is now set to FALSE by default.
plot_summs(): A graphic counterpart to
export_summs(), which was introduced in
the 0.8.0 release. This plots regression coefficients to help in visualizing
the uncertainty of each estimate and facilitates the plotting of nested models
alongside each other for comparison. This allows you to use summ() features
like robust standard errors and scaling with a type of plot that you could
otherwise create with some other packages.
plot_coefs(): Just like
plot_summs(), but no special
summ() features. This
allows you to use models unsupported by
summ(), however, and you can provide
summ() objects to plot the same model with different
summ() arguments alongside each other.
cat_plot(): This was a long time coming. It is a complementary function to
interact_plot(), but is designed to deal with interactions between
categorical variables. You can use bar plots, line plots, dot plots, and
box and whisker plots to do so. You can also use the function to plot the effect
of a single categorical predictor without an interaction.
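A minimal sketch of both uses (the pred, modx, and geom argument names are assumptions based on the package's interface; ToothGrowth is a built-in dataset used only for illustration):

```r
library(jtools)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Interaction between two categorical predictors, drawn as a line plot
fit <- lm(len ~ supp * dose, data = tg)
cat_plot(fit, pred = dose, modx = supp, geom = "line")

# A single categorical predictor with no interaction
fit2 <- lm(len ~ dose, data = tg)
cat_plot(fit2, pred = dose)
```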
Thanks to Kim Henry who reported a bug with
johnson_neyman() in the case that
there is an interval, but the entire interval is outside of the plotted area:
When that happened, the legend wrongly described the significance of the plotted line.
Besides that bugfix, some new features:
When johnson_neyman() fails to find the interval (because it doesn't exist), it no longer quits with an error. The output will just state the interval was not found and the plot will still be created.
A new feature for plotted points in interact_plot() has been added. Previously, if the moderator was a factor, you would get very nicely colored plotted points when using
plot.points = TRUE. But if the moderator was continuous, the points were just black and it wasn't very informative beyond examining the main effect of the focal predictor. With this update, the plotted points for continuous moderators are shaded along a gradient that matches the colors used for the predicted lines and confidence intervals.
Not many user-facing changes since 0.7.4, but major refactoring internally should speed things up and make future development smoother.
effect_plot() would trip up when one of the focal predictors had a name that was a subset of a covariate's name (e.g., pred = "var" but a covariate is called "var_2"). That's fixed.
Output for merMod objects was not respecting the user-requested confidence level, and that has been fixed.
Calculations involving merMod objects were throwing a spurious warning on R 3.4.2.
interact_plot() was mis-ordering secondary moderators. That has been fixed.
export_summs() had a major performance problem when providing extra arguments, which may have also caused it to wrongly ignore some arguments. That has been fixed and it is much faster.
interact_plot() now gives more informative labels for secondary moderators when the user has defined the values but not the labels.
export_summs() has been updated for compatibility with huxtable 1.0.0 changes.
In some cases with summ(), the model was not mean-centered as the output stated. This has been fixed. I truly regret the error; double-check any analyses you may have run with this feature.
The new export_summs() function outputs regression models supported by
summ() in table formats useful for RMarkdown output as well as specific
options for exporting to Microsoft Word files. This is particularly helpful for
those wanting an efficient way to export regressions that are standardized
and/or use robust standard errors.
The documentation for
j_summ() has been reorganized such that each supported
model type has its own, separate documentation.
?j_summ will now just give you
links to each supported model type.
j_summ() will from now on be referred to as, simply,
summ(). Your old code is fine;
j_summ() will now be an alias for summ()
and will run the same underlying code. Documentation will refer to the
summ() function, though. That includes the updated vignette.
One new feature for summ(): with the
part.corr = TRUE argument for a linear model, partial and semipartial correlations for each variable are reported.
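For example, with the linear model from earlier in this document (summ() is used since j_summ() is now an alias for it):

```r
library(jtools)
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Adds partial and semipartial correlations to the coefficient table
summ(fit, part.corr = TRUE)
```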
More tweaks to merMod output. Whether p values are printed now depends on the model type (lmer(), glmer(),
nlmer()) and, in the case of linear models, whether the
pbkrtest package is installed. If it is, p values are calculated based on the Kenward-Roger degrees of freedom calculation and printed. Otherwise, p values are not shown by default with
lmer() models. p values are shown with
glmer() models, since that is also the default behavior of lme4.
There is a new r.squared option, which for now is FALSE by default. It adds runtime since it must fit a null model for comparison, and sometimes this also causes convergence issues.
Returning to CRAN!
A very strange bug on CRAN's servers was causing jtools updates to silently fail when I submitted updates; I'd get a confirmation that it passed all tests, but a LaTeX error related to an Indian journal I cited was torpedoing it before it reached CRAN servers.
The only change from 0.7.0 is fixing that problem, but if you're a CRAN user you will want to flip through the past several releases as well to see what you've missed.
j_summ() can now provide cluster-robust standard errors for lm models.
j_summ() output now gives info about missing observations for supported models.
center_lm() can standardize/center models with logged terms and other functions applied.
effect_plot() will now also support predictors that have functions applied to them.
j_summ() now supports confidence intervals at user-specified widths.
j_summ() now allows users to not display p-values if requested.
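A sketch of those last two options together (the confint, ci.width, and pvals argument names are assumptions based on the package's current interface and may have differed in this early release):

```r
library(jtools)
fit <- lm(mpg ~ hp + wt, data = mtcars)

# 90% confidence intervals in place of standard errors, with p-values hidden
j_summ(fit, confint = TRUE, ci.width = .90, pvals = FALSE)
```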
A warning has been added to j_summ() output with merMod objects, since it provides p-values calculated on the basis of the estimated t-values. These are not to be interpreted in the same way that OLS and GLM p-values are, since with smaller samples mixed model t-values will give inflated Type I error rates. In a future release,
j_summ() will not show p-values for merMod objects by default.
scale_lm() did not have its center argument implemented, and its documentation did not explain the option well.
johnson_neyman() got confused when a factor variable was given as a predictor.
Bug fix release:
wgttest() acted in a way that might be unexpected when providing a weights variable name but no data argument. Now it should work as expected by getting the data frame from the model call.
gscale() had a few situations in which it choked on missing data, especially when weights were used. This in turn affected
scale_lm() and center_lm(), which each rely on
gscale() for standardization and mean-centering. That's fixed now.
gscale() wasn't playing nicely with binary factors in survey designs, rendering the scaling incorrect. If you saw a warning, re-check your outputs after this update.
A lot of changes!
effect_plot(): If you like the visualization of moderation effects from
interact_plot(), then you should enjoy
effect_plot(). It is a clone of
interact_plot(), but shows a single regression line rather than several. It supports GLMs and lme4 models and can plot original, observed data points.
pf_sv_test(): Another tool for survey researchers to test whether it's okay to run unweighted regressions. Named after Pfeffermann and Sverchkov, who devised the test.
weights_tests(): Like what probe_interaction() does for the interaction functions,
weights_tests() will run the new
pf_sv_test() as well as
wgttest() simultaneously with a common set of arguments.
wgttest() now accepts and tests GLMs and may work for other regression models.
j_summ() would print significance stars based on the rounded p value, sometimes resulting in misleading output. Now significance stars are based on the non-rounded p values.
probe_interaction() did not pass an "alpha" argument to
sim_slopes(), possibly confusing users of
johnson_neyman(). The argument
sim_slopes() is looking for is called
"jnalpha". Now probe_interaction() will pass "alpha" arguments to sim_slopes() as "jnalpha".
interact_plot() would stop on an error when the model included a two-level factor not involved in the interaction and not centered. Now those factors in that situation are treated like other factors.
interact_plot() sometimes gave misleading output when users manually defined moderator labels. It is now more consistent with the ordering of the labels and values and will not wrongly label them when the values are provided in an odd order.
wgttest() now functions properly when a vector of weights is provided to the weights argument rather than a column name.
gscale() now works properly on tibbles, which require a different style of column indexing than data frames.
scale_lm() and center_lm() now work properly on models that were originally fit with tibbles in the data argument.
sim_slopes() would fail for certain weighted
lm objects depending on the way the weights were specified in the function call. It should now work for all weighted lm objects.
More goodies for users of weighted regressions: weights are now better supported in
interact_plot(). It would work previously, but didn't use a weighted mean or SD in calculating values of the moderator(s) and for mean-centering other predictors. Now it does.
Factor predictors are now better supported in interact_plot(). Previously, factor variables had to be a moderator.
When the focal predictor in interact_plot() has only two unique values (e.g., dummy variables that have numeric class), by default only those two values have tick marks on the x-axis. Users may use the
pred.labels argument to specify labels for those ticks.
Offsets are now supported, with the value chosen via the set.offset argument. By default it is 1 so that the y-axis represents a proportion.
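A sketch with a simulated Poisson rate model (the data and the pred and modx argument names are assumptions for illustration):

```r
library(jtools)
set.seed(1)
dat <- data.frame(x = rnorm(100), m = rnorm(100),
                  exposure = runif(100, 1, 10))
dat$y <- rpois(100, lambda = dat$exposure * exp(0.3 * dat$x + 0.2 * dat$x * dat$m))
fit <- glm(y ~ x * m + offset(log(exposure)), data = dat, family = poisson)

# With the default set.offset = 1, predictions are made at one unit of
# exposure, so the y-axis reads as a rate (a proportion when the
# exposure counts trials)
interact_plot(fit, pred = x, modx = m, set.offset = 1)
```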
Other feature changes:
sim_slopes() now supports weights (from the weights argument rather than a
svyglm model). Previously it used an unweighted mean and standard deviation for non-survey models with weights.
When the robust argument was set to TRUE, the
robust.type argument was not being passed (causing the default of "HC3" to be used). Now that argument is passed correctly.
An option to plot on the original (nonlinear) scale has been added to interact_plot().
interact_plot() can now plot fixed effects interactions from merMod models.
Fixed compatibility of j_summ() with R 3.4.x.
More progress on j_summ(); it still needs convergence warnings, among some other items.
Added the wgttest() function, which runs a test to assess the need for sampling weights in linear regression.