Produce a data file containing the output of statistical models and assist with a workflow aimed at writing scientific papers using 'R Markdown'. Supported statistical functions include: t.test(), cor.test(), lm(), glm(), aov(), anova(), and several others. The package is based on tidy data principles and the 'tidyverse' (Wickham, 2017).

**Authors:** Willem Sleegers, Arnoud Plantinga

**License:** MIT

`tidystats`

is a package to easily create a text file containing the output of
statistical models. The goal of this package is to help researchers accompany
their manuscript with an organized data file of statistical results in order to
greatly improve the reliability of meta-research and to reduce statistical
reporting errors.

To make this possible, `tidystats`

relies on
tidy data principles to combine
the output of statistical analyses such as *t*-tests, correlations, ANOVAs, and
regression.

Besides enabling you to create an organized data file of statistical results,
the `tidystats`

package also contains functions to help you report statistics in
APA style. Results can be reported using
R Markdown or using a new built-in Shiny app. Additionally, development has started on a Google Docs plugin that uses a
tidystats data file to report statistics.

Please see below for instructions on how to install and use this package.
**Do note that the package is currently in development. This means the package
may contain bugs and is subject to significant changes.** If you find any bugs
or if you have any feedback, please let me know by creating an issue here on
Github (it's really easy to do!).

`tidystats`

can be installed from CRAN and the latest version can be installed
from Github using devtools.

`library(devtools)install_github("willemsleegers/tidystats")`

Load the package and start by creating an empty list to store the results of statistical models in.

`library(tidystats) results <- list()`

The main function is `add_stats()`

. The function has 2 necessary arguments:

`results`

: The list you want to add the statistical output to.`output`

: The output of a statistical test you want to add to the list (e.g., the output of`t.test()`

or`lm()`

)

Optionally you can also add an identifier, type, whether the analysis was
confirmatory or exploratory, and additional notes using the `identifier`

,
`type`

, `confirmatory`

, and `notes`

arguments, respectively.

The `identifier`

is used to identify the model
(e.g., 'weight_height_correlation'). If you do not provide one, one is
automatically created for you.

The `type`

argument is used to indicate whether the statistical test is a
hypothesis test, manipulation check, contrast analysis, or other kind of
analysis such as descriptives. This can be used to distinguish the vital
statistical tests from those less relevant.

The `confirmatory`

argument is used to indicate whether the test was
confirmatory or exploratory. It can also be ommitted.

The `notes`

argument is used to add additional information which you may find
fruitful. Some statistical tests have default `notes`

output (e.g., t-tests),
which will be overwritten when a `notes`

argument is supplied to the
`add_stats()`

function.

**Package:** stats

`t.test()`

`cor.test()`

`lm()`

`glm()`

`aov()`

`chisq.test()`

`wilcox.test()`

`fisher.test()`

**Package:** psych

`alpha()`

`corr.test()`

`ICC()`

**Package:** lme4 and lmerTest

`lmer()`

In the following example we perform several statistical tests on a data set, add the output of these results to a list, and save the results to a file.

The data set is called `cox`

and contains the data of a replication attempt of
C.R. Cox, J. Arndt, T. Pyszczynski, J. Greenberg, A. Abdollahi, and S. Solomon
(2008, JPSP, 94(4), Exp. 6) by Wissink et al. The replication study was part of
the Reproducibility Project (see https://osf.io/ezcuj/). The data set is part of
the `tidystats`

package.

`# Perform analysesM1_condition <- t.test(call_parent ~ condition, data = cox, paired = TRUE)M2_parent_siblings <- cor.test(cox$call_parent, cox$call_siblings, alternative = "greater")M3_condition_anxiety <- lm(call_parent ~ condition * anxiety , data = cox)M4_condition_sex <- aov(call_parent ~ condition * sex, data = cox) # Add resultsresults <- results %>% add_stats(M1_condition) %>% add_stats(M2_parent_siblings) %>% add_stats(M3_condition_anxiety) %>% add_stats(M4_condition_sex)`

To write the results to a file, use `write_stats()`

with the results list as the
first argument.

`write_stats(results, "data/results.csv")`

To see how the data was actually tidied, you can open the .csv file or you can convert the tidystats results list to a table, as shown below.

`library(dplyr)library(knitr)options(knitr.kable.NA = '-') results %>% stats_list_to_df() %>% select(-notes) %>% kable()`

identifier | group | term_nr | term | statistic | value | method |
---|---|---|---|---|---|---|

M1_condition | - | - | - | mean of the differences | -2.7700000 | Paired t-test |

M1_condition | - | - | - | t | -1.2614135 | Paired t-test |

M1_condition | - | - | - | df | 99.0000000 | Paired t-test |

M1_condition | - | - | - | p | 0.2101241 | Paired t-test |

M1_condition | - | - | - | 95% CI lower | -7.1272396 | Paired t-test |

M1_condition | - | - | - | 95% CI upper | 1.5872396 | Paired t-test |

M1_condition | - | - | - | null value | 0.0000000 | Paired t-test |

M2_parent_siblings | - | - | - | r | -0.0268794 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | t | -0.3783637 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | df | 198.0000000 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | p | 0.6472171 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | 95% CI lower | -0.1430882 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | 95% CI upper | 1.0000000 | Pearson's product-moment correlation |

M2_parent_siblings | - | - | - | null value | 0.0000000 | Pearson's product-moment correlation |

M3_condition_anxiety | coefficients | 1 | (Intercept) | b | 29.4466534 | Linear model |

M3_condition_anxiety | coefficients | 1 | (Intercept) | SE | 9.9311192 | Linear model |

M3_condition_anxiety | coefficients | 1 | (Intercept) | t | 2.9650891 | Linear model |

M3_condition_anxiety | coefficients | 1 | (Intercept) | p | 0.0034017 | Linear model |

M3_condition_anxiety | coefficients | 1 | (Intercept) | df | 196.0000000 | Linear model |

M3_condition_anxiety | coefficients | 2 | conditionmortality salience | b | 20.2945974 | Linear model |

M3_condition_anxiety | coefficients | 2 | conditionmortality salience | SE | 14.0193962 | Linear model |

M3_condition_anxiety | coefficients | 2 | conditionmortality salience | t | 1.4476085 | Linear model |

M3_condition_anxiety | coefficients | 2 | conditionmortality salience | p | 0.1493242 | Linear model |

M3_condition_anxiety | coefficients | 2 | conditionmortality salience | df | 196.0000000 | Linear model |

M3_condition_anxiety | coefficients | 3 | anxiety | b | -1.5511207 | Linear model |

M3_condition_anxiety | coefficients | 3 | anxiety | SE | 3.0119376 | Linear model |

M3_condition_anxiety | coefficients | 3 | anxiety | t | -0.5149910 | Linear model |

M3_condition_anxiety | coefficients | 3 | anxiety | p | 0.6071396 | Linear model |

M3_condition_anxiety | coefficients | 3 | anxiety | df | 196.0000000 | Linear model |

M3_condition_anxiety | coefficients | 4 | conditionmortality salience:anxiety | b | -5.5666889 | Linear model |

M3_condition_anxiety | coefficients | 4 | conditionmortality salience:anxiety | SE | 4.3104789 | Linear model |

M3_condition_anxiety | coefficients | 4 | conditionmortality salience:anxiety | t | -1.2914316 | Linear model |

M3_condition_anxiety | coefficients | 4 | conditionmortality salience:anxiety | p | 0.1980750 | Linear model |

M3_condition_anxiety | coefficients | 4 | conditionmortality salience:anxiety | df | 196.0000000 | Linear model |

M3_condition_anxiety | model | - | - | R squared | 0.0360246 | Linear model |

M3_condition_anxiety | model | - | - | adjusted R squared | 0.0212698 | Linear model |

M3_condition_anxiety | model | - | - | F | 2.4415618 | Linear model |

M3_condition_anxiety | model | - | - | numerator df | 3.0000000 | Linear model |

M3_condition_anxiety | model | - | - | denominator df | 196.0000000 | Linear model |

M3_condition_anxiety | model | - | - | p | 0.0655150 | Linear model |

M4_condition_sex | - | 1 | condition | df | 1.0000000 | Factorial ANOVA |

M4_condition_sex | - | 1 | condition | SS | 383.6450000 | Factorial ANOVA |

M4_condition_sex | - | 1 | condition | MS | 383.6450000 | Factorial ANOVA |

M4_condition_sex | - | 1 | condition | F | 1.7299360 | Factorial ANOVA |

M4_condition_sex | - | 1 | condition | p | 0.1899557 | Factorial ANOVA |

M4_condition_sex | - | 2 | sex | df | 1.0000000 | Factorial ANOVA |

M4_condition_sex | - | 2 | sex | SS | 1140.4861329 | Factorial ANOVA |

M4_condition_sex | - | 2 | sex | MS | 1140.4861329 | Factorial ANOVA |

M4_condition_sex | - | 2 | sex | F | 5.1426918 | Factorial ANOVA |

M4_condition_sex | - | 2 | sex | p | 0.0244352 | Factorial ANOVA |

M4_condition_sex | - | 3 | condition:sex | df | 1.0000000 | Factorial ANOVA |

M4_condition_sex | - | 3 | condition:sex | SS | 66.1529617 | Factorial ANOVA |

M4_condition_sex | - | 3 | condition:sex | MS | 66.1529617 | Factorial ANOVA |

M4_condition_sex | - | 3 | condition:sex | F | 0.2982976 | Factorial ANOVA |

M4_condition_sex | - | 3 | condition:sex | p | 0.5855728 | Factorial ANOVA |

M4_condition_sex | - | 4 | Residuals | df | 196.0000000 | Factorial ANOVA |

M4_condition_sex | - | 4 | Residuals | SS | 43466.5909054 | Factorial ANOVA |

M4_condition_sex | - | 4 | Residuals | MS | 221.7683209 | Factorial ANOVA |

There are two ways to report your results using tidystats: Using R Markdown or using a built-in Shiny app. In both cases, you need the tidystats list that contains the tidied output of your statistical tests.

If you have previously created a tidystats file, you can read in this file to
re-create the tidystats list, using the `read_stats()`

function.

`results <- read_stats("data/results.csv")`

If you do not want to use R Markdown, you can use the built-in Shiny app to
interactively produce APA-output and copy it to your manuscript. To start the
app, run the `inspect()`

function.

The `inspect()`

function takes the tidystats list as its first argument,
optionally followed by one or more identifiers. If no identifiers are provided,
all models will be displayed. The results of each model will be displayed in a
table and you can click on a row to produce APA output. This APA output will
appear in a textbox at the bottom, next to a copy button that can be pressed to
copy the results into your clipboard. See below for an example.

You can use the `report()`

function to report your results via R Markdown. This
function requires at minimum the tidystats list and an identifier identifying
the exact test you want to report. It may also be necessary to provide
additional information, such as a term in a regression, for the `report()`

function to figure out what you want to report.

To reduce repetition, you can use `options()`

to set the default tidystats list
to use. This way the `report()`

function requires one fewer argument.
You set the default tidystats list by running the following code:

`options(tidystats_list = results)`

To figure out how to report the output in APA style, the `report()`

function
uses the **method** information stored in the tidied model. For example, the
model with identifier 'M1' is a paired t-test. `report()`

will parse this,
see that it is part of the t-test family, and produce results accordingly.

Below is a list of common report examples:

code | output |
---|---|

`report("M1_condition")` |
t(99) = -1.26, p = .21, 95% CI [-7.13, 1.59] |

`report("M1_condition", statistic = "t")` |
-1.26 |

`report("M2_parent_siblings")` |
r(198) = -.027, p = .65 |

`report("M3_condition_anxiety", term = "conditionmortality salience")` |
b = 20.29, SE = 14.02, t(196) = 1.45, p = .15 |

`report("M3_condition_anxiety", term_nr = 2)` |
b = 20.29, SE = 14.02, t(196) = 1.45, p = .15 |

`report("M3_condition_anxiety", term = "(Model)")` |
adjusted R^{2} = .0035, F(1, 198) = 1.70, p = .19 |

`report("M4_condition_sex", term = "condition:sex")` |
F(1, 196) = 0.30, p = .59 |

As you can see in the examples above, you can use `report()`

to produce a full
line of output. You can also retrieve a single statistic by using the
`statistic`

argument. Additionally, you can refer to terms using either the
term label or the term number (and in some cases, using a group). Although it
may be less descriptive to use a term number, it reduces the amount of code
clutter in your Markdown document. Our philosophy is, in line with Markdown's
general writing philosophy, that the code should not distract from writing. To
illustrate, writing part of a results section with `tidystats`

will look
like this:
and the dental pain condition on the number of minutes allocated to calling
one's parents, `r report("M1_condition")`

.

To execute the code, the code segment should be surrounded by backward ticks (see http://rmarkdown.rstudio.com/lesson-4.html), which results in:

We found no significant difference between the mortality salience condition and the dental pain condition on the number of minutes allocated to calling one's parents,

t(99) = -1.26,p= .21, 95% CI [-7.13, 1.59].

Since it's common to also report descriptives in addition to the statistical
results, we have added a hopefully useful `describe_data()`

and `count_data()`

function to calculate common descriptive statistics that can be tidied and added
to a results data frame. Several examples follow using the `cox`

data.

`# Descriptives of the 'anxiety' variabledescribe_data(cox, anxiety)`

```
## # A tibble: 1 x 13
## var missing n M SD SE min max range median mode
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 anxi… 0 200 3.22 0.492 0.0348 1.38 4.38 3 3.25 3.5
## # ... with 2 more variables: skew <dbl>, kurtosis <dbl>
```

`# By conditioncox %>% group_by(condition) %>% describe_data(anxiety)`

```
## # A tibble: 2 x 14
## # Groups: condition [2]
## var condition missing n M SD SE min max range median
## <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 anxi… dental p… 0 100 3.26 0.497 0.0497 1.62 4.38 2.75 3.38
## 2 anxi… mortalit… 0 100 3.17 0.485 0.0485 1.38 4.38 3 3.25
## # ... with 3 more variables: mode <dbl>, skew <dbl>, kurtosis <dbl>
```

`# Descriptives of a non-numeric variablecount_data(cox, condition)`

```
## # A tibble: 2 x 4
## var group n pct
## <chr> <chr> <int> <dbl>
## 1 condition dental pain 100 50
## 2 condition mortality salience 100 50
```

If you use the `describe_data()`

and `count_data()`

function from the
`tidystats`

package to get the descriptives, you can use the
`tidy_describe_data()`

and `tidy_count_data()`

function to tidy the output, and
consequently add it to a results list.

(Note: This will soon be improved)

`anxiety_tidy <- cox %>% describe_data(anxiety) %>% tidy_describe_data() results <- results %>% add_stats(anxiety_tidy, type = "d", notes = "Anxious attachment style")`

```
## Warning in add_stats.data.frame(., anxiety_tidy, type = "d", notes =
## "Anxious attachment style"): You added a data.frame to your results list.
## Please make sure it is properly tidied.
```

- Changed the argument order in the family of
`add_stats()`

functions. Previously, the model output or tidy data frame was the first argument. This allowed you to directly pipe the model output into`add_stats()`

(using**magrittr**'s %>%). However, an alternative approach is to have the tidystats list to be the first argument. This allows you create a long sequence of pipes. You start with the results list, add a model via`add_stats()`

, pipe the result into the next`add_stats()`

, and so on. Since you often store your model output in variable names anyway, this is probably more convenient. Additionally, this probably also keeps your script more tidy (you can do this at the end of your data analysis script). - Certain statistical models are now tidied differently due to the addition of a 'group' column. Several models like multilevel models, meta-analytic models, and arguably also regression models have more than just terms (e.g., model fit), so to distinguish between coefficients and other parts of the output, a 'group' column has been added. This also means usage of the
`report()`

is affected, as now the group should be specified when necessary. Affected models are regression, within-subjects ANOVA, multilevel models, and meta-analysis models. - Added the
*class*argument to`add_stats()`

and`add_stats_to_model()`

. Rather than having to manually tidy the data first, you can make use of some custom`tidy_stats()`

functions by specifying the class argument. Run`?add_stats`

to see a list of supported classes and see the help document of`tidy_stats.confint()`

for an example. - Under the hood: Added a generic report function for single values called
`report_statistic()`

. Consequently, all report functions have been updated to use this new generic function. - Removed the
`identifier`

column from each list element when using`read_stats()`

. - Reordered the columns of
`tidy_stats.lm()`

and`tidy_stats.glm()`

to be consistent with the other`tidy_stats()`

functions.

- Added a new function called
`inspect()`

. This function accepts a tidystats results list or the output of a statistical model and will display all results in RStudio's Viewer pane. This allows the user to visually inspect the results and, importantly, copy results in APA style to their clipboard. This function is aimed at users who prefer not to use R Markdown or when you want to quickly run a model and get the results in APA-style. This new function works well with Microsoft Word, but does not work with Apple Pages (some of the styling is lost when copying the results). - Added support for
`glm()`

. - Added support for lme4's
`lmer()`

and lmerTest's`lmer()`

. - Added support for psych's
`alpha()`

. - Added support for psych's
`ICC()`

. - Added support for stats'
`confint()`

via the new`class`

argument in`add_stats()`

and`add_stats_to_model()`

.

- Added check for an existing identifier in
`add_stats_to_model()`

. - Added a
`class`

argument to`add_stats()`

and`add_stats_to_model()`

. Some statistical tests return a normal data.frame or matrix, which does not specify which test produced the results. This makes it difficult for tidystats to figure out how to tidy the result. Previously, we solved this by`add_stats()`

accepting pre-tidied data frames. Now we added a the`class`

argument to specify the name of the function that produced the results, so that we can then tidy it for you. - Added warnings in case unsupported output is added (e.g., a pre-tided data frame).
`read_stats()`

now removes empty columns from each list element.- Improved documentation.

- Fixed a bug that would incorrectly classify ANOVAs as One-way ANOVAs when character variables were used rather than factors.
- Prepared for
`dplyr`

0.8

- Added tests to the R package to minimize bugs.
- Made the code and documentation more consistent
- Added an under-the-hood helper function to rename statistics columns

- Renamed
`describe()`

to`describe_data()`

so that it no longer conflicts with**psych**'s`describe()`

. - Changed
`describe_data()`

to no longer accept non-numeric variables, but added the feature that descriptives can be calculated for more than 1 variable at a time. It is recommended to use the`count_data()`

function for non-numeric variables. - Renamed
`tidy_descriptives()`

to`tidy_describe_data()`

and improved the function. A notable change is that var information is now returned to identify which descriptives belong to which variable. Also changed the group delimiter to ' - '. `write_stats()`

now prettifies the numbers using`prettyNum()`

when saving them to disk.

- Improved
`report()`

function. The method now supports the option to retrieve a single statistic from any tidy stats data frame. This will allow you to report all statistics, even when reporting functions for a specific method are not yet supported. - Added quick report functions for means and standard deviations. Instead of using
`report()`

you can use`M()`

and`SD()`

to quickly report the mean or standard deviation, without having to specify that particular statistic. Less typing! - Added an option called 'tidystats_list' in
`options()`

to set a default list. By setting the tidystats list in`options()`

, you do not need to specify the list in the**results**argument of`report()`

. Less typing! - Reporting regression results will now include a check for whether confidence intervals are included, and report them.
- Added skewness and kurtosis to
`describe_data()`

- Added new
`count_data()`

function to calculate count descriptives of categorical data. Also added a`tidy_count_data()`

function to tidy the output of this new function. - Added support for
`chisq.test`

and`wilcox.test`

. - Added a better default
`identifier`

to`add_stats()`

. If you supply a variable to be added to the tidystats list, and no identifier is provided, it will take the variable name as the identifier. If you pipe the results into`add_stats()`

then the old default identifier will be used (e.g., "M1").

- Added identifier check to
`report()`

. The function will now throw an error when the identifier does not exist. - Added statistic check to all report functions. The function will now throw an error when the statistic does not exist.
- Improved
`report_p_value()`

to support multiple p-values. - Updated documentation to be more consistent and to take into account the changes made in the current update.

- Fixed bug that it was assumed that confidence intervals in
`htests`

were always 95% confidence intervals. - Fixed bug in report functions that would occur when no statistic argument was provided.
- Removed spaces from terms in
`aov()`

output. - Removed a leading space from the method information of a Two Sample t-test.
- Improved
`add_stats_to_model()`

. The method previously required a term and did not automatically complete information (e.g., method information).

- Initial release