The introduction of the 'broom' package has made converting model objects into data frames as simple as a single function call. While the 'broom' package focuses on providing tidy data frames that can be used in advanced analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. 'pixiedust' provides this functionality with a programming interface intended to be similar to 'ggplot2's system of layers, with fine-tuned control over each cell of the table. Options for output include printing to the console and to the common markdown formats (markdown, HTML, and LaTeX). With a little 'pixiedust' (and happy thoughts) tables can really fly.
After tidying up your analyses with the `broom` package, go ahead and grab the `pixiedust`. Customize your table output and write it to markdown, HTML, LaTeX, or even just the console.
`pixiedust` makes it easy to customize the appearance of your tables in all of these formats by adding any number of "sprinkles", much in the same way you can add layers to a `ggplot2` plot.
    fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

    library(pixiedust)
    dust(fit) %>%
      sprinkle(col = 2:4, round = 3) %>%
      sprinkle(col = 5, fn = quote(pvalString(value))) %>%
      sprinkle_colnames(term = "Term",
                        estimate = "Estimate",
                        std.error = "SE",
                        statistic = "T-statistic",
                        p.value = "P-value") %>%
      sprinkle_print_method("console")
    #>            Term Estimate    SE T-statistic P-value
    #> 1   (Intercept)    9.365 8.373       1.118    0.27
    #> 2          qsec    1.245 0.383       3.252   0.003
    #> 3   factor(am)1    3.151 1.941       1.624    0.12
    #> 4            wt   -3.926 0.743      -5.286 < 0.001
    #> 5 factor(gear)4   -0.268 1.655      -0.162    0.87
    #> 6 factor(gear)5    -0.27 2.063      -0.131     0.9
Tables can be customized by row, column, or even by a single cell by adding sprinkles to the `dust` object. The table below shows the currently planned and implemented sprinkles. In the "implemented" column, an 'x' indicates a customization that has been implemented, while a blank cell suggests that the customization is planned but has not yet been implemented. In the remaining columns, an 'x' indicates that the sprinkle is already implemented for the output format; an 'o' indicates that implementation is planned but not yet completed; and a blank cell indicates that the sprinkle will not be implemented (usually because the output format doesn't support the option).
To demonstrate, let's look at a simple linear model. We build the model and generate the standard summary.
    fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)
    summary(fit)
    #>
    #> Call:
    #> lm(formula = mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)
    #>
    #> Residuals:
    #>     Min      1Q  Median      3Q     Max
    #> -3.5064 -1.5220 -0.7517  1.3841  4.6345
    #>
    #> Coefficients:
    #>               Estimate Std. Error t value Pr(>|t|)
    #> (Intercept)     9.3650     8.3730   1.118  0.27359
    #> qsec            1.2449     0.3828   3.252  0.00317 **
    #> factor(am)1     3.1505     1.9405   1.624  0.11654
    #> wt             -3.9263     0.7428  -5.286 1.58e-05 ***
    #> factor(gear)4  -0.2682     1.6555  -0.162  0.87257
    #> factor(gear)5  -0.2697     2.0632  -0.131  0.89698
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #>
    #> Residual standard error: 2.55 on 26 degrees of freedom
    #> Multiple R-squared:  0.8498, Adjusted R-squared:  0.8209
    #> F-statistic: 29.43 on 5 and 26 DF,  p-value: 6.379e-10
While the summary is informative and useful, it is full of "stats-speak" and isn't necessarily in a format that is suitable for publication or submission to a client. The `broom` package provides the summary in a tidy format that, serendipitously, is a lot closer to what we would want for formal reports.
    library(broom)
    tidy(fit)
    #>            term   estimate std.error  statistic      p.value
    #> 1   (Intercept)  9.3650443 8.3730161  1.1184792 2.735903e-01
    #> 2          qsec  1.2449212 0.3828479  3.2517387 3.168128e-03
    #> 3   factor(am)1  3.1505178 1.9405171  1.6235455 1.165367e-01
    #> 4            wt -3.9263022 0.7427562 -5.2861251 1.581735e-05
    #> 5 factor(gear)4 -0.2681630 1.6554617 -0.1619868 8.725685e-01
    #> 6 factor(gear)5 -0.2697468 2.0631829 -0.1307430 8.969850e-01
It has been observed by some, however, that even this summary isn't quite ready for publication. There are too many decimal places, the p-values employ scientific notation, and column titles like "statistic" don't specify what type of statistic. These kinds of details aren't the purview of `broom`, however, as `broom` is focused on tidying the results of a model for further analysis (particularly with respect to comparing slightly varying models).
The `pixiedust` package diverges from `broom`'s mission here and provides the ability to customize the `broom` output for presentation. The initial `dust` object returns a table that is similar to the `broom` output.
    library(pixiedust)
    dust(fit) %>%
      sprinkle_print_method("console")
    #>            term   estimate std.error  statistic   p.value
    #> 1   (Intercept)  9.3650443 8.3730161  1.1184792 0.2735903
    #> 2          qsec  1.2449212 0.3828479  3.2517387 0.0031681
    #> 3   factor(am)1  3.1505178 1.9405171  1.6235455 0.1165367
    #> 4            wt -3.9263022 0.7427562 -5.2861251  1.58e-05
    #> 5 factor(gear)4  -0.268163 1.6554617 -0.1619868 0.8725685
    #> 6 factor(gear)5 -0.2697468 2.0631829  -0.130743  0.896985
Where `pixiedust` shows its strength is in the ease with which these tables can be customized. The code below rounds the columns `estimate`, `std.error`, and `statistic` (columns 2 through 4) to three decimal places each, and then formats the `p.value` column into a format that happens to be one that I like.
    x <- dust(fit) %>%
      sprinkle(col = 2:4, round = 3) %>%
      sprinkle(col = 5, fn = quote(pvalString(value))) %>%
      sprinkle_print_method("console")
    x
    #>            term estimate std.error statistic p.value
    #> 1   (Intercept)    9.365     8.373     1.118    0.27
    #> 2          qsec    1.245     0.383     3.252   0.003
    #> 3   factor(am)1    3.151     1.941     1.624    0.12
    #> 4            wt   -3.926     0.743    -5.286 < 0.001
    #> 5 factor(gear)4   -0.268     1.655    -0.162    0.87
    #> 6 factor(gear)5    -0.27     2.063    -0.131     0.9
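To see what the `fn` sprinkle does to the p-value column, it can help to call `pvalString` directly on a few values. The sketch below reuses three p-values from the `tidy(fit)` output (intercept, `qsec`, and `wt`); the names `p_raw` and `p_fmt` are illustrative only, and the exact rounding rules are best checked in the function's help page.

```r
library(pixiedust)

# Three p-values copied from the tidy(fit) output: intercept, qsec, wt.
p_raw <- c(0.2735903, 0.003168128, 1.581735e-05)

# pvalString() turns numeric p-values into character strings for reporting.
p_fmt <- pvalString(p_raw)
p_fmt
```

The results should line up with the `0.27`, `0.003`, and `< 0.001` entries in the formatted table. Inside `sprinkle(fn = ...)`, the name `value` in the quoted expression stands for the contents of the targeted cells.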
Now we're almost there! Let's change up the column names, and while we're at it, let's add some "bold" markers to the statistically significant terms in order to make them stand out. (I say "bold" because the console output doesn't show up in bold, but with the markdown tags for bold text. In a rendered table, the text would actually be rendered in bold.)
    x <- x %>%
      sprinkle(col = c("estimate", "p.value"),
               row = c(2, 4),
               bold = TRUE) %>%
      sprinkle_colnames(term = "Term",
                        estimate = "Estimate",
                        std.error = "SE",
                        statistic = "T-statistic",
                        p.value = "P-value") %>%
      sprinkle_print_method("console")
    x
    #>            Term   Estimate    SE T-statistic     P-value
    #> 1   (Intercept)      9.365 8.373       1.118        0.27
    #> 2          qsec  **1.245** 0.383       3.252   **0.003**
    #> 3   factor(am)1      3.151 1.941       1.624        0.12
    #> 4            wt **-3.926** 0.743      -5.286 **< 0.001**
    #> 5 factor(gear)4     -0.268 1.655      -0.162        0.87
    #> 6 factor(gear)5      -0.27 2.063      -0.131         0.9
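Because the print method is itself a sprinkle, the same table can be pointed at a different output format without repeating the other customizations. The sketch below rebuilds the customized table and switches it to markdown and HTML; `tbl`, `tbl_md`, and `tbl_html` are illustrative names, not part of the package.

```r
library(pixiedust)

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

# Build the customized table once, without committing to an output format.
tbl <- dust(fit) %>%
  sprinkle(cols = 2:4, round = 3) %>%
  sprinkle(cols = 5, fn = quote(pvalString(value))) %>%
  sprinkle(cols = c("estimate", "p.value"), rows = c(2, 4), bold = TRUE)

# Switch formats by changing only the print method.
tbl_md   <- tbl %>% sprinkle_print_method("markdown")
tbl_html <- tbl %>% sprinkle_print_method("html")
```

Printing `tbl_md` or `tbl_html` from a chunk in an R Markdown document should then emit the table in the chosen format.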
The markdown output from `pixiedust` is somewhat limited due to the limitations of `Rmarkdown` itself. If/when more features become available for `Rmarkdown` output, I'll be sure to include them. But what can you do if you really want all of the flexibility of the HTML tables but need the MS Word document?
With a little help from the `Gmisc` package, you can have the best of both worlds. `Gmisc` isn't available on CRAN yet, but if you're willing to install it from GitHub, you can render a `docx` file. Install `Gmisc` from GitHub, then use the following in your YAML header:

    ---
    output: Gmisc::docx_document
    ---
When you knit your document, it knits as an HTML file, but I've had no problems with the rendering when I right-click the file and open with MS Word.
Read more at http://gforge.se/2014/07/fast-track-publishing-using-rmarkdown/ (but note that this blog post was written about the `Grmd` package before it was moved into the `Gmisc` package).
`gaze` function to produce model summaries side-by-side (#80)
`recycle = "none"`. The user must explicitly designate if recycling should be done over rows or columns.
`caption_number` sprinkle, allowing numbering of tables to be turned off (#108)
`fixed_header` sprinkle. Allows HTML tables to have a fixed header over a scrollable body.
`knit_print` method to allow printing in Rmarkdown documents to operate more smoothly (#96).
`border_collapse` argument was changed to a character argument. This allows the full set of options available in HTML. The new default is `border_collapse = "collapse"`, which is the equivalent of `border_collapse = TRUE`. Backward compatibility will be broken only if the `border_collapse` argument was changed.
`sprinkle(bg = "blue")` may also be done via `sprinkle_bg(bg = 'blue')`. Although this isn't much of a change to the user, it makes infrastructure changes possible that will make the codebase easier to support.
`discrete_colors` sprinkles. (Issue #56)
`gradient_colors` sprinkles. (Issue #56)
`get_dust_part` to assist with generation of custom headers and footers (Issue #72)
`pixiemap` for applying differing sprinkles across a
`logical_rows` for dynamically locating rows to sprinkle
`ArgumentCheck` and replace with
`print_dust_html` (Issue #57) to give the user control over the amount of white space following HTML tables.
`replace` sprinkle is now applied during printing. It had been applied in `sprinkle`, which violated the philosophy of not changing the content of the data frame until the last possible moment.
`font_color` now interpret "transparent" as a valid color. In HTML, it is interpreted as `"rgba(255,255,255,0)"`; in LaTeX it is interpreted as
`sanitize`. Defaults to `FALSE` and replaces automatic sanitization of text in LaTeX output via `Hmisc::latexTranslate`. This is not backward compatible with 0.7.0, but is consistent with earlier versions of `pixiedust`. You must opt in to sanitization now.
`sanitize_args`. Takes a list of arguments to pass to `Hmisc::latexTranslate`, allowing sanitization to be extended to character sets defined by the user.
`pixiedust` deals with colors has changed. If you are using custom-defined colors in your LaTeX preamble, these will no longer work. `pixiedust` will only accept color names in `colors()`, or in the `#RRGGBBAA` formats. This only affects LaTeX output, and provides a better interface for ensuring all HTML and LaTeX output are as similar as possible.
`justify` sprinkle to move the table to the left or right side of the page. Defaults to centered.
`knitr::opts_knit$get("rmarkdown.pandoc.to")`. If this resolves to `NULL`, the value of
`docx` as a valid print method, which is synonymous with
`label = NULL`. First, an attempt is made to generate a label from the chunk label, and if that fails, a label is generated from
`getOption("digits")` is used. This effectively prints as many decimal places as would be printed in the console.
`dust.grouped_df` to give the option of ungrouping a `grouped_df` object, or splitting it.
`bookdown` attribute (and sprinkle) to allow use with the
`bg_pattern` to "#FFFFFF" and "#DDDDDD". The gray in this pattern is a little lighter and should do better when printed in black and white.
`hhline` LaTeX package. This allows borders to be drawn over background colors. In the existing method, the cell borders are hidden by background colors. The `hhline` method can be used by setting `options(pixiedust_latex_hhline = TRUE)`.
`tabrowsep` element was removed from the `dust` object since it apparently isn't a real thing.
`tabrowsep` elements to the `dust` object.
`tablewidth` allows the user to define cell width in terms of a percentage of the total expected table width. Not really recommended, but at least preserves some continuity between HTML and LaTeX output.
`tabrowsep` control the distance between columns and rows in tables, but this feature isn't yet implemented.
`font_family` sprinkle for HTML output
`knitr::asis_output` return. The motivation behind this was to be able to use the HTML code in shiny applications.
`pixiedust` no longer uses the
`replace` sprinkle to replace values in table columns, rows, or cells.
`longtable` sprinkle: allows tables to be printed in multiple sections.
`glance_foot`, which places model summary statistics in the foot of a table.
`dust` object in a single line.
`na_string` defaults to "", and controls how `NA` is printed in tables.
`advancedMagic` vignette, but a better example is really needed.
`print.dust` and made it a sprinkle. This allows it to be used without having to explicitly call
`roundSafe` helper function to allow rounding to succeed while skipping true character values.
`print.dust`. Not yet active, but lays the groundwork for multipage tables.
`redust` function for adding and/or switching table components. For example, adding a multirow header, or a foot.
`replace` sprinkle to replace values in table columns, rows, or cells. This closes Issue #12
`object` element from the `dust` object. In Issue #13, matthieugomez pointed out that very large models could create storage space problems. There's no sense in keeping an extra copy of the model object.
`+.dust` method and rewrote the sprinkles as pipable functions. This resolves Issue #8
`pixiedust` and sprinkling the dust around. It sounded like fun so let's hope CRAN lets me get away with it.
`dust$obj`. `valign` is not yet implemented.
`col_names` attribute of the `dust` object and replaced it with the
`head` object is a data frame holding the attributes of the table header.
`obj` attributes of the
attribute is now named. The names are the original column names from the `broom` output.
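
Among the changes listed above, the note that `sprinkle(bg = "blue")` may also be written as `sprinkle_bg(bg = 'blue')` is the one most likely to show up in everyday code. A minimal sketch of the two spellings follows, assuming a version of `pixiedust` recent enough to include `sprinkle_bg`; the model, the row and column choices, and the object names are arbitrary and only for illustration.

```r
library(pixiedust)

fit <- lm(mpg ~ qsec + wt, data = mtcars)

# General-purpose sprinkle, with the background color as a named argument.
via_sprinkle <- dust(fit) %>%
  sprinkle(rows = 2, cols = 2, bg = "blue")

# Dedicated function for the same background sprinkle.
via_sprinkle_bg <- dust(fit) %>%
  sprinkle_bg(rows = 2, cols = 2, bg = "blue")
```

Both calls return a `dust` object, so either form can be mixed freely with other sprinkles in the same pipeline.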