Tools to Quickly and Neatly Summarize Data

Data frame summaries, cross-tabulations, weight-enabled frequency tables and common univariate statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users.
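As a rough orientation, the four main functions named in the news below can be tried on datasets that ship with R. This is only a sketch: it assumes the summarytools package is installed, relies on default arguments throughout, and the choice of iris and mtcars is purely illustrative. Argument names have changed between versions, so none are set here.

```r
# Assumption: summarytools is installed; the block is skipped otherwise.
if (requireNamespace("summarytools", quietly = TRUE)) {
  library(summarytools)

  print(freq(iris$Species))               # frequency table for one variable
  print(descr(iris$Sepal.Length))         # univariate statistics
  print(ctable(mtcars$cyl, mtcars$gear))  # cross-tabulation of two variables
  print(dfSummary(iris))                  # whole data frame summary
}
```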


News

Version 0.8.8

Fixed an issue with dfSummary() where reported percentages could exceed 100% under particular circumstances. Fixed an issue with groups not being properly updated when used in Shiny apps. Fixed character encoding issues.

Version 0.8.7

Fixed an issue with dfSummary() when missing values were present with whole numbers. Fixed an issue with descr() where group was not shown for the first group when omitting headings.

Version 0.8.6

In dfSummary(), the Label column now has proper line breaks in its html version; also, in lists of values with frequencies, digits are omitted for integer variables and for numerics containing whole numbers only. Fixed an error when using the %>% pipe operator with dfSummary(). Fixed a calculation error for coefficient of variation (cv) in descr(). For ctable() html outputs, '<' and '>' are properly escaped when appearing in row or column names.
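For reference, the coefficient of variation is conventionally defined as the standard deviation divided by the mean. The helper below is a hypothetical base R sketch of that definition, not the code used inside descr(); the function name cv_sketch and the sample vector are made up for illustration.

```r
# Hypothetical helper (not part of summarytools): conventional CV = sd / mean.
cv_sketch <- function(x, na.rm = TRUE) {
  stats::sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv_sketch(x)
```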

Version 0.8.5

Time intervals in dfSummary() now use lubridate::as.period(). In dfSummary(), line feeds in ascii barplots now displayed correctly, and string trimming is applied consistently. In ctable(), argument 'useNA' now correctly accepts value "no".

Version 0.8.4

The number of bins in dfSummary() histograms is now calculated with nclass.Sturges() instead of nclass.FD(). Removed an extra space in dfSummary() output for time objects. When the list of frequency counts in dfSummary() would have ended with "[1 other value]", that value is now displayed instead.
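To see how the two bin-count rules differ, both can be called directly from grDevices. Sturges' rule depends only on the sample size, ceiling(log2(n) + 1), whereas the Freedman-Diaconis rule also depends on the spread of the data. The simulated vector below is an assumption made for illustration only.

```r
set.seed(1)
x <- rnorm(1000)  # illustrative data, not from the package

bins_sturges <- grDevices::nclass.Sturges(x)  # ceiling(log2(length(x)) + 1)
bins_fd      <- grDevices::nclass.FD(x)       # based on IQR and n^(1/3)

c(sturges = bins_sturges, fd = bins_fd)
```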

Version 0.8.3

Introduced global options with st_options(). In all functions, argument 'split.table' becomes 'split.tables' as per changes in the 'pander' package. Argument 'omit.headings' added to all main functions. New logical options for freq(): 'totals' and 'display.nas'. In descr(), Q1 and Q3 were added; also, the order in the 'stats' argument is now retained in the output table. In dfSummary(): number alignment improvement, fixed frequencies not appearing when value 0 was present, fixed arguments 'valid.col' and 'na.col' being unresponsive, and fixed warnings when generating graphics columns. Also in dfSummary(), the 'graph.magnif' argument now allows shrinking or enlarging graphs. For the print() and view() functions, argument 'html.table.class' has changed to 'table.classes' and its usage is simplified (please refer to docs for details). CSS class 'st-small' can be used to make html tables smaller by slightly reducing font size and cell padding.

Version 0.8.2

Fixed performance issue with numerical data having a large range. Fixed missing linefeeds in dfSummary text barcharts. Added support for Date / POSIXt objects. Improved support for lapply() when used with freq().

Version 0.8.1

Added vignettes. Improved handling of negative column indexing by the parsing function. Fixed issue with all-NA factors in dfSummary(). Changed column for displaying "All NA's" for numerical vectors in dfSummary(). Fixed issue with graphs in dfSummary(). Removed accented characters from docs.

Version 0.8.0

Introducing graphs in dfSummary results. Improved alignment of numbers in both html and ascii tables. Cleanup in css and upgrade to Bootstrap 4 beta. Added rudimentary support for lapply() to be used with freq(), and improved support for by() and with() with descr(), freq(), and ctable(). Some backward compatibility notes: in dfSummary(), parameter name 'display.labels' now corresponds to 'labels.col', for consistency reasons. Also, see Notes for Version 0.6.9 about the 'file' parameter.

Version 0.7.0

(Released on GitHub only). Changes introduced in 0.6.9 have been further tested and enhanced. Improved alignment in cells having counts + proportions. Updated the vignette to reflect latest changes and added examples using the included datasets ('exams' and 'tobacco'). dfSummary()'s last column now includes counts and percentages of both valid and missing data. Internal changes: coding style has been reviewed to be more standard; Roxygen2 is now used to generate help files.

Version 0.6.9

(Released on GitHub only). Introduced ctable() for cross-tabulations. Added extended support for printing objects created using by() and/or with(): variable names, labels and by-groups are now displayed correctly. view() is now more than just a wrapper function built around print.summarytools(); it is the function to use when printing an object created using by(). Appending to summarytools-generated html files is now supported. Most pander options stored in summarytools objects can now be overridden when using print() or view(). freq() has a new parameter, 'order', which allows ordering rows by count rather than by values sorted alphabetically. Alignment of numbers in the descr() observations table has been improved. An important change causing a minor break in backward compatibility: the file= parameter must now be used with print() or view(); its use with other functions is now deprecated. Environment variables keep track of all temporary html files created during the session, and also make it possible to print lists of summarytools objects created using by().
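The difference between the two row orderings mentioned for freq() can be illustrated with base R alone. This is an analogy using table() and sort(), not the package's own code, and the sample vector is made up.

```r
x <- c("a", "c", "b", "c", "c", "b")  # illustrative values

by_value <- table(x)                          # rows sorted alphabetically by value
by_count <- sort(table(x), decreasing = TRUE) # rows sorted by descending count

names(by_value)
names(by_count)
```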

Version 0.6.5

Fixed a problem with dfSummary() which arose when number of factors exceeded max.distinct.values. Improved the way dfSummary() reports frequencies on character variables. Fixed problems with outputs when using weights. Added hashtags to table headings to improve markdown integration. Added an option to generic print() function to suppress the footer in HTML outputs.

Version 0.6

Added Introduction vignette. Fixed markdown output that would not render strings such as . Improved multiline tables linefeeds. Introduced cleartmp() function to delete temporary HTML files used with print method "viewer" or "browser". Modifications to sample datasets. Bootstrap stylesheets now only include table-specific elements. Changed the way method="browser" sends file path to browser for better cross-platform compatibility. Improved results when using by(). Since version 0.5 was only made available through devtools::install_github(), please see changes for that version if updating from CRAN.

Version 0.5

Function descr() now also supports weights. Output from what.is() has been simplified. Other changes are transparent to the user (but make the internals more consistent across functions).

Version 0.4

Added cat() functions to fully support knitr's document generation. Also added sample datasets so that users can experiment using summarytools functions with them. freq() now supports weights.

Version 0.3

Another round of major changes.

  • Bringing in HTML table built with htmltools and viewable in RStudio's Viewer
  • function desc is renamed descr (mainly to avoid conflict with plyr's desc)
  • argument "echo" is deprecated; either display with pander or use as.table()
  • Returned objects are now of class "summarytools" and have several attributes that are used by print.summarytools
    • st.type : one of "freq", "descr" and "dfSummary"
    • date : date at which the function was called
    • var.name & var.label : for 'freq', and also 'desc' when a single vector is used.
    • pander.args : 'style', 'justify', 'plain.ascii', 'split.table'
  • print.summarytools has argument "method" that can be one of "pander", "viewer", or "browser", the last two being used to display an HTML version of the output, using bootstrap's css (getbootstrap.com)
  • rows indexing is "detected" and reported (function .parse.arg.x takes care of this)
  • rounding now only occurs at the printing stage
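The pattern described in this list (a classed object carrying attributes, with rounding deferred to the print method) can be sketched in base R with S3. Everything below is hypothetical: the class name demo_summary, the constructor and its fields only mimic the idea and are not the package's internals.

```r
# Hypothetical constructor: stores full-precision statistics plus metadata.
make_summary <- function(x, var.name) {
  stats <- c(mean = mean(x), sd = stats::sd(x))
  structure(
    list(stats = stats),
    class    = "demo_summary",  # made-up class name for this sketch
    st.type  = "descr",
    date     = Sys.Date(),
    var.name = var.name
  )
}

# Rounding happens only here, at the printing stage.
print.demo_summary <- function(x, digits = 2, ...) {
  cat("Variable:", attr(x, "var.name"), "\n")
  print(round(x$stats, digits))
  invisible(x)
}

s <- make_summary(c(1, 2, 3.14159), var.name = "v")
print(s)
```

Because the stored values are never rounded, the same object can be printed again with a different digits value without recomputing anything.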

Version 0.2

Several major changes since version 0.1

  • 'unistats' is now called 'desc'.
  • 'frequencies' is now called 'freq'.
  • 'properties' is now called 'prop'.
  • shortcuts have been added to keep backward-compatibility.
  • 'desc' now accepts dataframes as first argument; factors and character columns will be ignored.
  • 'desc' can be transposed to suit one's preferences.
  • 'freq' just returns a matrix-table, not a list anymore.
  • in 'desc' and 'freq', no more argument 'display.label'. Those are displayed automatically when present.
  • rapportools is used instead of Hmisc for variable labels
  • function 'properties' was removed for now; it may be reintegrated in a future update.


install.packages("summarytools")

0.8.8 by Dominic Comtois, 2 months ago


https://github.com/dcomtois/summarytools


Report a bug at https://github.com/dcomtois/summarytools/issues


Browse source code at https://github.com/cran/summarytools


Authors: Dominic Comtois



GPL-2 license


Imports grDevices, htmltools, lubridate, matrixStats, methods, pander, pryr, rapportools, RCurl, utils

Suggests rstudioapi, knitr, rmarkdown


Imported by radiant.data.

