Generate Descriptive Statistics

Generate descriptive statistics such as measures of location, dispersion, frequency tables, cross tables, group summaries and multiple one/two way tables.


statistics

CRAN_Status_Badge cranchecks Travis-CI BuildStatus AppVeyor BuildStatus Coveragestatus

Installation

# Install release version from CRAN
install.packages("descriptr")
 
# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/descriptr")

Articles

Usage

We will use a modified version of the mtcars data set in the below examples. The only difference between the data sets is related to the variable types.

str(mtcarz)
#> 'data.frame':    32 obs. of  11 variables:
#>  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
#>  $ disp: num  160 160 108 258 360 ...
#>  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: num  16.5 17 18.6 19.4 17 ...
#>  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
#>  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
#>  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
#>  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

Continuous Data

Summary Statistics

ds_summary_stats(mtcarz, mpg)
#> ------------------------------ Variable: mpg ------------------------------
#> 
#>                         Univariate Analysis                          
#> 
#>  N                       32.00      Variance                36.32 
#>  Missing                  0.00      Std Deviation            6.03 
#>  Mean                    20.09      Range                   23.50 
#>  Median                  19.20      Interquartile Range      7.38 
#>  Mode                    10.40      Uncorrected SS       14042.31 
#>  Trimmed Mean            19.95      Corrected SS          1126.05 
#>  Skewness                 0.67      Coeff Variation         30.00 
#>  Kurtosis                -0.02      Std Error Mean           1.07 
#> 
#>                               Quantiles                               
#> 
#>               Quantile                            Value                
#> 
#>              Max                                  33.90                
#>              99%                                  33.44                
#>              95%                                  31.30                
#>              90%                                  30.09                
#>              Q3                                   22.80                
#>              Median                               19.20                
#>              Q1                                   15.43                
#>              10%                                  14.34                
#>              5%                                   12.00                
#>              1%                                   10.40                
#>              Min                                  10.40                
#> 
#>                             Extreme Values                            
#> 
#>                 Low                                High                
#> 
#>   Obs                        Value       Obs                        Value 
#>   15                         10.4        20                         33.9  
#>   16                         10.4        18                         32.4  
#>   24                         13.3        19                         30.4  
#>    7                         14.3        28                         30.4  
#>   17                         14.7        26                         27.3

Frequency Distribution

ds_freq_table(mtcarz, mpg)
#>                               Variable: mpg                               
#> |-----------------------------------------------------------------------|
#> |    Bins     | Frequency | Cum Frequency |   Percent    | Cum Percent  |
#> |-----------------------------------------------------------------------|
#> | 10.4 - 15.1 |     6     |       6       |    18.75     |    18.75     |
#> |-----------------------------------------------------------------------|
#> | 15.1 - 19.8 |    12     |      18       |     37.5     |    56.25     |
#> |-----------------------------------------------------------------------|
#> | 19.8 - 24.5 |     8     |      26       |      25      |    81.25     |
#> |-----------------------------------------------------------------------|
#> | 24.5 - 29.2 |     2     |      28       |     6.25     |     87.5     |
#> |-----------------------------------------------------------------------|
#> | 29.2 - 33.9 |     4     |      32       |     12.5     |     100      |
#> |-----------------------------------------------------------------------|
#> |    Total    |    32     |       -       |    100.00    |      -       |
#> |-----------------------------------------------------------------------|

Categorical Data

One Way Table

ds_freq_table(mtcarz, cyl)
#>                              Variable: cyl                              
#> -----------------------------------------------------------------------
#> Levels     Frequency    Cum Frequency       Percent        Cum Percent  
#> -----------------------------------------------------------------------
#>    4          11             11              34.38            34.38    
#> -----------------------------------------------------------------------
#>    6           7             18              21.88            56.25    
#> -----------------------------------------------------------------------
#>    8          14             32              43.75             100     
#> -----------------------------------------------------------------------
#>  Total        32              -             100.00              -      
#> -----------------------------------------------------------------------

Two Way Table

ds_cross_table(mtcarz, cyl, gear)
#>     Cell Contents
#>  |---------------|
#>  |     Frequency |
#>  |       Percent |
#>  |       Row Pct |
#>  |       Col Pct |
#>  |---------------|
#> 
#>  Total Observations:  32 
#> 
#> ----------------------------------------------------------------------------
#> |              |                           gear                            |
#> ----------------------------------------------------------------------------
#> |          cyl |            3 |            4 |            5 |    Row Total |
#> ----------------------------------------------------------------------------
#> |            4 |            1 |            8 |            2 |           11 |
#> |              |        0.031 |         0.25 |        0.062 |              |
#> |              |         0.09 |         0.73 |         0.18 |         0.34 |
#> |              |         0.07 |         0.67 |          0.4 |              |
#> ----------------------------------------------------------------------------
#> |            6 |            2 |            4 |            1 |            7 |
#> |              |        0.062 |        0.125 |        0.031 |              |
#> |              |         0.29 |         0.57 |         0.14 |         0.22 |
#> |              |         0.13 |         0.33 |          0.2 |              |
#> ----------------------------------------------------------------------------
#> |            8 |           12 |            0 |            2 |           14 |
#> |              |        0.375 |            0 |        0.062 |              |
#> |              |         0.86 |            0 |         0.14 |         0.44 |
#> |              |          0.8 |            0 |          0.4 |              |
#> ----------------------------------------------------------------------------
#> | Column Total |           15 |           12 |            5 |           32 |
#> |              |        0.468 |        0.375 |        0.155 |              |
#> ----------------------------------------------------------------------------

Group Summary

ds_group_summary(mtcarz, cyl, mpg)
#>                                        mpg by cyl                                         
#> -----------------------------------------------------------------------------------------
#> |     Statistic/Levels|                    4|                    6|                    8|
#> -----------------------------------------------------------------------------------------
#> |                  Obs|                   11|                    7|                   14|
#> |              Minimum|                 21.4|                 17.8|                 10.4|
#> |              Maximum|                 33.9|                 21.4|                 19.2|
#> |                 Mean|                26.66|                19.74|                 15.1|
#> |               Median|                   26|                 19.7|                 15.2|
#> |                 Mode|                 22.8|                   21|                 10.4|
#> |       Std. Deviation|                 4.51|                 1.45|                 2.56|
#> |             Variance|                20.34|                 2.11|                 6.55|
#> |             Skewness|                 0.35|                -0.26|                -0.46|
#> |             Kurtosis|                -1.43|                -1.83|                 0.33|
#> |       Uncorrected SS|              8023.83|              2741.14|              3277.34|
#> |         Corrected SS|               203.39|                12.68|                 85.2|
#> |      Coeff Variation|                16.91|                 7.36|                16.95|
#> |      Std. Error Mean|                 1.36|                 0.55|                 0.68|
#> |                Range|                 12.5|                  3.6|                  8.8|
#> |  Interquartile Range|                  7.6|                 2.35|                 1.85|
#> -----------------------------------------------------------------------------------------

Multiple Variable Statistics

ds_tidy_stats(mtcarz, mpg, disp, hp)
#> # A tibble: 3 x 16
#>   vars    min   max  mean t_mean median  mode range variance  stdev  skew
#>   <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>  <dbl> <dbl>
#> 1 disp   71.1 472   231.   228    196.  276.  401.   15361.  124.   0.420
#> 2 hp     52   335   147.   144.   123   110   283     4701.   68.6  0.799
#> 3 mpg    10.4  33.9  20.1   20.0   19.2  10.4  23.5     36.3   6.03 0.672
#> # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>,
#> #   q3 <dbl>, iqrange <dbl>

Features

  • Summary statistics
  • Two way tables
  • One way table
  • Group wise summary
  • Multiple variable statistics
  • Multiple one way tables
  • Multiple two way tables

Getting Help

If you encounter a bug, please file a minimal reproducible example using reprex on github. For questions and clarifications, use StackOverflow.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

News

descriptr 0.5.0

New Features

  • ds_tiy_stats() will detect continuous variables in the data set and return summary statistics for all of them. (#68)
  • ds_auto_summary_stats() will detect all continuous variables in the data set and return summary statistics and frequency tables for all of them. (#69)
  • ds_plot_*() family of functions will detect continuous and categorical variables in the data set and generate appropriate plots. (#70)
  • ds_launch_shiny_app() will check if suggested packages are available and offer to install the missing packages. (#76)
  • ds_measures_*() family of functions now accept multiple arguments. (#78)
  • ds_auto_freq_table() and ds_auto_cross_table() allow users to specify a subset of variables. (#83)
  • ds_freq_table() now works for both continuous and categorical data. (#90)
  • Some minor improvements to the error messages. (#51)

Deprecation

All the ds_* functions related to visualizing probability distributions have been soft deprecated and will be removed in the next release. Please use the vistributions package going forward.

The shiny app has been soft-deprecated and will be removed in the next release. Please use the xplorerr package going forward.

The following functions have been soft-deprecated and will be removed in the next release:

  • ds_freq_cont()
  • ds_oway_tables()
  • ds_tway_tables()

descriptr 0.4.1

New Features

  • Feature request: add turn off button in the app (#29)
  • Show frequency of missing values in frequency table (#32)
  • ds_freq_cont() should return a tibble (#34)

The following now return tibbles to facilitate further analysis:

  • ds_twoway_table()
  • ds_freq_table()
  • ds_freq_cont()
  • ds_group_summary()

New to descriptr are the following which generate summary statistics:

Bug Fixes

  • descriptr:::print_screen() fails if a variable has more than one class (#26)
  • ds_summary_stats() does not show missing values (#30)
  • Multiple variable statistics throws error in the presence of NA (#33)
  • Error in cross table in presence of NA (#40)

Acknowledgements

A big thanks to @GegznaV and @adam_medcalf who contributed code, opened issues and provided valuable feedback.

descriptr 0.4.0

New Features

We have completely revamped the API. All the functions now take a data.frame or tibble as the first argument followed by the variable names. The variable names need not be surrounded by single/double quotes anymore. Please view the guide for more details.

  • use ggplot for all plots (#8)

Bug Fixes

  • update shiny app (#16)
  • error in percentiles in ds_summary_stats (#22)

descriptr 0.3.0

New Features

  • shiny app for interactive analysis
  • multi column statistics (#6)

descriptr 0.2.0

This is a minor release containing bug fixes.

Bug Fixes

  • multiple one way table returns an error (#2)
  • ftable returned by freq_table must be data.frame or tibble (#3)
  • multiple one way table returns an error (#4)

descriptr 0.1.1

Bug Fixes

  • typo in group_summary(); maximum and minimum were reversed
  • norm_plot() was non-reactive to changes in standard deviation (#1).

descriptr 0.1.0

  • First release

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.