Summarize and Explore the Data

Exploratory analysis on any input data, describing the structure of the data and the relationships present in it. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics and graphical techniques are available for both numeric and categorical predictors.


In a sound statistical data analysis, the first step should be exploratory. Exploratory data analysis begins with univariate analysis, examining the variables one at a time, followed by bivariate and then multivariate analysis. The SmartEDA package produces a complete exploratory data analysis by running a single function instead of writing lengthy R code.


The package can be installed directly from CRAN.


You can also install SmartEDA from GitHub with:

# install.packages("devtools")



In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data source: ISLR package.

Install the package "ISLR" to get the example data set.

## Load the SmartEDA package and the sample dataset from the ISLR package
library(SmartEDA)
Carseats <- ISLR::Carseats

Overview of the data

Understand the dimensions of the dataset, the variable names, the overall missing-value summary and the data type of each variable.
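As a rough illustration of what such an overview contains, the base-R sketch below computes the dimensions, variable names, data types and missing-value counts of a data frame. It is not SmartEDA's ExpData implementation: the helper name `data_overview` is hypothetical, and it is run on R's built-in `airquality` data (which, unlike Carseats, contains missing values) so that it works without ISLR.

```r
# Hypothetical helper (not part of SmartEDA): one row per variable with its
# type, missing-value count and percentage, and number of distinct values.
data_overview <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(x) class(x)[1], character(1)),
    n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    pct_missing = vapply(df, function(x) round(100 * mean(is.na(x)), 2), numeric(1)),
    n_distinct  = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

dims <- dim(airquality)          # number of rows and columns
ov   <- data_overview(airquality)
print(dims)
print(ov)
cat("Total missing cells:", sum(ov$n_missing), "\n")
```

With ISLR installed, `data_overview(ISLR::Carseats)` gives the same kind of table for the vignette's data.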


Summary of numerical variables

To summarise the numeric variables, use the package's ExpNumStat function.
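To show what such a summary involves, here is a minimal base-R sketch that computes per-variable counts, missing values, mean, standard deviation and quartiles for every numeric column. It is an illustration only, not the output format of ExpNumStat; `num_summary` is a hypothetical name, and the built-in `iris` data is used so the example runs without ISLR.

```r
# Hypothetical helper (not part of SmartEDA): summary statistics for each
# numeric column; non-numeric columns (e.g. factors) are skipped.
num_summary <- function(df) {
  num_vars <- names(df)[vapply(df, is.numeric, logical(1))]
  rows <- lapply(num_vars, function(v) {
    x <- df[[v]]
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    data.frame(
      variable = v,
      n        = sum(!is.na(x)),
      n_miss   = sum(is.na(x)),
      mean     = mean(x, na.rm = TRUE),
      sd       = sd(x, na.rm = TRUE),
      min      = min(x, na.rm = TRUE),
      p25      = q[1],
      median   = q[2],
      p75      = q[3],
      max      = max(x, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)   # stack the one-row summaries into a single table
}

s <- num_summary(iris)   # Species is a factor and is left out
print(s, digits = 3)
```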


Graphical representation of all numeric features


Summary of categorical variables

ExpCatViz(Carseats,target=NULL,fname=NULL,clim=10,margin=2,Page = c(2,1),sample=4)
ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,Pclass="Yes")
ExpCatStat(Carseats,Target="Urban",result = "IV",clim=10,nlim=5,Pclass="Yes")
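ExpCatStat reports information value (result = "IV") for categorical predictors. To make the quantity concrete, the sketch below computes weight of evidence (WOE) and IV for one categorical predictor against a binary 0/1 target, using one common convention: WOE_i = ln(d1_i / d0_i) and IV = sum over levels of (d1_i - d0_i) * WOE_i, where d1_i and d0_i are the shares of target = 1 and target = 0 observations that fall in level i. SmartEDA's exact conventions (for example the direction of the ratio or the handling of empty cells) may differ; `woe_iv` is a hypothetical helper, and mtcars (cyl against am) stands in for Carseats.

```r
# Hypothetical helper (not part of SmartEDA): WOE and IV of a categorical
# predictor x against a binary target y coded 0/1.
#   d1_i  = share of all y == 1 observations that fall in level i
#   d0_i  = share of all y == 0 observations that fall in level i
#   WOE_i = log(d1_i / d0_i);  IV = sum((d1_i - d0_i) * WOE_i)
woe_iv <- function(x, y, adj = 0) {
  stopifnot(all(y %in% c(0, 1)))
  tab <- table(x, y)                         # rows: levels of x, columns: "0", "1"
  n0  <- as.numeric(tab[, "0"]) + adj        # y == 0 count per level (adj avoids zeros)
  n1  <- as.numeric(tab[, "1"]) + adj        # y == 1 count per level
  d0  <- n0 / sum(n0)
  d1  <- n1 / sum(n1)
  woe <- log(d1 / d0)
  parts <- (d1 - d0) * woe                   # each level's contribution to IV
  list(
    table = data.frame(level = rownames(tab), n0 = n0, n1 = n1,
                       woe = woe, iv_part = parts),
    iv = sum(parts)
  )
}

res <- woe_iv(mtcars$cyl, mtcars$am)         # am: 1 = manual, 0 = automatic
print(res$table, digits = 3)
cat("IV:", round(res$iv, 3), "\n")
```

Swapping which class is coded 1 flips the sign of both (d1 - d0) and WOE, so the IV itself does not depend on that choice. The `adj` argument adds a small count to every cell, a common workaround when a level has no observations of one class and its WOE would otherwise be infinite. For Carseats, a factor target such as Urban would first need recoding, e.g. `as.numeric(Carseats$Urban == "Yes")`.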

Graphical representation of all categorical variables

ExpCatViz(Carseats,target="Urban",fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
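The bar charts from ExpCatViz show how each categorical variable is distributed, with `margin` choosing the direction of the percentages. Assuming `margin` follows the same convention as base R's `prop.table` (1 = row percentages, 2 = column percentages), the table behind such a chart can be sketched as follows; mtcars again stands in for Carseats, with cyl as the categorical variable and am as the target.

```r
# Cross-tabulate a categorical variable against a target and convert counts
# to column percentages (margin = 2): each target column sums to 100.
counts  <- table(cyl = mtcars$cyl, am = mtcars$am)
col_pct <- 100 * prop.table(counts, margin = 2)
print(counts)
print(round(col_pct, 1))

# Row percentages (margin = 1) instead show the target mix within each level.
row_pct <- 100 * prop.table(counts, margin = 1)
print(round(row_pct, 1))
```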

Create HTML EDA report

Create an exploratory data analysis report in HTML format.


Exploratory analysis - Custom tables, summary statistics

ExpCustomStat gives a descriptive summary of the input variables for each level, or combination of levels, of the grouping variables. Rows (cases) of the data can also be filtered while the analysis runs.

# Grouped summary of categorical variables on a filtered subset of rows
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
# Selected statistics for numeric variables, with and without row filters
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var'),filt="Urban=='Yes'")
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum'),filt="Urban=='Yes' & Population>150")
# data_sam is not defined in this vignette. As a hypothetical stand-in, it is taken to be
# a copy of Carseats in which special codes (such as 999, -9 or 888) would mark invalid values
data_sam <- Carseats
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
# Separate filter conditions per numeric variable, delimited by '^'
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^ ^US=='Yes'"))
# Numeric summaries by categorical variables, ungrouped (gpby=FALSE) and grouped (gpby=TRUE)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','P0.25','median','p0.75'), gpby = TRUE,filt="Urban=='Yes'")
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','min','max'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
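The core idea behind ExpCustomStat with `gpby = TRUE` and a `filt` string (keep the rows that satisfy a condition, then compute statistics for every combination of the grouping variables) can be sketched in base R with `aggregate`. This is not the package's implementation: `grouped_summary` is a hypothetical helper, the filter string is evaluated with `eval(parse())` purely for illustration, and mtcars replaces Carseats.

```r
# Hypothetical helper (not part of SmartEDA): filter rows with a condition
# written as a string, then compute count, mean and sum of each numeric
# variable for every combination of the grouping variables.
grouped_summary <- function(df, cvars, nvars, filt = NULL) {
  if (!is.null(filt)) {
    keep <- eval(parse(text = filt), envir = df)   # evaluate condition on df columns
    df <- df[!is.na(keep) & keep, , drop = FALSE]  # drop rows failing (or NA in) the filter
  }
  stats <- function(x) c(count = length(x), mean = mean(x), sum = sum(x))
  res <- aggregate(df[nvars], by = df[cvars], FUN = stats)
  do.call(data.frame, res)                         # flatten matrix columns, e.g. mpg.mean
}

out <- grouped_summary(mtcars, cvars = c("am", "cyl"),
                       nvars = c("mpg", "hp"), filt = "hp > 100")
print(out)
```

Evaluating arbitrary strings is convenient for interactive exploration but should not be used on untrusted input.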


SmartEDA 0.3.1


  • Added 'bins', 'plot', 'round' and 'top' options to ExpCatStat for plotting a bar graph
  • Added 'theme' option to customise the graph theme in both ExpCatViz and ExpNumViz
  • Added 'gtitle' option to add an additional chart title in both ExpCatViz and ExpNumViz
  • Removed 'Label' option from ExpCatStat
  • Changed input parameter name from 'gp' to 'target' in ExpCatViz

Bug fixes

  • Fixed a formula issue in the odds calculation in ExpCatStat

SmartEDA 0.3.0

New Features

  • Added function ExpOutQQ to plot Quantile-Quantile Plots for outlier checking
  • Added function ExpParcoord for Parallel Co-ordinate plots


  • Added "rdata" and "value" options to plot bar graphs in ExpCatViz
  • Added "dcast" option to reshape the data in ExpCustomStat
  • Added "dcast" and "val" options to customise the summary statistics in ExpNumStat
  • Added "Template" option to read rma in ExpNumStat

Bug fixes

  • Fixed a bug in ExpData

SmartEDA 0.2.0

New Features

  • Added function ExpCustomStat to customise the summary statistics.
  • Added both counts and percentages in ExpData under option Type=1 and removed the DV option from the parameter list.

Bug fixes

  • Fixed a bug where the ExpCatViz function did not run the grid
  • Fixed a bug where the ExpData function did not run for some variable types.

SmartEDA 0.1.0

  • Added a file to track changes to the package.


0.3.1 by Dayanand Ubrangala, 19 days ago


Authors: Dayanand Ubrangala; Kiran R; Ravi Prasad Kondapalli


MIT + file LICENSE license

Imports ggplot2, sampling, scales, rmarkdown, ISLR, data.table, gridExtra, GGally

Suggests psych, Hmisc, smbinning, testthat, knitr, covr
