# Summarize and Explore the Data

SmartEDA performs exploratory analysis on any input data, describing the structure of the data and the relationships present in it. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics and graphical techniques are available for both numeric and categorical predictors.

## Background

In a sound statistical data analysis, the initial step has to be exploratory. Exploratory data analysis begins with univariate analysis, examining the variables one at a time. Next comes bivariate analysis, followed by multivariate analysis. The SmartEDA package produces a complete exploratory data analysis by running its functions instead of writing lengthy R code.

## Installation

The package can be installed directly from CRAN.

```r
install.packages("SmartEDA")
```

You can also install SmartEDA from GitHub with:
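
The install command itself is missing here. A likely form, assuming the `devtools` package is available and that the repository is `daya6489/SmartEDA` (the one named in the bug-report link of this document), would be:

```r
# Assumption: devtools is installed and the GitHub repository is
# daya6489/SmartEDA, as suggested by the issue-tracker URL in this document.
# install.packages("devtools")
devtools::install_github("daya6489/SmartEDA")
```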

## Example

#### Data

In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data source: the ISLR package.

Install the package "ISLR" to get the example data set.

```r
install.packages("ISLR")
library("ISLR")
install.packages("SmartEDA")
library("SmartEDA")
## Load sample dataset from ISLR package
Carseats <- ISLR::Carseats
```

## Overview of the data

`ExpData` reports the dimensions of the dataset, the variable names, an overall summary of missing values and the data type of each variable.

```r
# type = 1: overview of the data
ExpData(data=Carseats,type=1)
# type = 2: structure of the data
ExpData(data=Carseats,type=2)
```
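
For readers who want to see what such an overview involves, here is a minimal base-R sketch. The helper `overview()` is hypothetical and not part of SmartEDA, and it uses the built-in `airquality` data so that it runs without extra packages. `Carseats` could be passed instead.

```r
# Hypothetical helper, not part of SmartEDA: one row per variable with its
# type, number of missing values and number of distinct values.
overview <- function(data) {
  data.frame(
    variable   = names(data),
    type       = vapply(data, function(x) class(x)[1], character(1)),
    n_missing  = vapply(data, function(x) sum(is.na(x)), integer(1)),
    n_distinct = vapply(data, function(x) length(unique(x)), integer(1)),
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# airquality ships with R and contains missing values.
dim(airquality)          # number of rows and columns
ov <- overview(airquality)
print(ov)
```

The output of `ExpData` is laid out differently; the sketch only illustrates the kind of information an overview step collects.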

## Summary of numerical variables

To summarise the numeric variables, you can use the following R code from this package:

```r
# All numeric variables, no target variable
ExpNumStat(Carseats,by="A",gp=NULL,Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
# With the continuous variable Price as target
ExpNumStat(Carseats,by="A",gp="Price",Qnt=seq(0,1,0.1),MesofShape=1,Outlier=TRUE,round=2)
# With the categorical variable Urban as target
ExpNumStat(Carseats,by="GA",gp="Urban",Qnt=seq(0,1,0.1),MesofShape=2,Outlier=TRUE,round=2)
```
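
The calls above request decile quantiles (`Qnt=seq(0,1,0.1)`), shape measures and outlier information. As a rough illustration of what such statistics are, the sketch below computes them in base R. `num_summary()` is a hypothetical helper, and both the 1.5 × IQR outlier rule and the moment-based skewness are assumptions of this sketch, not necessarily the definitions used by `ExpNumStat`.

```r
# Hypothetical helper, not SmartEDA code: decile quantiles, moment-based
# skewness and a count of outliers by the 1.5 * IQR rule for one vector.
num_summary <- function(x, probs = seq(0, 1, 0.1)) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  m <- mean(x)
  list(
    n          = length(x),
    mean       = m,
    sd         = sd(x),
    skewness   = mean((x - m)^3) / mean((x - m)^2)^1.5,
    n_outliers = sum(x < lower | x > upper),
    deciles    = quantile(x, probs)
  )
}

# Apply the helper to every numeric column (mtcars ships with R).
num_cols  <- vapply(mtcars, is.numeric, logical(1))
summaries <- lapply(mtcars[num_cols], num_summary)
print(summaries$mpg$deciles)
```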

## Graphical representation of all numeric features

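```r
# Plots of numeric variables, no target variable
ExpNumViz(Carseats,gp=NULL,nlim=10,Page=c(2,2),sample=8)
# Plots of numeric variables with Urban as the target variable
ExpNumViz(Carseats,gp="Urban",type=1,nlim=NULL,fname=NULL,col=c("pink","yellow","orange"),Page=c(2,2),sample=8)
```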

## Summary of Categorical variables

```r
# No target variable
ExpCTable(Carseats,Target=NULL,margin=1,clim=10,nlim=5,round=2,bin=NULL,per=TRUE)
ExpCatViz(Carseats,target=NULL,fname=NULL,clim=10,margin=2,Page = c(2,1),sample=4)
# Continuous target variable (Price)
ExpCTable(Carseats,Target="Price",margin=1,clim=10,nlim=NULL,round=2,bin=4,per=FALSE)
# Categorical target variable (Urban)
ExpCatStat(Carseats,Target="Urban",result = "Stat",clim=10,nlim=5,Pclass="Yes")
ExpCatStat(Carseats,Target="Urban",result = "IV",clim=10,nlim=5,Pclass="Yes")
ExpCTable(Carseats,Target="Urban",margin=1,clim=10,nlim=NULL,round=2,bin=NULL,per=FALSE)
```
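
`ExpCatStat` with `result = "IV"` relates to the weight of evidence (WoE) and information value (IV) mentioned in the introduction. The following base-R sketch shows the usual textbook definitions on a toy example. `woe_iv()` and the toy vectors are hypothetical, and the sign convention (events over non-events) is an assumption; the package's own binning and conventions may differ.

```r
# Hypothetical helper, not SmartEDA code. Convention assumed here:
#   WoE = ln(share of events / share of non-events) for each category
#   IV  = sum((share of events - share of non-events) * WoE)
# Categories with zero events or zero non-events give infinite WoE and
# would need special handling, which this sketch does not attempt.
woe_iv <- function(x, y, event) {
  is_event <- factor(y == event, levels = c(FALSE, TRUE))
  tab <- table(x, is_event)
  p_event <- tab[, "TRUE"] / sum(tab[, "TRUE"])
  p_non   <- tab[, "FALSE"] / sum(tab[, "FALSE"])
  woe <- log(p_event / p_non)
  list(
    table = data.frame(
      level       = rownames(tab),
      p_event     = as.numeric(p_event),
      p_non_event = as.numeric(p_non),
      woe         = as.numeric(woe)
    ),
    iv = sum((p_event - p_non) * woe)
  )
}

# Toy data: a two-level predictor and a Yes/No target.
x <- c("a", "a", "a", "b", "b", "b", "b", "b")
y <- c("Yes", "Yes", "No", "Yes", "No", "No", "No", "No")
res <- woe_iv(x, y, event = "Yes")
print(res$table)
print(res$iv)
```

A positive WoE marks a category where events are over-represented relative to non-events; the IV aggregates these differences into one number per predictor.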

## Graphical representation of all categorical variables

```r
ExpCatViz(Carseats,target="Urban",fname=NULL,clim=10,col=NULL,margin=2,Page = c(2,1),sample=2)
```

## Create HTML EDA report

Create an exploratory data analysis report in HTML format:

```r
ExpReport(Carseats,Target="Urban",label=NULL,op_file="test.html",op_dir=getwd(),sc=2,sn=2,Rc="Yes")
```

## Exploratory analysis - Custom tables, summary statistics

`ExpCustomStat` gives a descriptive summary of the input variables for each level or combination of the grouping variables. Rows (cases) of the data can also be filtered while running the analysis.

```r
# Categorical variables only, with and without grouping
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)
# Categorical variables with row filters
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
# Numeric variables only, with a choice of statistics
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
# Numeric variables with row filters
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var'),filt="Urban=='Yes'")
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum'),filt="Urban=='Yes' & Population>150")
# Note: data_sam is not defined in this document; it is assumed to be a sample
# data frame in which codes such as 999, 888 and -9 mark values to exclude
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
# Several filter conditions chained with ^
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^ ^US=='Yes'"))
# Categorical and numeric variables together
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','P0.25','median','p0.75'), gpby = TRUE,filt="Urban=='Yes'")
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','min','max'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
```
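
The idea of filtering rows and then summarising a numeric variable for every combination of grouping variables can be sketched in base R as follows. `grouped_stats()` is a hypothetical helper, not SmartEDA code, and the example uses the built-in `mtcars` data rather than `Carseats` so that it runs without extra packages.

```r
# Hypothetical base-R analogue, not SmartEDA code: filter rows first, then
# compute count, mean and sum of one numeric column per group combination.
grouped_stats <- function(data, group_vars, num_var,
                          keep = rep(TRUE, nrow(data))) {
  d <- data[keep, , drop = FALSE]
  agg <- aggregate(
    d[[num_var]],
    by  = d[group_vars],
    FUN = function(v) c(count = length(v), mean = mean(v), sum = sum(v))
  )
  # aggregate() stores the statistics in a matrix column named x; flatten it.
  cbind(agg[group_vars], as.data.frame(agg$x))
}

# Group by cylinders and gears, keeping only cars with more than 100 hp.
res <- grouped_stats(mtcars, c("cyl", "gear"), "mpg", keep = mtcars$hp > 100)
print(res)
```

Passing the filter as a logical vector keeps the sketch simple; `ExpCustomStat` takes its filter as a string instead, as the calls above show.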

# News

### SmartEDA 0.3.1

#### Enhancements

• Added 'bins', 'plot', 'round' and 'top' options to plot bar graph in `ExpCatStat`
• Added 'theme' option to customise the graph theme in both `ExpCatViz` and `ExpNumViz`
• Added 'gtitle' option to add additional chart title on both `ExpCatViz` and `ExpNumViz`
• Removed 'Label' option from `ExpCatStat`
• Changed input parameter name from 'gp' to 'target' in `ExpCatViz`

#### Bug fixes

• Fixed a formula issue in the odds calculation in `ExpCatStat`

### SmartEDA 0.3.0

#### New Features

• Added function `ExpOutQQ` to plot Quantile-Quantile Plots for outlier checking
• Added function `ExpParcoord` for Parallel Co-ordinate plots

#### Enhancements

• Added "rdata","value" option to plot bar graphs in `ExpCatViz`
• Added "dcast" option to reshape the data in `ExpCustomStat`
• Added "dcast","val" option to customise the summary statistics in `ExpNumStat`
• Added "Template" option to read rma in `ExpNumStat`

#### Bug fixes

• Fixed a bug in `ExpData`

### SmartEDA 0.2.0

#### New Features

• Added function `ExpCustomStat` to customise the summary statistics.
• Added both counts and percentages in `ExpData` under option Type=1 and removed DV option from the parameter list.

#### Bug fixes

• Fixed a bug in the `ExpCatViz` function where the grid was not running
• Fixed a bug in the `ExpData` function, which was not running for some variable types.

### SmartEDA 0.1.0

• Added a `NEWS.md` file to track changes to the package.

# Reference manual


0.3.2 by Dayanand Ubrangala, 5 months ago

https://daya6489.github.io/SmartEDA/

Report a bug at https://github.com/daya6489/SmartEDA/issues

Browse source code at https://github.com/cran/SmartEDA

Authors: Dayanand Ubrangala [aut, cre] , Kiran R [aut, ctb] , Ravi Prasad Kondapalli [aut, ctb] , Sayan Putatunda [aut, ctb]
