Credit Risk Scorecard

The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks such as data partitioning, variable selection, WOE binning, scorecard scaling, performance evaluation, and report generation. These functions can also be used in the development of machine learning models. References: 1. Refaat, M. (2011). Credit Risk Scorecards: Development and Implementation Using SAS. ISBN: 9781447511199. 2. Siddiqi, N. (2006). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. ISBN: 9780471754510.



The goal of the scorecard package is to make the development of traditional credit risk scorecard models easier and more efficient by providing functions for the common tasks summarized below. The package can also be used in the development of machine learning models for binary classification.

  • data preparation (split_df, one_hot)
  • variable selection (var_filter, iv, vif)
  • weight of evidence (woe) binning (woebin, woebin_plot, woebin_adj, woebin_ply)
  • performance evaluation (perf_eva, perf_psi)
  • scorecard scaling (scorecard, scorecard_ply)
  • scorecard report (gains_table, report)
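
To make the binning and variable selection steps more concrete, here is a minimal base R sketch of how weight of evidence (WOE) and information value (IV) can be computed for a single variable that is already binned. The toy data, the object names, and the sign convention (log of the bad distribution over the good distribution) are assumptions made for illustration. This is not the package's internal code; in practice woebin and iv should be used.

```r
# Hypothetical toy data: a binned predictor and a 0/1 label (1 = bad, 0 = good)
bin   <- rep(c("low", "mid", "high"), each = 4)
label <- c(1, 1, 1, 0,   1, 1, 0, 0,   1, 0, 0, 0)

# Counts of good (column "0") and bad (column "1") observations per bin
counts <- table(bin, label)

# Share of all bads and of all goods that fall into each bin
bad_distr  <- counts[, "1"] / sum(counts[, "1"])
good_distr <- counts[, "0"] / sum(counts[, "0"])

# Assumed convention: WOE = ln(bad distribution / good distribution)
woe <- log(bad_distr / good_distr)

# IV sums the distribution difference weighted by WOE over all bins
iv <- sum((bad_distr - good_distr) * woe)

print(woe)
print(iv)
```

Flipping the sign convention of WOE changes the sign of each bin's value but leaves IV unchanged, because both factors in each term change sign together. Bins with zero goods or zero bads would need special handling that this sketch omits.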

Installation

  • Install the release version of scorecard from CRAN with:
install.packages("scorecard")
  • Install the latest version of scorecard from GitHub with:
# install.packages("devtools")
devtools::install_github("shichenxie/scorecard")

Example

This basic example shows how to develop a common credit risk scorecard:

# Traditional Credit Scoring Using Logistic Regression
library(scorecard)
 
# data preparing ------
# load germancredit data
data("germancredit")
# filter variable via missing rate, iv, identical value rate
dt_f = var_filter(germancredit, y="creditability")
# breaking dt into train and test
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)
 
# woe binning ------
bins = woebin(dt_f, y="creditability")
# woebin_plot(bins)
 
# binning adjustment
## adjust breaks interactively
# breaks_adj = woebin_adj(dt_f, "creditability", bins) 
## or specify breaks manually
breaks_adj = list(
  age.in.years=c(26, 35, 40),
  other.debtors.or.guarantors=c("none", "co-applicant%,%guarantor"))
bins_adj = woebin(dt_f, y="creditability", breaks_list=breaks_adj)
 
# converting train and test into woe values
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins_adj))
 
# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
# vif(m1, merge_coef = TRUE) # summary(m1)
# Select a formula-based model by AIC (or by LASSO for large datasets)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)
# vif(m2, merge_coef = TRUE) # summary(m2)
 
# # Adjusting for oversampling (support.sas.com/kb/22/601.html)
# library(data.table)
# p1=0.03 # bad probability in population 
# r1=0.3 # bad probability in sample dataset
# dt_woe = copy(dt_woe_list$train)[, weight := ifelse(creditability==1, p1/r1, (1-p1)/(1-r1) )][]
# fmla = as.formula(paste("creditability ~", paste(names(coef(m2))[-1], collapse="+")))
# m3 = glm(fmla, family = binomial(), data = dt_woe, weights = weight)
 
# performance ks & roc ------
## predicted probability
pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type='response'))
## performance
perf = perf_eva(pred = pred_list, label = label_list)
 
# score ------
## scorecard
card = scorecard(bins_adj, m2)
## credit score
score_list = lapply(dt_list, function(x) scorecard_ply(x, card))
## psi
perf_psi(score = score_list, label = label_list)
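
The population stability index (PSI) reported in the last step compares the score distribution of two samples. The base R sketch below uses the usual formula, the sum over bins of (actual share - expected share) * ln(actual share / expected share). The helper name psi_sketch, the artificial scores, and the bin edges are all hypothetical, and perf_psi may bin the scores differently.

```r
# Hypothetical helper: PSI between two score vectors over fixed bins
psi_sketch <- function(expected, actual, breaks) {
  # Share of observations per bin in each sample
  e <- table(cut(expected, breaks, include.lowest = TRUE)) / length(expected)
  a <- table(cut(actual,   breaks, include.lowest = TRUE)) / length(actual)
  # Empty bins would produce Inf or NaN; this sketch assumes none are empty
  sum((a - e) * log(a / e))
}

# Artificial scores: the "test" sample is shifted upward by 10 points
train_score <- seq(500, 700, length.out = 201)
test_score  <- train_score + 10
breaks <- c(-Inf, 550, 600, 650, Inf)

psi_shifted   <- psi_sketch(train_score, test_score, breaks)
psi_identical <- psi_sketch(train_score, train_score, breaks)  # 0 for identical samples

print(psi_shifted)
print(psi_identical)
```

A PSI of zero means the two binned distributions are identical, and larger values indicate a larger shift between the samples.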
 

News

scorecard 0.2.3

  • added a var_skip argument to the woebin function and a var_kp argument to the scorecard_ply function, so that the id column can be handled during the development of a scorecard model.
  • fixed a typo in the perf_eva function
  • replaced !isFALSE(x) with isTRUE(x) & !is.null(x) in the perf_eva function, because the isFALSE function is only available from R 3.5 onward.

scorecard 0.2.2

  • fixed a bug in the check_y function when the name of the label column is 'y' in the input data.
  • fixed a bug in the woebin_adj function when count_distr_limit is not set to its default value in the woebin function.

scorecard 0.2.1

  • revised the one_hot function
  • modified .export used in the foreach loop
  • added my name to the license file

scorecard 0.2.0

  • fixed a bug in the woebin function where the positive values could not be modified
  • pdo in the scorecard function now supports negative values.
  • split_df no longer removes datetime and identical variables
  • added a one-hot encoding function
  • added a save_breaks_list argument to both the woebin and woebin_adj functions, which saves breaks_list as a file in the current working directory.
  • revised the perf_eva and perf_psi functions
  • added a vif function
  • added a report function to create a report for scorecard modeling
  • added a scorecard2 function, which does not require a glm model object as input

scorecard 0.1.9

  • pdo in the scorecard function now supports negative values. If pdo is positive, a larger score means a lower probability of being a positive sample. If pdo is negative, a larger score means a higher probability of being a positive sample.
  • fixed a bug in the woebin function using the chimerge method, caused by initial breaks containing out-of-range values.
  • added a check on the number of unique values in string columns, since a large number might slow down the binning process.
  • fixed a bug in the perf_eva function caused by the number of plot rows being set to 0 when only one plot type is requested.
  • the ratio argument in the split_df function now supports setting ratios for both train and test.
  • If the return_rm_reason argument of the var_filter function is TRUE, the info_value, missing_rate and identical_rate are provided in the result.
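
The effect of the sign of pdo can be illustrated with the common "points to double the odds" scaling, score = a - b * ln(odds), where b = pdo / ln(2) and a = points0 + b * ln(odds0). The base R sketch below works under that assumption. The helper name scale_score and the values points0 = 600 and odds0 = 1/19 are illustrative choices, and the sketch is not a statement of how the scorecard function is implemented internally.

```r
# Hypothetical helper: map log-odds of being a positive sample to a score
scale_score <- function(log_odds, points0 = 600, odds0 = 1 / 19, pdo = 50) {
  b <- pdo / log(2)              # points per doubling of the odds
  a <- points0 + b * log(odds0)  # offset so that odds0 maps to points0
  a - b * log_odds
}

# Odds of being a positive sample: the target odds, half of them, double them
log_odds <- log(c(1 / 19, 1 / 38, 2 / 19))

score_pos <- scale_score(log_odds, pdo = 50)   # approximately 600, 650, 550
score_neg <- scale_score(log_odds, pdo = -50)  # approximately 600, 550, 650

print(score_pos)
print(score_neg)
```

With a positive pdo, halving the odds of being a positive sample raises the score by pdo points. With a negative pdo the direction is reversed.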

scorecard 0.1.8

  • removed columns that have only one unique value from the input dataset
  • modify the default values of x_limits in perf_psi
  • fixed a bug in perf_psi when the label is factor
  • display proc time in woebin
  • fixed a bug in perf_eva when estimating AUC
  • fixed a bug in woebin_adj when special_values is provided

scorecard 0.1.7

  • added chimerge method for woebin function
  • special_values option added in woebin function
  • f1 curve added in perf_eva

scorecard 0.1.6

  • Fixed a bug in woebin_adj function when all_var == FALSE and the breaks of all variables are perfect.
  • Provide parallel computation (foreach with parallel backend) in the functions of woebin and woebin_ply.
  • Modified scorecard_ply function.
  • Fixed a bug in woebin when there are empty bins based on provided break points.

scorecard 0.1.5

  • Fixed a bug in scorecard function when calculating the coefficients.
  • Fixed a bug in perf_eva when type="lift".
  • Fixed a bug in functions of woebin and var_filter when removing Date columns.

scorecard 0.1.4

  • perf_eva supports both predicted probability and score.
  • Added the woebin_adj function which can interactively adjust the binning info from woebin.
  • Reviewed woebin function.

scorecard 0.1.3

  • Modified the format of printing message and added condition functions.
  • Added the split_df function, which splits a data frame into two.
  • Reordered the binning information. Moved the missing bin to the first position.

scorecard 0.1.2

  • fixed a bug in var_filter

scorecard 0.1.1

  • Specified some potential problems via conditions
  • Modified examples for most functions

scorecard 0.1.0

  • Initial version


0.2.3 by Shichen Xie, a month ago


https://github.com/ShichenXie/scorecard


Report a bug at https://github.com/ShichenXie/scorecard/issues


Browse source code at https://github.com/cran/scorecard


Authors: Shichen Xie [aut, cre]




MIT + file LICENSE license


Imports data.table, ggplot2, gridExtra, foreach, doParallel, parallel, openxlsx

Suggests knitr, rmarkdown, pkgdown, testthat

