The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks such as data partition, variable selection, woe binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References:

1. Refaat, M. (2011). *Credit Risk Scorecard: Development and Implementation Using SAS*. ISBN: 9781447511199.
2. Siddiqi, N. (2006). *Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring*. ISBN: 9780471754510.
The goal of the `scorecard` package is to make the development of traditional credit risk scorecard models easier and more efficient by providing functions for the common tasks summarized below. The package can also be used to develop machine learning models for binary classification.
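- data partition and preprocessing (`split_df`, `one_hot`)
- variable selection (`var_filter`, `iv`, `vif`)
- weight of evidence (woe) binning (`woebin`, `woebin_plot`, `woebin_adj`, `woebin_ply`)
- performance evaluation (`perf_eva`, `perf_psi`)
- scorecard scaling (`scorecard`, `scorecard_ply`)
- scorecard report (`gains_table`, `report`)

Install the release version of `scorecard` from CRAN with:

```r
install.packages("scorecard")
```

Install the latest version of `scorecard` from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("shichenxie/scorecard")
```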
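The woe binning and variable selection functions listed above rest on two statistics: the weight of evidence (WoE) of each bin and the information value (IV) of a variable. The base-R sketch below computes both for a variable that has already been binned. It assumes `y` is coded 1 = bad and 0 = good, defines WoE as ln(share of bads / share of goods), and assumes no bin is empty of either class. The package's own sign convention and its handling of empty bins may differ, and `woe_iv` is a hypothetical helper, not part of `scorecard`.

```r
# Hypothetical helper (not part of scorecard): WoE per bin and total IV.
# Assumes y is coded 1 = bad, 0 = good, and every bin contains both classes;
# otherwise log() returns -Inf/Inf.
woe_iv <- function(bin, y) {
  tab <- table(bin, y)                      # counts per bin (rows) and class (cols)
  good <- tab[, "0"]
  bad  <- tab[, "1"]
  distr_bad  <- bad / sum(bad)              # share of all bads falling in each bin
  distr_good <- good / sum(good)            # share of all goods falling in each bin
  woe <- log(distr_bad / distr_good)
  bin_iv <- (distr_bad - distr_good) * woe  # IV contribution of each bin
  list(
    table = data.frame(bin = rownames(tab), good = as.vector(good),
                       bad = as.vector(bad), woe = as.vector(woe),
                       bin_iv = as.vector(bin_iv)),
    iv = sum(bin_iv)
  )
}

# Made-up toy data: two bins with different bad rates
bin <- c(rep("low", 10), rep("high", 10))
y   <- c(rep(1, 2), rep(0, 8), rep(1, 6), rep(0, 4))
res <- woe_iv(bin, y)
res$table
res$iv
```

Every bin's IV term is non-negative, because the difference and the log ratio always share a sign. That is why IV works as a simple ranking of a variable's separating power when filtering variables.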
This is a basic example which shows you how to develop a common credit risk scorecard:
```r
# Traditional Credit Scoring Using Logistic Regression
library(scorecard)

# data preparation ------
# load germancredit data
data("germancredit")
# filter variables by missing rate, iv and identical value rate
dt_f = var_filter(germancredit, y="creditability")
# split dt into train and test
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)

# woe binning ------
bins = woebin(dt_f, y="creditability")
# woebin_plot(bins)

# binning adjustment
## adjust breaks interactively
# breaks_adj = woebin_adj(dt_f, "creditability", bins)
## or specify breaks manually
breaks_adj = list(
  age.in.years=c(26, 35, 40),
  other.debtors.or.guarantors=c("none", "co-applicant%,%guarantor"))
bins_adj = woebin(dt_f, y="creditability", breaks_list=breaks_adj)

# convert train and test into woe values
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins_adj))

# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
# vif(m1, merge_coef = TRUE) # summary(m1)
# Select a formula-based model by AIC (or by LASSO for large datasets)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)
# vif(m2, merge_coef = TRUE) # summary(m2)

# # Adjusting for oversampling (support.sas.com/kb/22/601.html)
# library(data.table)
# p1=0.03 # bad probability in population
# r1=0.3 # bad probability in sample dataset
# dt_woe = copy(dt_woe_list$train)[, weight := ifelse(creditability==1, p1/r1, (1-p1)/(1-r1) )][]
# fmla = as.formula(paste("creditability ~", paste(names(coef(m2))[-1], collapse="+")))
# m3 = glm(fmla, family = binomial(), data = dt_woe, weights = weight)

# performance ks & roc ------
## predicted probability
pred_list = lapply(dt_woe_list, function(x) predict(m2, x, type='response'))
## performance
perf = perf_eva(pred = pred_list, label = label_list)

# score ------
## scorecard
card = scorecard(bins_adj, m2)
## credit score
score_list = lapply(dt_list, function(x) scorecard_ply(x, card))
## psi
perf_psi(score = score_list, label = label_list)
```
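The commented-out oversampling adjustment in the example reweights observations so that the weighted bad rate of the sample matches the population bad rate. Bads get weight `p1/r1` and goods get `(1-p1)/(1-r1)`. The arithmetic can be checked in base R with the same illustrative rates as the example (`p1 = 0.03`, `r1 = 0.3`). The ten toy labels are made up for this check.

```r
p1 <- 0.03  # bad probability in population (value from the example's comments)
r1 <- 0.3   # bad probability in the sample
# Made-up sample with a 30% bad rate: 3 bads, 7 goods
y <- c(rep(1, 3), rep(0, 7))
weight <- ifelse(y == 1, p1 / r1, (1 - p1) / (1 - r1))
# After weighting, the share of bads should equal the population rate p1
weighted_bad_rate <- sum(weight * y) / sum(weight)
weighted_bad_rate
```

With these rates the total weight also equals the sample size, so the reweighted sample keeps its overall size while its class mix matches the population.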
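Among other metrics, `perf_eva` evaluates the Kolmogorov-Smirnov (KS) statistic. KS is the largest gap between the cumulative distributions of predicted probability for bads and for goods. A minimal base-R sketch follows. It assumes labels coded 1 = bad and 0 = good. `ks_stat` is a hypothetical helper, and the package's handling of ties or binning of predictions is not reproduced here.

```r
# Hypothetical helper: KS = max over thresholds t of |F_bad(t) - F_good(t)|.
ks_stat <- function(pred, label) {
  cdf_bad  <- ecdf(pred[label == 1])   # empirical CDF of predictions for bads
  cdf_good <- ecdf(pred[label == 0])   # empirical CDF of predictions for goods
  # The step functions only jump at observed values, so checking those suffices
  thresholds <- sort(unique(pred))
  max(abs(cdf_bad(thresholds) - cdf_good(thresholds)))
}

pred  <- c(0.1, 0.2, 0.3, 0.6, 0.7, 0.9)
label <- c(0,   0,   1,   0,   1,   1)
ks_value <- ks_stat(pred, label)
ks_value
```

For untied data the result should agree with the `D` statistic of the two-sample `stats::ks.test` applied to the two groups of predictions.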
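The call `scorecard(bins_adj, m2)` scales the model's log-odds into points. A common convention fixes a base score at given odds and a number of points that doubles the odds (PDO): score = A - B * ln(odds), with B = pdo / ln(2) and A = points0 + B * ln(odds0). The sketch below assumes base points 600 at bad:good odds of 1:19 and a PDO of 50. These are assumed to mirror the package's `points0`, `odds0` and `pdo` arguments, but check `?scorecard` before relying on them. `prob_to_score` is a hypothetical helper.

```r
# Hypothetical helper: map a predicted bad probability to a score.
# Assumption: points0 points at bad:good odds odds0, and the score drops
# by pdo points each time the bad odds double.
prob_to_score <- function(p, points0 = 600, odds0 = 1/19, pdo = 50) {
  b <- pdo / log(2)               # points per unit of log-odds
  a <- points0 + b * log(odds0)   # offset so that odds0 maps to points0
  odds <- p / (1 - p)             # bad-to-good odds
  a - b * log(odds)               # higher risk -> lower score
}

# p = 0.05 gives odds 1:19; p = 2/21 gives odds 2:19 (doubled)
scores <- prob_to_score(c(0.05, 2/21))
scores
```

Because log-odds in the fitted model are the intercept plus the sum of coefficient times WoE, the same A and B can be split into base points plus points per WoE bin. That decomposition is what lets a scorecard be read bin by bin. The allocation itself is not shown here.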
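`perf_psi` checks whether the score distribution has shifted between datasets (here train and test) using the population stability index, PSI = sum over bins i of (A_i - E_i) * ln(A_i / E_i). In this formula, E_i and A_i are the shares of the expected (train) and actual (test) populations in score bin i. The sketch below uses fixed, hand-picked score bins and made-up scores, and assumes no bin is empty. The package's own choice of score bins is not reproduced, and `psi` is a hypothetical helper.

```r
# Hypothetical helper: PSI between an expected and an actual score sample.
psi <- function(expected, actual, breaks) {
  e <- table(cut(expected, breaks, include.lowest = TRUE)) / length(expected)
  a <- table(cut(actual,   breaks, include.lowest = TRUE)) / length(actual)
  sum((a - e) * log(a / e))   # assumes no bin share is zero
}

# Made-up scores for illustration only
train_score <- c(450, 480, 500, 520, 540, 560, 580, 600)
test_score  <- c(455, 470, 490, 505, 515, 530, 590, 610)
breaks <- c(-Inf, 500, 550, Inf)
psi_value <- psi(train_score, test_score, breaks)
psi_value
```

Each term is non-negative, so PSI is zero only when every bin holds the same share in both samples and grows as the distributions drift apart.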