Credit Risk Scorecard

The `scorecard` package makes developing credit risk scorecards easier and more efficient by providing functions for common tasks such as data partitioning, variable selection, WOE binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring.


The goal of the scorecard package is to make developing traditional credit risk scorecard models easier and more efficient by providing functions for common tasks:

  • data partition (split_df)
  • variable selection (iv, var_filter)
  • weight of evidence (woe) binning (woebin, woebin_plot, woebin_adj, woebin_ply)
  • scorecard scaling (scorecard, scorecard_ply)
  • performance evaluation (perf_eva, perf_psi)
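To make the weight of evidence (woe) and information value (iv) terms in the list above concrete, here is a minimal base R sketch on made-up data. It does not call the package. The sign convention (a higher woe marks a riskier bin) is an assumption and may differ from other references.

```r
# Toy data (made up): a binned predictor and a binary target (1 = bad, 0 = good)
bin = c("low", "low", "low", "mid", "mid", "mid", "high", "high", "high", "high")
y   = c(1, 0, 0, 1, 0, 0, 1, 1, 0, 1)

tab  = table(bin, y)                    # counts per bin and class
good = tab[, "0"] / sum(tab[, "0"])     # share of all goods falling in each bin
bad  = tab[, "1"] / sum(tab[, "1"])     # share of all bads falling in each bin

# Assumed sign convention: woe = ln(bad share / good share)
woe = log(bad / good)
# Information value sums the woe-weighted gap between the two distributions
iv  = sum((bad - good) * woe)

print(round(woe, 3))
print(round(iv, 3))
```

A bin with no goods or no bads gives an infinite woe in this sketch, which is one reason binning routines merge sparse bins.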


  • Install the release version of scorecard from CRAN with:
install.packages("scorecard")
  • Install the latest version of scorecard from github with:
# install.packages("devtools")
devtools::install_github("shichenxie/scorecard")


This is a basic example which shows you how to develop a common credit risk scorecard:

# Traditional Credit Scoring Using Logistic Regression
# data prepare ------
# load germancredit data
library(scorecard)
data("germancredit")
# filter variable via missing rate, iv, identical value rate
dt_s = var_filter(germancredit, y="creditability")
# breaking dt into train and test
dt_list = split_df(dt_s, y="creditability", ratio = 0.6, seed = 30)
train = dt_list$train; test = dt_list$test;
# woe binning ------
bins = woebin(dt_s, y="creditability")
# woebin_plot(bins)
# binning adjustment
# # adjust breaks interactively
# breaks_adj = woebin_adj(dt_s, "creditability", bins) 
# # or specify breaks manually
breaks_adj = list(
  age.in.years=c(26, 35, 40),  # variable name and first break (26) are assumed; lost in the source
  other.debtors.or.guarantors=c("none", "co-applicant%,%guarantor"))
bins_adj = woebin(dt_s, y="creditability", breaks_list=breaks_adj)
# converting train and test into woe values
train_woe = woebin_ply(train, bins_adj)
test_woe = woebin_ply(test, bins_adj)
# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = train_woe)
# summary(m1)
# # Adjusting for oversampling
# library(data.table)
# p1=0.03; r1=0.3
# dt_woe = dt_woe[, weight := ifelse(y==1, p1/r1, (1-p1)/(1-r1) )]
# fmla = as.formula(paste("y ~", paste(names(dt_woe)[2:21], collapse="+")))
# m1 = glm(fmla, family = binomial(), data = dt_woe, weights = weight)
# Select a formula-based model by AIC (or by LASSO)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)
# summary(m2)
# performance ks & roc ------
# predicted probability
train_pred = predict(m2, train_woe, type='response')
test_pred = predict(m2, test_woe, type='response')
# performance
train_perf = perf_eva(train$creditability, train_pred, title = "train")
test_perf = perf_eva(test$creditability, test_pred, title = "test")
# score ------
card = scorecard(bins_adj, m2)
# credit score
train_score = scorecard_ply(train, card, print_step=0)
test_score = scorecard_ply(test, card, print_step=0)
# psi
perf_psi(
  score = list(train = train_score, test = test_score),
  label = list(train = train$creditability, test = test$creditability)
)
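
The psi step compares the score distribution of the train and test samples. As a rough illustration of the population stability index itself, the base R sketch below computes it on simulated scores. `psi_sketch`, the simulated data and the break points are hypothetical, and this is not the implementation behind perf_psi.

```r
# Hypothetical helper: PSI = sum((actual share - expected share) * ln(actual share / expected share))
psi_sketch = function(expected, actual, breaks) {
  e = table(cut(expected, breaks, include.lowest = TRUE)) / length(expected)
  a = table(cut(actual,   breaks, include.lowest = TRUE)) / length(actual)
  sum((a - e) * log(a / e))
}

# Simulated scores standing in for train and test scores (made-up values)
set.seed(1)
train_s = rnorm(1000, mean = 500, sd = 50)
test_s  = rnorm(1000, mean = 505, sd = 50)

# Assumed score bands; open-ended at both tails so every score is covered
breaks = c(-Inf, seq(400, 600, by = 50), Inf)

print(psi_sketch(train_s, test_s, breaks))
```

Identical distributions give a PSI of 0, and larger values indicate a bigger shift. An empty band in either sample makes the logarithm undefined, so bands are usually chosen wide enough to avoid that.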


scorecard 0.1.9

  • pdo in the scorecard function now supports negative values. If pdo is positive, a larger score means a lower probability of being a positive sample. If pdo is negative, a larger score means a higher probability of being a positive sample.
  • fixed a bug in the woebin function with the chimerge method, which was caused by initial breaks containing out-of-range values.
  • added a check on the number of unique values in string columns, since too many unique values can slow down the binning process.
  • fixed a bug in the perf_eva function, which was caused by the number of plot rows being set to 0 when only one plot type is given.
  • the ratio argument in the split_df function now supports setting ratios for both train and test.
  • If the argument return_rm_reason is TRUE in the var_filter function, the info_value, missing_rate and identical_rate are provided in the result.
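
The effect of the sign of pdo can be illustrated with a small base R sketch. It assumes the common log-odds scaling, score = a - b * ln(odds) with b = pdo / ln(2). The helper `score_from_prob` and its default values are hypothetical and are not taken from the package source.

```r
# Hypothetical helper: map a predicted probability of the positive class to a score
score_from_prob = function(p, points0 = 600, odds0 = 1/19, pdo = 50) {
  b = pdo / log(2)                # points per doubling of the odds
  a = points0 + b * log(odds0)    # offset so that odds0 maps to points0
  a - b * log(p / (1 - p))
}

p = c(0.05, 2/21)                 # odds of 1/19 and 2/19, i.e. the odds double

print(score_from_prob(p, pdo = 50))    # positive pdo: score falls as the odds rise
print(score_from_prob(p, pdo = -50))   # negative pdo: score rises as the odds rise
```

Under this assumed formula, doubling the odds moves the score by exactly pdo points, downward for a positive pdo and upward for a negative one.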

scorecard 0.1.8

  • removed columns that have only one unique value in the input dataset
  • modified the default values of x_limits in perf_psi
  • fixed a bug in perf_psi when the label is factor
  • display proc time in woebin
  • fixed a bug in perf_eva when estimating AUC
  • fixed a bug in woebin_adj when special_values is provided

scorecard 0.1.7

  • added chimerge method for woebin function
  • special_values option added in woebin function
  • f1 curve added in perf_eva

scorecard 0.1.6

  • Fixed a bug in woebin_adj function when all_var == FALSE and the breaks of all variables are perfect.
  • Provide parallel computation (foreach with parallel backend) in the functions of woebin and woebin_ply.
  • Modified scorecard_ply function.
  • Fixed a bug in woebin when there are empty bins based on provided break points.

scorecard 0.1.5

  • Fixed a bug in scorecard function when calculating the coefficients.
  • Fixed a bug in perf_eva when type="lift".
  • Fixed a bug in functions of woebin and var_filter when removing Date columns.

scorecard 0.1.4

  • perf_eva supports both predicted probability and score.
  • Added the woebin_adj function which can interactively adjust the binning info from woebin.
  • Reviewed woebin function.

scorecard 0.1.3

  • Modified the format of printing message and added condition functions.
  • Added the split_df function, which splits a dataframe into two.
  • Reordered the binning information; the missing-value bin is now placed first.

scorecard 0.1.2

  • fixed a bug in var_filter

scorecard 0.1.1

  • Specified some potential problems via conditions
  • Modified examples for most functions

scorecard 0.1.0

  • Initial version



0.2.2 by Shichen Xie, 3 days ago


Authors: Shichen Xie [aut, cre]


MIT + file LICENSE license

Imports data.table, ggplot2, gridExtra, foreach, doParallel, parallel, openxlsx

Suggests knitr, rmarkdown, pkgdown, testthat
