Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'WeightIt', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with longitudinal treatments.
cobalt stands for Covariate Balance Tables (and Plots).
cobalt allows users to assess balance on covariate distributions in preprocessed groups generated through weighting, matching, or subclassification, such as by using the propensity score. cobalt’s primary function is bal.tab(), which stands for “balance table”, and essentially replaces (or supplements)
the balance assessment tools found in the R packages
Matching. To examine how
bal.tab() integrates with these
packages and others, see the help file for bal.tab(), which links to the methods used for each package. Each page has examples of how bal.tab() is used with the package. There are also four vignettes detailing the use of cobalt, which can be accessed with browseVignettes("cobalt"): one for basic uses of cobalt, one for the use of cobalt with additional packages, one for the use of cobalt with multiply imputed and/or clustered data, and one for the use of cobalt with longitudinal treatments. Currently, cobalt is compatible with output from MatchIt, twang, Matching, optmatch, CBPS, ebal, WeightIt, and designmatch, as well as data not processed through these packages.
Most of the major conditioning packages contain functions to assess
balance; so why use
cobalt at all?
cobalt arose out of several
desiderata when using these packages: to have standardized measures that
were consistent across all conditioning packages, to allow for
flexibility in the calculation and display of balance measures, and to
incorporate recent methodological recommendations in the assessment of
balance. In addition,
cobalt has unique plotting capabilities that
make use of
ggplot2 in R for balance assessment and reporting.
Because conditioning methods are spread across several packages, each with its own idiosyncrasies in how it reports balance (if at all), comparing the resulting balance from various conditioning methods can be difficult. cobalt unites these packages by providing a single, flexible tool that intelligently processes output from any of the conditioning packages and provides the user with both useful defaults and customizable options for display and calculation. cobalt also allows for balance assessment on data not generated through any of the conditioning packages. In addition, cobalt has tools for assessing and reporting balance for clustered data sets, data sets generated through multiple imputation, and data sets with a continuous treatment variable, all features that exist in very limited capacities or not at all in other packages.
A large focus in developing cobalt was to streamline output so that only the most useful, non-redundant, and complete information is displayed, all at the user’s choice. Balance statistics are intuitive, methodologically informed, and simple to interpret. Visual displays of balance reflect the goals of balance assessment rather than being steps removed. While other packages have focused their efforts on processing data, cobalt only assesses balance, and does so particularly well.
New features are being added all the time, following the cutting edge of methodological work on balance assessment. As new packages and methods are developed, cobalt will be ready to integrate them to further our goal of simple, unified balance assessment.
Below are examples of
cobalt’s primary functions:
```r
library("cobalt")
library("MatchIt")
data("lalonde", package = "cobalt")

# Nearest neighbor matching with MatchIt
m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde)

# Checking balance before and after matching:
bal.tab(m.out, m.threshold = 0.1, un = TRUE)
#> Call
#>  matchit(formula = treat ~ age + educ + race + married + nodegree + 
#>     re74 + re75, data = lalonde)
#> 
#> Balance Measures
#>                 Type Diff.Un Diff.Adj        M.Threshold
#> distance    Distance  1.7941   0.9739                   
#> age          Contin. -0.3094   0.0718     Balanced, <0.1
#> educ         Contin.  0.0550  -0.1290 Not Balanced, >0.1
#> race_black    Binary  0.6404   0.3730 Not Balanced, >0.1
#> race_hispan   Binary -0.0827  -0.1568 Not Balanced, >0.1
#> race_white    Binary -0.5577  -0.2162 Not Balanced, >0.1
#> married       Binary -0.3236  -0.0216     Balanced, <0.1
#> nodegree      Binary  0.1114   0.0703     Balanced, <0.1
#> re74         Contin. -0.7211  -0.0505     Balanced, <0.1
#> re75         Contin. -0.2903  -0.0257     Balanced, <0.1
#> 
#> Balance tally for mean differences
#>                    count
#> Balanced, <0.1         5
#> Not Balanced, >0.1     4
#> 
#> Variable with the greatest mean difference
#>    Variable Diff.Adj        M.Threshold
#>  race_black    0.373 Not Balanced, >0.1
#> 
#> Sample sizes
#>           Control Treated
#> All           429     185
#> Matched       185     185
#> Unmatched     244       0
```
```r
# Examining distributional balance with plots:
bal.plot(m.out, var.name = "educ")
bal.plot(m.out, var.name = "distance", mirror = TRUE, type = "histogram")

# Generating a Love plot to report balance:
love.plot(bal.tab(m.out), threshold = 0.1, abs = TRUE, var.order = "unadjusted")
```
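Balance can also be checked on weights that did not come from any of the supported packages. The sketch below is illustrative only: it assumes ATT "odds" weights built from an ordinary logistic regression propensity score, and the names `ps_model`, `ps`, `w`, and `b` are hypothetical. It relies on the formula interface of bal.tab() accepting a numeric `weights` vector and an `estimand`.

```r
library("cobalt")
data("lalonde", package = "cobalt")

# Propensity scores from a plain logistic regression (illustrative only)
ps_model <- glm(treat ~ age + educ + race + married + nodegree + re74 + re75,
                data = lalonde, family = binomial)
ps <- fitted(ps_model)

# ATT weights: treated units get 1, control units get the odds ps / (1 - ps)
w <- ifelse(lalonde$treat == 1, 1, ps / (1 - ps))

# Formula interface: covariates come from `data`, weights from `w`
b <- bal.tab(treat ~ age + educ + race + married + nodegree + re74 + re75,
             data = lalonde, weights = w, estimand = "ATT", un = TRUE)
print(b)
```

Any weighting scheme could be substituted for `w`; the balance table is computed the same way regardless of where the weights came from.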
Please remember to cite this package when using it to analyze data. For example, in a manuscript, write: “Matching was performed using Matching (Sekhon, 2011), and covariate balance was assessed using cobalt (Greifer, 2018) in R (R Core Team, 2018).” Use citation("cobalt") to generate a bibliographic reference for the cobalt package.
News and Updates
Several changes to bal.tab() display options (e.g., disp.subclass and parameters related to the display of balance tables with multinomial treatments, clusters, multiple imputations, and longitudinal treatments). First, the named arguments have been removed from the method-specific functions in order to clean them up and make it easier to add new functions, but they are still available to be specified. Second, a help page devoted just to these functions has been created, which can be accessed with ?options-display. Third, global options for these arguments can be set with options() so they don't need to be typed each time. For example, if you wanted un = TRUE all the time, you could set options(cobalt_un = TRUE) once and not have to include it in the call to bal.tab().
Added disp.sds option to display standard deviations for each group in bal.tab(). This works in all the same places as disp.means.
imp.fun options to request that only certain functions (e.g., mean or maximum) of the balance statistics are displayed in the summary across clusters/imputations. Previously this option was only available by calling print(). These parameters are part of the display options described above, so they are documented in ?options-display and not in the bal.tab help files.
int_sep options to change the separators between variable names when factor variables and interactions are displayed. This functionality had been available since version 3.4.0 but was not documented. It is now documented in the new
display_options help page.
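A minimal sketch of setting a display option globally, assuming the options(cobalt_un = TRUE) behavior described above. The weights and the names `ps`, `w`, `old`, and `b` are hypothetical and exist only so there is an adjusted sample to compare against.

```r
library("cobalt")
data("lalonde", package = "cobalt")

# Illustrative ATT odds weights from a logistic propensity score model
ps <- fitted(glm(treat ~ age + educ + married + re74, data = lalonde,
                 family = binomial))
w <- ifelse(lalonde$treat == 1, 1, ps / (1 - ps))

# Set the display option once for the session, keeping the previous value
old <- options(cobalt_un = TRUE)

# No un = TRUE in the call; the global option is expected to supply it
b <- bal.tab(treat ~ age + educ + married + re74, data = lalonde,
             weights = w, estimand = "ATT")
print(b)

# Restore whatever was set before
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the global setting from leaking into unrelated code.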
binary can be specified with the global options
"cobalt_bin", respectively, so that a global setting (e.g., to set
binary = "std" to view standardized mean differences rather than raw differences in proportion for binary variables) can be used instead of specifying the argument each time in the call to bal.tab().
Minor updates to
f.build() to process inputs more flexibly. The left-hand side can now be empty, and the variables on the right-hand side can now contain spaces.
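A short sketch of f.build() building formulas from a character vector of covariate names, with and without a left-hand side. It assumes f.build() takes the response name as its first argument and the right-hand side as `rhs`; the names `covs`, `f`, and `f1` are hypothetical.

```r
library("cobalt")
data("lalonde", package = "cobalt")

covs <- c("age", "educ", "married", "re74")

# Two-sided formula: treatment on the left, covariates on the right
f <- f.build("treat", covs)

# One-sided formula: the left-hand side is left empty
f1 <- f.build(rhs = covs)

# The built formula can be passed straight to bal.tab()
bal.tab(f, data = lalonde)
```

Building the formula from a vector is convenient when the covariate list is long or generated programmatically.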
Fixed a bug when logical treatments were used. Thanks to @victorn1.
Fixed a bug that would occur when a variable had only one value. Thanks to @victorn1.
Made it so the names of 0/1 and logical variables are not printed with
"_1" appended to them. Thanks to @victorn1 for the suggestion.
Major updates to the organization of the code and help files. Certain functions have simplified syntax, relying more on
..., and help pages have been shortened and consolidated for some methods. In particular, the code and help documents for the
designmatch methods of
bal.tab() have been consolidated since they all rely on exactly the same syntax.
Fixed a bug that would occur when
imbalanced.only = TRUE in
bal.tab() but all variables were balanced.
Fixed a bug where the mean of a binary variable would be displayed as 1 minus its mean.
Fixed a bug that would occur when missingness patterns were the same for multiple variables.
Fixed a bug that would occur when a distance measure was to be assessed with
bal.tab() and there were missing values in the covariates (thanks to Laura Helmkamp).
Fixed a bug that would occur when
estimand was supplied by the user when using the
default method of bal.tab().
Fixed a bug where non-standard variable names (like
"I(age^2)") would cause an error.
Fixed a bug where treatment levels that had different numbers of characters would yield an error.
Added disp.means option to bal.tab with continuous treatments.
Added a default method for
bal.tab so it can be used with specially formatted output from other packages (e.g., from
bal.plot should work with these outputs too. This, of course, will never be completely bug-free because infinite inputs are possible and cannot all be processed perfectly. Don't try to break this function :)
Fixed some bugs occurring when standardized mean differences are not finite, thanks to Noémie Kiefer.
Speed improvements in
bal.plot, especially with multiple facets, and in
Added new options to
bal.plot, including the ability to display histograms rather than densities and mirrored rather than overlapping plots. This makes it possible to make the popular mirrored histogram plot for propensity scores. In addition, it's now easier to change the colors of the components of the plots.
Made behavior around binary variables with interactions consistent with the documentation, where interactions with both levels of the variable are present (thanks to @victorn1). Also, replaced _ with * as the delimiter between variable names in interactions. For the old behavior, use int_sep = "_" in bal.tab().
Expanded the flexibility of
love.plot so that replacing the name of a variable will replace it everywhere it appears, including interactions. Thanks to @victorn1 for the suggestion.
Added var.names function to extract and save variable names from
bal.tab objects. This makes it a lot easier to create replacement names for use in
love.plot. Thanks to @victorn1 for the suggestion.
When weighted correlations are computed for continuous treatments, the denominator of the correlation now uses the unweighted standard deviations. See
?bal.tab for the rationale.
Added methods for objects from the
Added methods for
ps.cont objects from the
Fixed bugs resulting from changes to how formula inputs are handled.
Cleaned up some internal functions, also fixing some related bugs.
Added subset option in all bal.tab() methods (and consequently in bal.plot()) that allows users to specify a subset of the data to assess balance on (i.e., instead of the whole data set). This provides a workaround for methods where the cluster option isn't allowed (e.g., longitudinal treatments) but balance is desired on subsets of the data. However, in most cases, using cluster with which.cluster specified makes more sense.
Updated help files, in particular, more clearly documenting methods for iptw objects from twang and CBMSM objects from CBPS.
Added pretty printing with crayon, inspired by Jacob Long's jtools.
Added abs option to
bal.tab to display absolute values of statistics, which can be especially helpful for aggregated output. This also affects how
love.plot() handles aggregated balance statistics.
Added support for data with missing covariates.
bal.tab() will produce balance statistics for the non-missing values and will automatically create a new variable indicating whether the variable is missing or not and produce balance statistics on this variable as well.
Fixed a bug when displaying maximum imbalances with subclassification.
Fixed a bug where the unadjusted statistics were not displayed when using
love.plot() with subclasses. (Thanks to Megha Joshi.)
Added the ability to display individual subclass balance using
love.plot() with subclasses.
Under-the-hood changes to how
weightit objects are handled.
Objects in the environment are now handled better by
bal.tab() with the formula interface. The
data argument is now optional if all variables in the formula exist in the environment.
Fixed a bug when using
mnps objects from
twang with only one stop method.
Fixed a bug when using
twang objects that contained missing covariate values.
Fixed a bug when using
int = TRUE in
bal.tab() with few covariates.
Fixed a bug when variable names had special characters.
Added ability to check higher order polynomials by setting
int to a number.
Changed behavior of
bal.tab() with multinomial treatments and
s.d.denom = "pooled" to use the pooled standard deviation from the entire sample, not just the paired treatments.
Restored some vignettes that required
Edits to vignettes and help files to respond to missing packages. Some vignette items may not display if packages are (temporarily) unavailable.
Fixed issue with sampling weights in
CBPS objects. (Thanks to @kkranker on Github.)
Added more support for sampling weights in
get.w() and help files.
Added support for longitudinal treatments in
love.plot(), including output from
Added a vignette to explain use with longitudinal treatments.
Edits to help files.
Added ability to change density options in
Added support for
Fixed bugs when limited variables were present. (One found and fixed by @sumtxt on Github.)
Fixed bug with multiple methods when weights were entered as a list.
Added full support for tibbles.
weightit methods in documentation and vignette now work.
Improved speed and performance.
Added pairwise option for
bal.tab() with multinomial treatments.
Increased flexibility for displaying balance using
love.plot() with clustered or multiply imputed data.
disp.bal.tab options to
Fixes to the vignettes. Also, creation of a new vignette to simplify the main one.
Added support for multinomial treatments in
bal.tab(), including output from
Added support for
weightit objects from
WeightIt, including for multinomial treatments.
Added support for
ebalance.trim objects from ebal.
Fixes to the vignette.
splitfactor() to handle tibbles better.
Fixed bug when using
bal.tab() with multiply imputed data without adjustment. Fixed bug when using
s.weights with the
formula method of
ks.threshold options to
bal.tab() to display Kolmogorov-Smirnov statistics before and after preprocessing.
Added support for sampling weights, which are applied to both control and treated units, using the s.weights option in bal.tab(). Sampling weights are also now compatible with the sampling weights in
ps objects from
twang; the default is to apply the sampling weights before and after adjustment, mimicking the behavior of
Changed behavior of
ps objects to allow displaying balance for more than one stop method at a time, and to default to displaying balance for all available stop methods. The
full.stop.method argument in
bal.tab() has been renamed
full.stop.method still works.
ps objects has also gone through some changes to be more like
Added support in
bal.plot() for subclassification with continuous treatments.
Added support in
Fixed a bug in
love.plot() caused when
var.order was specified to be a sample that was not present.
Added support in
love.plot() for examining balance on multiple weight specifications at a time
Added new utilities
Added option in
bal.plot() to display points sized by weights when treatment and covariate are continuous
which = "both" option in
bal.plot() to simultaneously display plots for both adjusted and unadjusted samples; changed argument syntax to accommodate
bal.plot() to display balance for multiple clusters and imputations simultaneously
bal.plot() to display balance for multiple subclasses simultaneously with
love.plot() to ensure adjusted points are in front of unadjusted points; changed colors and shape defaults and allowable values
Fixed bug where
estimand were not functioning correctly in
weights can now be specified as lists of the usual arguments
Added support for matching using the
optmatch package or by specifying matching strata.
Added full support (
bal.plot()) for multiply imputed data, including for clustered data sets.
Added support for multiple distance measures, including special treatment in
Adjusted specifications in
love.plot() for color and shape of points, and added option to generate a line connecting the points.
love.plot() display to perform better on Windows.
Added capabilities for
bal.plot() to display plots for multiple groups at a time
Added flexibility to
bal.plot(), giving the capability to view multiple plots for subclassified or clustered data. Multinomial treatments are also supported.
Created a new vignette for clustered and multiply imputed data
Fixed a bug causing mislabelling of categorical variables
Changed calculation of weighted variance to be in line with recommendations;
CBPS can now be used with standardized weights
Added support for entropy balancing through the ebal package.
Changed default color scheme of
love.plot() to be black and white and added options for color, shape, and size of points.
Added sample size calculations for continuous treatments.
Edits to the vignette.
Increased capabilities for cluster balance in
Increased information and decreased redundancy when assessing balance on interactions
quick option for
bal.tab() to increase speed
Added options for
Edits to the vignette
Added support for continuous treatment variables in
Added balance assessment within and across clusters
Other small performance changes to minimize errors and be more intuitive
Major revisions and adjustments to the vignette
Added a vignette.
Fixed error in bal.tab.Match that caused wrong values and warning messages when used.
Added new capabilities to bal.plot, including the ability to view unadjusted sample distributions, categorical variables as such, and the distance measure. Also updated documentation to reflect these changes and make which.sub more focal.
Allowed subclasses to be different from simply 1:S by treating them like factors once input is numerical
Changed column names in Balance table output to fit more compactly, and updated documentation to reflect these changes.
Other small performance changes to minimize errors and be more intuitive.