Joint Analysis and Imputation of Incomplete Data

Provides joint analysis and imputation of (generalized) linear and cumulative logit regression models, (generalized) linear and cumulative logit mixed models and parametric (Weibull) as well as Cox proportional hazards survival models with incomplete (covariate) data in the Bayesian framework. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' with the help of the package 'rjags'. It also provides summary and plotting functions for the output and allows the user to export imputed values.


The package JointAI provides joint analysis and imputation of (generalized) linear and cumulative logit regression models, (generalized) linear and cumulative logit mixed models, and parametric (Weibull) as well as Cox proportional hazards survival models with incomplete (covariate) data in the Bayesian framework.

The package performs some preprocessing of the data and creates a JAGS model, which will then automatically be passed to JAGS with the help of the R package rjags.

JointAI also provides summary and plotting functions for the output.


You can install JointAI from GitHub with:

# install.packages("devtools")
devtools::install_github("NErler/JointAI")

Main functions

Currently, there are the following main functions:

lm_imp()      # linear regression
glm_imp()     # generalized linear regression 
clm_imp()     # cumulative logit model
lme_imp()     # linear mixed model
glme_imp()    # generalized linear mixed model
clmm_imp()    # cumulative logit mixed model
survreg_imp() # parametric (Weibull) survival model
coxph_imp()   # Cox proportional hazards survival model

The functions lm_imp(), glm_imp() and clm_imp() use specification similar to their complete data counterparts lm() and glm() from base R and clm() from the package ordinal.

The functions for mixed models, lme_imp(), glme_imp() and clmm_imp() use similar specification as lme() from the package nlme (and clmm2() from ordinal).

survreg_imp() and coxph_imp() are missing data versions of survreg() and coxph() from the package survival.
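To make the relationship to the complete-data functions concrete, the sketch below fits a complete-case lm() and, where JointAI and JAGS are installed, the analogous lm_imp() call with the same formula. The airquality data (shipped with base R) and the choice of variables are assumptions made for illustration and are not taken from the package documentation.

```r
# Complete-data counterpart: lm() drops rows with missing values by default
# (na.action = na.omit), so only complete cases enter the analysis.
fit_cc <- lm(Temp ~ Wind + Ozone, data = airquality)
n_used <- nobs(fit_cc)                     # rows actually used by lm()
n_dropped <- nrow(airquality) - n_used     # rows lost to missing Ozone values

# Same formula and data arguments with lm_imp(): the incomplete covariate is
# imputed as part of the Bayesian model instead of being dropped.
# Guarded so that the script also runs where JointAI/JAGS are unavailable.
if (requireNamespace("JointAI", quietly = TRUE)) {
  fit_imp <- JointAI::lm_imp(Temp ~ Wind + Ozone, data = airquality,
                             n.iter = 500, progress.bar = "none")
}
```

The complete-case fit uses 116 of the 153 rows, whereas the lm_imp() fit keeps all rows.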

Functions summary(), coef(), traceplot() and densplot() provide a summary of the posterior distribution and its visualization.

GR_crit() and MC_error() provide the Gelman-Rubin diagnostic for convergence and the Monte Carlo error of the MCMC sample, respectively.
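As a rough illustration of what a Gelman-Rubin diagnostic measures, the following base R sketch computes the basic potential scale reduction factor for a single parameter from simulated chains. It is a simplified textbook version without a degrees-of-freedom correction; the helper gelman_rubin() is hypothetical, and GR_crit() may differ in its details.

```r
# Potential scale reduction factor for one parameter.
# `chains`: matrix with one column per chain and one row per iteration.
gelman_rubin <- function(chains) {
  n <- nrow(chains)                      # iterations per chain
  W <- mean(apply(chains, 2, var))       # mean within-chain variance
  B <- n * var(colMeans(chains))         # between-chain variance
  var_hat <- (n - 1) / n * W + B / n     # pooled estimate of the variance
  sqrt(var_hat / W)
}

set.seed(1)
mixed <- matrix(rnorm(3 * 500), ncol = 3)     # three chains, same target
stuck <- sweep(mixed, 2, c(0, 3, 6), "+")     # chains centred at different values

rhat_mixed <- gelman_rubin(mixed)   # close to 1: no sign of non-convergence
rhat_stuck <- gelman_rubin(stuck)   # well above 1: the chains disagree
```

Values close to 1 indicate that between-chain and within-chain variability agree; clearly larger values suggest the chains have not yet converged to a common distribution.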

JointAI also provides functions for exploration of the distribution of the data and missing values, export of imputed values and prediction.
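A possible workflow for exporting imputed values is sketched below. It assumes that imputed values have to be monitored explicitly via monitor_params = c(imps = TRUE) and that get_MIdat() accepts the arguments m and seed in addition to minspace; these names are assumptions that should be checked against the documentation of the installed version.

```r
# Guarded so that the script also runs where JointAI/JAGS are unavailable.
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)

  # Fit the model and keep the sampled values of the incomplete covariates
  # (assumption: imputations are only stored when monitored).
  lm2 <- lm_imp(SBP ~ gender + age + WC + alc + educ + bili, data = NHANES,
                n.iter = 200, monitor_params = c(imps = TRUE),
                progress.bar = "none")

  # Extract 5 imputed datasets from the MCMC sample, using iterations that
  # are at least 20 iterations apart.
  MIs <- get_MIdat(lm2, m = 5, minspace = 20, seed = 2019)
}
```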

Minimal Example

Visualize the observed data and missing data pattern

par(mar = c(2.5, 3, 2.5, 1), mgp = c(2, 0.8, 0))
plot_all(NHANES[c(1, 5:6, 8:12)], fill = '#18bc9c', border = '#2C3E50', ncol = 4, nclass = 30)
md_pattern(NHANES, color = c('#2C3E50', '#18bc9c'))
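The information that a missing data pattern plot conveys can also be tabulated by hand. The base R sketch below does this for the airquality data, used here only because it ships with R; it is an illustration and not how md_pattern() is implemented.

```r
# TRUE where a value is missing
miss <- is.na(airquality)

# Number of missing values per variable
n_miss <- colSums(miss)

# Encode each row's pattern as a string (1 = missing, 0 = observed)
# and count how often each distinct pattern occurs.
pattern <- apply(miss, 1, function(r) paste(as.integer(r), collapse = ""))
pattern_counts <- sort(table(pattern), decreasing = TRUE)
```

Each entry of pattern_counts corresponds to one row of a missing data pattern plot, with the most frequent pattern (here: all variables observed) listed first.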

Fit a linear regression model with incomplete covariates

lm1 <- lm_imp(SBP ~ gender + age + WC + alc + educ + bili,
              data = NHANES, n.iter = 500, progress.bar = 'none')

Visualize the MCMC sample

traceplot(lm1, col = c('#E74C3C', '#2C3E50', '#18bc9c'), ncol = 4)
densplot(lm1, col = c('#E74C3C', '#2C3E50', '#18bc9c'), ncol = 4, lwd = 2)

Summarize the Result

summary(lm1)
#>  Linear model fitted with JointAI 
#> Call:
#> lm_imp(formula = SBP ~ gender + age + WC + alc + educ + bili, 
#>     data = NHANES, n.iter = 500, progress.bar = "none")
#> Posterior summary:
#>                Mean     SD    2.5%   97.5% tail-prob. GR-crit
#> (Intercept)  88.089 8.8597  69.619 105.178    0.00000    1.01
#> genderfemale -3.566 2.2571  -7.950   0.803    0.11333    1.04
#> age           0.335 0.0700   0.193   0.469    0.00000    1.01
#> WC            0.226 0.0725   0.080   0.368    0.00267    1.00
#> alc>=1        6.350 2.3114   1.783  10.889    0.01200    1.00
#> educhigh     -2.828 2.0465  -6.797   1.157    0.17333    1.03
#> bili         -5.356 4.9196 -14.911   4.290    0.27867    1.04
#> Posterior summary of residual std. deviation:
#>           Mean    SD 2.5% 97.5% GR-crit
#> sigma_SBP 13.5 0.738 12.2  15.2       1
#> MCMC settings:
#> Iterations = 101:600
#> Sample size per chain = 500 
#> Thinning interval = 1 
#> Number of chains = 3 
#> Number of observations: 186
coef(lm1)
#>  (Intercept) genderfemale          age           WC       alc>=1 
#>   88.0889587   -3.5660647    0.3350489    0.2262964    6.3497173 
#>     educhigh         bili 
#>   -2.8283599   -5.3562879
confint(lm1)
#>                      2.5%       97.5%
#> (Intercept)   69.61859898 105.1784708
#> genderfemale  -7.95045888   0.8034015
#> age            0.19331277   0.4685157
#> WC             0.07998274   0.3681013
#> alc>=1         1.78289844  10.8888495
#> educhigh      -6.79742752   1.1568467
#> bili         -14.91144335   4.2900062
#> sigma_SBP     12.17745503  15.1533245


JointAI 0.5.1

Bug fixes

  • bug in ordinal models with only completely observed variables fixed (all necessary data is now passed to JAGS)
  • enable thinning when using parallel sampling
  • matrix Xl is no longer included in data_list when it is not used in the model
  • bugfix in subset when specified as vector
  • bugfix in ridge regression (gave an error message)
  • bugfix in recognition of binary factors that are coded as numeric and have missing values
  • bugfix in summary: range of iterations is printed correctly now when argument end is used
  • bugfix: error that occurred in re-scaling when reference category was changed is solved
  • bugfix in survival models: coding of censoring variable fixed

Minor changes

  • summary() calls GR_crit() with argument autoburnin = FALSE unless specified otherwise via ...
  • when inits is specified as a function, the function is evaluated and the resulting list passed to JAGS (previously the function was passed to JAGS)
  • the example data simLong and simWide have changed (more variables, fewer subjects)
  • added check if there are incomplete covariates before setting imp_pars = TRUE (when user specified via monitor_params or subset)
  • in survreg_imp() the sign of the regression coefficients is now reversed so that it matches the one from survreg()

JointAI 0.5.0


  • the argument meth has changed to models

Bug fixes

  • add_samples(): bug that copied the last chain to all other chains fixed
  • bugfix for the order of columns in the matrix Xc, so that specification of functions of covariates in auxiliary variables works better
  • adding vertical lines to a densplot() issue (all plots showed all lines) fixed
  • nested functions involving powers made possible
  • typo causing an issue in Poisson glm and glme removed

Minor changes

  • plot_all(), densplot(), and traceplot() limit the number of plots on one page to 64 when rows and columns of the layout are not user specified (to avoid the 'figure margins too large' error)
  • change in longDF example data: new version containing complete and incomplete categorical longitudinal variables (and variable names L1 and L2 changed to c1 and c2)
  • Some minor changes in notes, warnings and error messages
  • The function list_impmodels() changed to list_models() (but list_impmodels() is kept as an alias for now)
  • improved handling of functional forms of covariates (also in longitudinal covariates and random effects)

New Features / Extensions

  • clm_imp() and clmm_imp(): new functions for analysis of ordinal (mixed) models
  • It is now possible to impute incomplete longitudinal covariates (continuous, binary and ordered factors).
  • coxph_imp(): new function to fit Cox proportional hazards models with incomplete (baseline) covariates
  • Argument no_model allows specifying the names of completely observed variables for which no model should be specified (e.g., "time" in a mixed model)
  • Shrinkage: argument ridge = TRUE enables shrinkage priors on the precision of the regression coefficients in the analysis model
  • plot_all() can now handle variables from classes Date and POSIXt
  • new argument parallel allows different MCMC chains to be sampled in parallel
  • new argument ncores allows specifying the maximum number of cores to be used
  • new argument seed added for reproducible results; also a sampler (.RNG.name) and seed value for the sampler (.RNG.seed) are set or added to user-provided initial values (necessary for parallel sampling and reproducibility of results)
  • plot_imp_distr(): new function to plot distribution of observed and imputed values
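The arguments parallel, ncores and seed listed above might be combined as in the following sketch. The formula, the number of iterations and the seed value are arbitrary choices made for illustration, and the call is guarded so that the script also runs where JointAI or JAGS are not installed.

```r
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)

  # Sample the MCMC chains in parallel on at most two cores;
  # the seed is set so that results can be reproduced.
  fit_par <- lm_imp(SBP ~ gender + age + WC + alc, data = NHANES,
                    n.iter = 200, seed = 2019,
                    parallel = TRUE, ncores = 2)
}
```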

JointAI 0.4.0

Bug fixes

  • RinvD is no longer selected to be monitored in random intercept model (RinvD is not used in such a model)
  • fixed various bugs for models in which only the intercept is used (no covariates)

Minor changes

  • summary(): reduced default number of digits
  • continuous variables with two distinct values are converted to factor
  • argument meth now uses default values if only specified for subset of incomplete variables
  • get_MIdat(): argument minspace added to ensure spacing of iterations selected as imputations
  • densplot(): accepts additional options, e.g., lwd, col, ...
  • list_models() replaces the function list_impmodels() (which is now an alias)


  • coef() method added for JointAI object and summary.JointAI object
  • confint() method added for JointAI object
  • print() method added for JointAI object
  • survreg_imp() added to perform analysis of parametric (Weibull) survival models
  • glme_imp() added to perform generalized linear mixed modeling
  • extended documentation; two new vignettes on MCMC parameters and functions for after the model is estimated; added messages about coding of ordinal variables

JointAI 0.3.4

Bug fixes

  • traceplot(), densplot(): specification of nrow AND ncol possible; fixed bug when only nrow specified

JointAI 0.3.3

Bug fixes

  • removed deprecated code specifying contrast.arg, which now causes an error in some cases
  • fixed problem identifying non-linear functions in formula when the name of another variable contains the function name

JointAI 0.3.2

Bug fixes

  • lme_imp(): fixed error in the JAGS model that occurred when the model contained an interaction between a random slope variable and a longitudinal variable

Minor changes

  • unused levels of factors are dropped

JointAI 0.3.1

Bug fixes

  • plot_all() uses correct level-2 %NA in title
  • simWide: case with no observed bmi values removed
  • traceplot(), densplot(): ncol and nrow now work with use_ggplot = TRUE
  • traceplot(), densplot(): error in specification of nrow fixed
  • densplot(): use of color fixed
  • functions with argument subset now return random effects covariance matrix correctly
  • summary() displays output with rowname when only one node is returned and fixed display of D matrix
  • GR_crit(): Literature reference corrected
  • predict(): prediction with varying factor fixed
  • no scaling for variables involved in a function to avoid problems with re-scaling

Minor changes

  • plot_all() uses xpd = TRUE when printing text for character variables
  • list_impmodels() uses linebreak when output of predictor variables exceeds getOption("width")
  • summary() now displays tail-probabilities for off-diagonal elements of D
  • added option to show/hide constant effects of auxiliary variables in plots
  • predict(): now also returns newdata extended with prediction

JointAI 0.3.0

Bug fixes

  • monitor_params is now checked to avoid problems when only part of the main parameters is selected
  • categorical imputation models now use min-max trick to prevent probabilities outside [0, 1]
  • initial value generation for logistic analysis model fixed
  • bugfix in re-ordering columns when a function is part of the linear predictor
  • bugfix in initial values for categorical covariates
  • bugfix in finding imputation method when function of variable is specified as auxiliary variable
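The min-max trick mentioned in the list above can be pictured as clamping a computed probability to an interval strictly inside (0, 1). The base R sketch below shows the idea; the helper clamp_prob() and the bound eps are hypothetical, since the exact values used in the generated JAGS model are not stated here.

```r
# Clamp probabilities to [eps, 1 - eps]; eps is an assumed illustrative bound.
clamp_prob <- function(p, eps = 1e-10) pmax(eps, pmin(1 - eps, p))

# Values as they might arise from numerical inaccuracies
p_raw <- c(-0.02, 0, 0.3, 1, 1.01)
p_ok <- clamp_prob(p_raw)   # all values now lie strictly between 0 and 1
```

Values that were already valid, such as 0.3, pass through unchanged.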

Minor changes

  • md_pattern() now uses ggplot, which scales better than the previous version
  • lm_imp(), glm_imp() and lme_imp() now ask about overwriting a model file
  • analysis_main = TRUE stays selected when other parameters are monitored as well
  • get_MIdat(): argument include added to select if original data are included and id variable .id is added to the dataset
  • subset argument uses the same logic as the monitor_params argument
  • added switch to hide messages; distinction between messages and warnings
  • lm_imp(), glm_imp() and lme_imp() now take argument trunc in order to truncate the distribution of incomplete variables
  • summary() now omits auxiliary variables from the output
  • imp_par_list is now returned from JointAI models
  • cat_vars is no longer returned from lm_imp(), glm_imp() and lme_imp(), because it is contained in Mlist$refs


  • plot_all() function added
  • densplot() and traceplot() can optionally use ggplot
  • densplot() option to combine chains before plotting
  • example datasets NHANES, simLong and simWide added
  • list_impmodels() added to print information on the imputation models and hyperparameters
  • parameters() added to display the parameters to be/that were monitored
  • set_refcat() added to guide specification of reference categories
  • extension of possible functions of variables in model formula to (almost all) functions that are available in JAGS
  • added vignettes Minimal Example, Visualizing Incomplete Data, Parameter Selection and Model Specification

JointAI 0.2.0

Bug fixes

  • md_pattern(): does not generate duplicate plot any more
  • corrected names of imputation methods in help file
  • scaling when no continuous covariates are in the model or scaling is deselected fixed
  • initial value specification for coefficient for auxiliary variables fixed
  • get_MIdat(): imputed values are now filled in in the correct order
  • get_MIdat(): variables imputed with lognorm are now included when extracting an imputed dataset
  • get_MIdat(): imputed values of transformed variables are now included in imputed datasets
  • problem with invalid names of factor labels fixed
  • data matrix is now ordered according to order in user-specified meth argument

Minor changes

  • md_pattern(): adaptation to new version of md.pattern() from the mice package
  • internally change all NaN to NA
  • allow for scaling of incomplete covariates with quadratic effects
  • changed hyperparameter for precision in models with logit link from 4/9 to 0.001
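To get a feeling for the size of the last change in the list above, the precisions can be translated into standard deviations. The sketch assumes that the hyperparameter is the precision of a normal prior; this interpretation is not spelled out in the changelog.

```r
prec_old <- 4 / 9    # previous hyperparameter
prec_new <- 0.001    # new hyperparameter

# For a normal distribution: standard deviation = 1 / sqrt(precision)
sd_old <- 1 / sqrt(prec_old)   # 1.5
sd_new <- 1 / sqrt(prec_new)   # about 31.6
```

Under this assumption, the new value corresponds to a considerably vaguer prior than the old one.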


  • gamma and beta imputation methods implemented



0.6.0 by Nicole S. Erler, 3 months ago

Authors: Nicole S. Erler [aut, cre]

Task views: Missing Data

GPL (>= 2) license

Imports MASS, mcmcse, coda, rlang, foreach, doParallel

Depends on rjags

Suggests knitr, rmarkdown, bookdown, foreign, ggplot2, ggpubr, testthat

System requirements: JAGS

Enhanced by mdmb.
