Last updated on 2020-11-30 by Julie Josse, Nicholas Tierney, Nathalie Vialaneix (r-miss-tastic team)
Missing data are very frequently found in datasets. Base R provides a few options to handle them using computations that involve only observed data (na.rm = TRUE in functions mean, var, ... or use = complete.obs|na.or.complete|pairwise.complete.obs in functions cov, cor, ...). The base package stats also contains the generic function na.action, which extracts information about the NA action used to create an object.
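The base R options mentioned above can be sketched as follows. This is a minimal illustration on a small made-up data frame (the object names x, dat and fit are hypothetical, not part of the task view); it uses only functions from base R and stats.

```r
# Toy vector and data frame with missing values (illustrative only)
x <- c(1, 2, NA, 4)
dat <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2.1, NA, 6.2, 7.9, 10.3),
  c = c(1, 3, 2, 5, 4)
)

# na.rm = TRUE drops missing values before computing the statistic
mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # mean of the three observed values

# use = "complete.obs" keeps only rows with no NA in any column;
# use = "pairwise.complete.obs" keeps, for each pair of columns,
# the rows where both columns are observed
cor(dat, use = "complete.obs")
cor(dat, use = "pairwise.complete.obs")

# lm() drops incomplete rows by default (na.omit);
# na.action() reports which rows were removed when the object was created
fit <- lm(b ~ a, data = dat)
na.action(fit)
```

Note that pairwise deletion can use a different subset of rows for each entry of the correlation matrix, so the two cor() calls generally return different results.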
These basic options are complemented by many packages on CRAN, which we structure into main topics:
In addition to the present task view, this reference website on missing data might also be helpful.
If you think that we missed some important packages in this list, please contact the maintainer.
[...] ampute of mice.
[...] em.norm for multivariate Gaussian data), in cat (function em.cat for multivariate categorical data), in mix (function em.mix for multivariate mixed categorical and continuous data). These packages also implement Bayesian approaches (with Imputation and Posterior steps) for the same models (functions da.XXX for norm, cat and mix) and can be used to obtain imputed complete datasets or multiple imputations (functions imp.XXX for norm, cat and mix), once the model parameters have been estimated. imputeR is a Multivariate Expectation-Maximization (EM) based imputation framework that offers several different algorithms, including Lasso, tree-based models or PCA. In addition, TestDataImputation implements imputation based on EM estimation (and other simpler imputation methods) that are well suited for dichotomous and polytomous tests with item responses.
[...] hotdeck). StatMatch uses hot-deck imputation to impute surveys from an external dataset. impimp also uses the notion of "donor" to impute a set of possible values, termed "imprecise imputation".
[...] regressionImp). In addition, simputation is a general package for imputation by any prediction method that can be combined with various regression methods, and works well with the tidyverse. WaverR imputes data using a weighted average of several regressions. iai tunes optimal imputation based on knn, tree or SVM.
[...] imputeMFA.
Some of the above mentioned packages can also handle multiple imputations.
In addition, mitools provides a generic approach to handle multiple imputation in combination with any imputation method.
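To make the idea of combining results across multiple imputations concrete, here is a minimal base R sketch of Rubin's rules for pooling a scalar estimate. It is not the mitools API: the function pool_rubin and the input numbers are hypothetical and written only for illustration. It assumes the same analysis has already been run on each of the m imputed datasets, yielding one point estimate and one squared standard error per dataset.

```r
# Pool m point estimates and their variances with Rubin's rules.
# pool_rubin is a hypothetical helper, not a function from mitools.
pool_rubin <- function(estimates, variances) {
  stopifnot(length(estimates) == length(variances), length(estimates) > 1)
  m <- length(estimates)
  q_bar <- mean(estimates)              # pooled point estimate
  within <- mean(variances)             # average within-imputation variance
  between <- var(estimates)             # between-imputation variance
  total <- within + (1 + 1 / m) * between
  list(estimate = q_bar, within = within, between = between,
       total_var = total, se = sqrt(total))
}

# Made-up estimates and squared standard errors from m = 3 imputed datasets
est <- c(1.0, 1.2, 0.8)
vars <- c(0.04, 0.05, 0.03)
pooled <- pool_rubin(est, vars)
pooled$estimate
pooled$se
```

The between-imputation term is what distinguishes multiple imputation from single imputation: it inflates the standard error to reflect the uncertainty due to the missing values.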
[...] impute_below. TAR implements estimation of autoregressive threshold models with Gaussian noise and of positive-valued time series, using a Bayesian approach in the presence of missing data. swgee implements a probability weighted generalized estimating equations method, based on SIMEX, for longitudinal data with missing observations and measurement error in covariates. icenReg performs imputation for censored responses for interval data. imputeTestbench proposes tools to benchmark missing data imputation in univariate time series. On a related topic, imputeFin handles imputation of missing values in financial time series using AR models or random walk.
3 months ago by Aurélie Siberchicot
Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences
3 years ago by Nathan Medina-Rodriguez
Allele Imputation and Haplotype Reconstruction from Pedigree Databases
6 years ago by Florian Meinfelder
Bayesian Bootstrap Predictive Mean Matching - Multiple and Single Imputation for Discrete Data
2 years ago by Chris Terry
Finds Missing Links and Metric Confidence Intervals in Ecological Bipartite Networks
8 years ago by Fernando Tusell
Analysis of categorical-variable datasets with missing values
2 years ago by Mario Santoro
Spatial and Spatio-Temporal Bayesian Model for Circular Data
20 days ago by Oliver Pfaffel
K-means clustering with built-in missing data imputation
4 years ago by Melanie Prague
Doubly Robust Inverse Probability Weighted Augmented GEE Estimator
a year ago by Hadrien Lorenzo
Data-Driven Sparse Partial Least Squares Robust to Missing Samples for Mono and Multi-Block Data Sets
4 years ago by Etienne A.D. Pienaar
Inference and Analysis for Diffusion Processes via Data Imputation and Method of Lines
7 years ago by Stephen R. Haptonstahl
Data-Informed Link Strength. Combine multiple-relationship networks into a single weighted network. Impute (fill-in) missing network links.
2 months ago by Choonghyun Ryu
Tools for Data Diagnosis, Exploration, Transformation
4 years ago by Il-Youp Kwak
Imputing Dropout Events in Single-Cell RNA-Sequencing Data
3 years ago by Emilie Poisson-Caillault
Imputation of Time Series Based on Dynamic Time Warping
3 years ago by Emilie Poisson-Caillault
Imputation of Multivariate Time Series Based on Dynamic Time Warping
a year ago by Hang J. Kim
Simultaneous Edit-Imputation for Continuous Microdata
2 years ago by Peter Hoff
Semiparametric Factor and Regression Models for Symmetric Relational Data
a year ago by Kosuke Imai
R Package for Designing and Analyzing Randomized Experiments
2 years ago by Yun-Hee Choi
Family Age-at-Onset Data Simulation and Penetrance Estimation
6 years ago by Alessandro Barbiero
Imputation of Missing Values Through a Forward Imputation Algorithm
6 years ago by Alessandro Barbiero
The Forward Imputation: A Sequential Distance-Based Approach for Imputing Missing Data
2 years ago by Claudio Agostinelli
Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data
5 months ago by Hakon K. Gjessing
Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes
21 days ago by Jan Graffelman
Statistical Tests and Graphics for Hardy-Weinberg Equilibrium
5 years ago by Stephan Dlugosz
EM by the Method of Weights for Incomplete Categorical Data in Generalized Linear Models
4 months ago by Clifford Anderson-Bergman
Regression Models for Interval Censored Data
6 months ago by Daniel P. Palomar
Imputation of Financial Time Series with Missing Values and/or Outliers
5 years ago by Neeraj Bokde
Impute Missing Data in Time Series Data with PSF Based Method
2 years ago by Marcus W. Beck
Test Bench for the Comparison of Imputation Methods
a month ago by Genevieve Robin
Imputation of High-Dimensional Count Data using Side Information
2 months ago by Martin Elff
Management of Survey Data and Presentation of Analysis Results
10 days ago by Guido Schwarzer
Statistical Methods for Sensitivity Analysis in Meta-Analysis
4 days ago by Alexander Robitzsch
Some Additional Multiple Imputation Functions, Especially for 'mice'
2 years ago by Vincent Audigier
Multiple Imputation by Chained Equations with Multilevel Data
2 years ago by Jacques-Emmanuel Galimard
Missing not at Random Imputation Models for Multiple Imputation by Chained Equation
10 months ago by Sam Wilson
Multiple Imputation by Chained Equations with Random Forests
2 years ago by Genevieve Robin
Main Effects and Interactions in Mixed and Incomplete Data
7 years ago by Daniel J. Stekhoven
Nonparametric Missing Value Imputation using Random Forest
a month ago by Francois Husson
Handling Missing Values with Multivariate Data Analysis
4 days ago by Paul M. Hargarten
Multiple Imputation Using Weighted Quantile Sum Regression
4 years ago by Brian Ripley
Estimation/Multiple Imputation for Mixed Categorical and Continuous Data
a year ago by Michal Majka
High Performance Implementation of the Naive Bayes Algorithm
5 months ago by Nicholas Tierney
Data Structures, Summaries, and Visualisations for Missing Data
a year ago by Kevin Wright
Principal Components Analysis using NIPALS or Weighted EMPCA, with Gram-Schmidt Orthogonalization
11 days ago by Jingchen Hu
Non-Parametric Bayesian Multiple Imputation for Categorical Data
2 years ago by Frederic Bertrand
Partial Least Squares Regression for Beta Regression Models
2 years ago by Frederic Bertrand
Partial Least Squares Regression for Generalized Linear Models
2 years ago by Kristoffer Magnusson
Power Analysis for Longitudinal Multilevel Models
3 years ago by Marco Johannes Maier
Utilities to Fit Paired Comparison Models for Preferences
2 years ago by Michael C Sachs
Methods for Evaluating Principal Surrogates of Treatment Response
12 days ago by Martijn Heymans
Prediction Model Selection and Performance Evaluation in Multiple Imputed Datasets
4 years ago by Alexis Gabadinho
Probabilistic Suffix Trees and Variable Length Markov Chains
a year ago by Riyan Cheng
Tools for Mapping of Quantitative Traits of Genetically Related Individuals and Calculating Identity Coefficients from Pedigrees
3 years ago by Andy Liaw
Breiman and Cutler's Random Forests for Classification and Regression
a year ago by Serguei Rouzinov
Regression-Based Approach for Testing the Type of Missing Data
3 years ago by Roberto Serrano-Notivoli
Reconstruction of Daily Data - Precipitation
7 months ago by Quentin Grimonprez
Mixture Models with Heterogeneous and (Partially) Missing Data
8 months ago by Nathalie Vialaneix
Log-Linear Poisson Graphical Model with Hot-Deck Multiple Imputation
a year ago by Maria del Carmen Calatrava Moreno
An Extended Rao-Stirling Diversity Index to Handle Missing Data
a year ago by Valentin Todorov
Scalable Robust Estimators with High Breakdown Point for Incomplete Data
9 months ago by Zhiyong Zhang
Robust Structural Equation Modeling with Missing Data and Auxiliary Variables
3 years ago by Peter Hoff
Semiparametric Bayesian Gaussian Copula Estimation and Imputation
2 months ago by Jonathan Bartlett
Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification
3 months ago by Larissa A. Matos
Spatio-Temporal Estimation and Prediction for Censored/Missing Responses
2 years ago by Juan Xiong
Simulation Extrapolation Inverse Probability Weighted Generalized Estimating Equations
4 years ago by Hanwen Zhang
Bayesian Modeling of Autoregressive Threshold Time Series Models
2 months ago by Shenghai Dai
Missing Item Responses Imputation for Test and Assessment Data
a year ago by Minna Genbaeck
Uncertainty Intervals and Sensitivity Analysis for Missing Data
3 months ago by Mohammed Sedki
Variable Selection for Model-Based Clustering of Mixed-Type Data Set with Missing Values
5 years ago by Olivia Cheronet
Data Estimation using Weighted Averages of Multiple Regressions
3 years ago by Shahla Faisal
Weighted Nearest Neighbor Imputation of Missing Values using Selected Variables
a year ago by Nicholas L. Crookston
Nearest Neighbor Observation Imputation and Evaluation Tools
a year ago by Javier Palarea-Albaladejo
Treatment of Zeros, Left-Censored and Missing Values in Compositional Data Sets
9 months ago by Achim Zeileis
S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)