Time Series Missing Value Imputation

Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: 'Mean', 'LOCF', 'Interpolation', 'Moving Average', 'Seasonal Decomposition', 'Kalman Smoothing on Structural Time Series models', 'Kalman Smoothing on ARIMA models'.



The imputeTS package specializes in (univariate) time series imputation. It offers implementations of several different imputation algorithms. Beyond the imputation algorithms, the package also provides functions for plotting and printing missing data statistics of time series. Additionally, three time series datasets for imputation experiments are included.

Installation

The imputeTS package can be found on CRAN. For installation execute in R:

 install.packages("imputeTS")

If you want to install the latest version from GitHub (can be unstable) run:

library(devtools)
install_github("SteffenMoritz/imputeTS")

Usage

  • Imputation

    To impute (fill) all missing values in a time series x, run the following command:

     na.interpolation(x)
    

    The output is the time series x with all NAs replaced by reasonable values.

    In this case, interpolation was the algorithm chosen to calculate the NA replacements. Several other algorithms are available (see also the section "Imputation Algorithms"). All imputation functions follow the same naming scheme: na. followed by an algorithm label, e.g. na.mean, na.kalman, ...

  • Plotting

    To plot missing data statistics for a time series x, run the following command:

     plotNA.distribution(x)
    

    This is just one example of a plot. In total there are four different types of missing data plots (see also the section "Missing Data Plots").

  • Printing

    To print statistics about the missing data in a time series x, run the following command:

     statsNA(x)
    
  • Datasets

    To load the 'heating' time series (with missing values) into a variable y and the 'heating' time series (without missing values) into a variable z, run:

     y <- tsHeating
     z <- tsHeatingComplete
    

    The package provides three datasets: the 'tsHeating', 'tsAirgap' and 'tsNH4' time series (see also the section "Datasets").
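As a minimal sketch of the imputation calls above, the following applies three of the package's functions to a small toy series. The toy data are made up for illustration, and the function names are those of the version documented here (2.7); other releases may name them differently.

```r
library(imputeTS)  # assumes the package is installed; names as in version 2.7

# Toy series with three missing values (illustrative data, not from the package)
x <- ts(c(2, 3, NA, 5, 6, NA, NA, 9, 10))

# Count the missing values with base R
n_missing <- sum(is.na(x))

# Fill the gaps with three different algorithms
x_lin  <- na.interpolation(x)  # linear interpolation: 2 3 4 5 6 7 8 9 10
x_locf <- na.locf(x)           # last observation carried forward
x_mean <- na.mean(x)           # overall mean of the observed values

# Compare the results side by side
cbind(original = x, interpolation = x_lin, locf = x_locf, mean = x_mean)
```

On a series with a clear trend like this one, the mean imputation stands out as the poorest fit, which is a quick way to see why the choice of algorithm matters.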

Imputation Algorithms

Here is a table with available algorithms to choose from:

Function           Description
na.interpolation   Missing Value Imputation by Interpolation
na.kalman          Missing Value Imputation by Kalman Smoothing
na.locf            Missing Value Imputation by Last Observation Carried Forward
na.ma              Missing Value Imputation by Weighted Moving Average
na.mean            Missing Value Imputation by Mean Value
na.random          Missing Value Imputation by Random Sample
na.remove          Remove Missing Values
na.replace         Replace Missing Values by a Defined Value
na.seadec          Seasonally Decomposed Missing Value Imputation
na.seasplit        Seasonally Splitted Missing Value Imputation

This is a rather broad overview. Most of these functions offer more than one algorithm. For example, na.interpolation can be set to linear or spline interpolation.

More detailed information about the algorithms and their options can be found in the imputeTS reference manual.
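The following sketch shows how such options might be selected. The argument names and values used here (option, model, k, weighting) reflect our reading of the version 2.7 reference manual and should be treated as assumptions to check against the installed version.

```r
library(imputeTS)  # assumes version 2.7 function and argument names

x <- tsAirgap  # monthly airline passengers with NAs, shipped with the package

# Interpolation: linear versus spline
imp_linear <- na.interpolation(x, option = "linear")
imp_spline <- na.interpolation(x, option = "spline")

# LOCF variant: next observation carried backward
imp_nocb <- na.locf(x, option = "nocb")

# Median instead of mean
imp_median <- na.mean(x, option = "median")

# Exponentially weighted moving average over 4 neighbours on each side
imp_ma <- na.ma(x, k = 4, weighting = "exponential")

# Kalman smoothing on a structural time series model
imp_kalman <- na.kalman(x, model = "StructTS")
```

Each call returns a series of the same length as x with the NAs filled, so the results can be compared directly or passed to the plotting functions.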

Missing Data Plots

Here is a table with available plots to choose from:

Function                 Description
plotNA.distribution      Visualize Distribution of Missing Values
plotNA.distributionBar   Visualize Distribution of Missing Values (Barplot)
plotNA.gapsize           Visualize Distribution of NA gapsizes
plotNA.imputations       Visualize Imputed Values

More detailed information about the plots can be found in the imputeTS reference manual.
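To make clear what a "gap size" is, here is a base R sketch that counts runs of consecutive NAs in a toy vector. It only illustrates the quantity that a gap size plot summarizes; it is not the package's implementation, and the data are made up.

```r
# Toy vector with one gap of length 1 and two gaps of length 2 (illustrative)
x <- c(1, NA, 3, NA, NA, 6, 7, NA, NA, 10)

# Run-length encode the NA indicator: each run of TRUE is one gap
runs <- rle(is.na(x))
gap_sizes <- runs$lengths[runs$values]

# Frequency of each gap size
gap_table <- table(gap_sizes)
gap_table
```

Long gaps are generally harder to impute well than isolated NAs, so this distribution is worth inspecting before choosing an algorithm.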

Datasets

There are three datasets (each in two versions) available:

Dataset             Description
tsAirgap            Time series of monthly airline passengers (with NAs)
tsAirgapComplete    Time series of monthly airline passengers (complete)
tsHeating           Time series of a heating system's supply temperature (with NAs)
tsHeatingComplete   Time series of a heating system's supply temperature (complete)
tsNH4               Time series of NH4 concentration in a wastewater system (with NAs)
tsNH4Complete       Time series of NH4 concentration in a wastewater system (complete)

The tsAirgap, tsHeating and tsNH4 time series contain NAs; their complete versions do not. Apart from the missing values, the two versions of each series are identical. The NAs were artificially inserted by simulating the missing data pattern observed in similar incomplete time series from the same domain. Having a complete and an incomplete version of the same dataset is useful for conducting experiments with imputation functions.

More detailed information about the datasets can be found in the imputeTS reference manual.
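One way such an experiment might look is sketched below: several imputation functions are applied to tsAirgap and their results are compared with tsAirgapComplete at the positions that were missing. The rmse helper and the choice of candidates are illustrative assumptions, and the function names are those of version 2.7.

```r
library(imputeTS)  # assumes version 2.7 function names

# Positions that are missing in the incomplete series
missing <- is.na(tsAirgap)

# Hypothetical helper: root mean squared error at the imputed positions only
rmse <- function(imputed, truth) {
  sqrt(mean((as.numeric(imputed)[missing] - as.numeric(truth)[missing])^2))
}

# Candidate imputation functions, each called with its default settings
candidates <- list(
  mean          = na.mean,
  locf          = na.locf,
  interpolation = na.interpolation,
  ma            = na.ma
)

# Impute with each candidate and score against the complete series
errors <- sapply(candidates, function(impute) rmse(impute(tsAirgap), tsAirgapComplete))
sort(errors)
```

Restricting the error to the originally missing positions avoids diluting it with the observed values, which every method reproduces exactly.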

Reference

You can cite imputeTS as follows:

Moritz, Steffen, and Thomas Bartz-Beielstein. "imputeTS: Time Series Missing Value Imputation in R." R Journal 9.1 (2017).

Support

If you found a bug or have suggestions, feel free to get in contact via steffen.moritz10 at gmail.com

All feedback is welcome

Version

2.7

License

GPL-3

News

Changes in Version 2.7

  • Updated Description: ORCID iD added, packages required for unit tests added as "Suggested"

  • Small correction in README.md, small update to citation file

  • Replaced NEWS with NEWS.md for better formatting

Changes in Version 2.6

  • Updated citation file

  • Minor changes to vignette

Changes in Version 2.5

  • Adjusted unit test to an update of the forecast package

Changes in Version 2.4

  • Small speed improvements for na.kalman

  • Improved input check for all functions

  • Bugfix for unit tests

  • Changes to unit test (because of zoo update)

Changes in Version 2.3

  • Bugfix for na.kalman with integer input

  • Readme Update

  • Improved error messages for na.seasplit and na.seadec

  • Minor vignette changes

Changes in Version 2.2

  • Bugfix for na.locf (also concerned na.kalman)

Changes in Version 2.1

  • Fixes for problems with Solaris/Sparc

  • Fixes for problems with vignette on osx

Changes in Version 2.0

  • Bugfix for plots without missing data

  • Increased performance for na.locf

  • Minor bugfixes for specific data.frame inputs

  • Minor bugfixes for specific xts object inputs

  • Improved Code Documentation

  • Added new software tests

Changes in Version 1.9

  • Added Vignette

Changes in Version 1.8

  • Computation time improvements for na.locf (up to 10000 times faster)

  • Computation time improvements for na.interpolation (up to 10000 times faster)

  • Computation time improvements for na.kalman (only slightly faster, under 10%)

  • Fixed unnecessary warning message with some na.kalman options

  • Adjusted default parameters for plotNA.distributionBar (using nclass.Sturges for breaks parameter)

  • Fixed issue with too sensitive input checking

Changes in Version 1.7

  • Enabled usage of multivariate input (data.frame, mts, matrix, ...) for all imputation functions except na.remove. Users no longer have to loop through all columns themselves if they want to use the package with multivariate data. The imputation itself is still performed in a univariate manner (column by column).

  • Improved compatibility with different advanced time series objects like zoo and xts. Using the imputation functions with these time series objects should be possible now. These series will not be explicitly named as possible input in the user documentation. Absence of errors cannot be guaranteed. However, there are no known issues yet.

  • Added several things for unit tests with pkg 'testthat'

  • Added unit tests for every function

  • Adjusted error messages

  • Internal Coding style improvement: replaced all T with TRUE and all F with FALSE

  • Adjustment tsHeating / tsHeatingComplete datasets (set 1440 as frequency parameter)

  • Adjustment tsNH4 / tsNH4Complete datasets (set 144 as frequency parameter)

  • Fixes for grammar, spelling and citations in the whole documentation

  • Revised examples in the documentation for all functions

  • Restricted output of na.remove to vector only (issue with incorrect time information otherwise)

  • Added better x-axes labels for plotNA.distribution

Changes in Version 1.6

  • Added github links to description file

  • Added citation file

  • Updated Readme (badges for travis ci and cran status)

  • Fix in documentation for na.interpolation (due to outdated descriptions)

  • Fix in documentation plotNA.distribution / plotNA.distributionBar (due to interchanged descriptions)

  • Added references to used packages in na.kalman and na.interpolation documentation

Changes in Version 1.5

  • Numeric vectors are now also accepted as input

  • Removed na.identifier parameter for all functions (too error prone, better handled individually by the user)

  • Minor changes in na.interpolation with option = "stine"

  • Added na.ma imputation function

  • Replaced "data" in all function parameters with the more common "x"

  • Improvement of all code examples

  • Renamed heating/heatingComplete dataset to tsHeating/tsHeatingComplete

  • Renamed nh4/nh4Complete dataset to tsNH4/tsNH4Complete

  • Added tsAirgap / tsAirgapComplete datasets

  • Improved imputeTS-package documentation

  • Added na.kalman imputation function

  • Added README.md file

  • Added statsNA function

  • Added plotNA.gapsize function

  • Renamed vis.imputations to plotNA.imputations

  • Renamed vis.barMissing to plotNA.distributionBar

  • Renamed vis.missing to plotNA.distribution

  • Fixed issues with parameter pass through and legend for all plotting functions

  • Improved dataset documentation

Changes in Version 0.4

  • Update of vis.differences (better looking plot now)

  • Added vis.missing to visualize the distribution of missing data in a time series

  • Added vis.barMissing, which is especially suited to visualizing missing data in very large time series

  • Update na.interpolate (added Stineman interpolation and enabled ... parameter for all interpolation algorithms to pass through parameters to the underlying functions)

Changes in Version 0.3

  • Added two datasets of sensor data

  • vis.differences for plotting differences between real and imputed values

Changes in Version 0.2

  • Removed internal functions from visible package documentation

  • Added additional algorithms: na.seasplit and na.seadec

  • internal function for algorithm selection

Changes in Version 0.1

  • Created initial version of imputeTS package for univariate time series imputation

  • added the simple imputation functions: na.locf, na.mean, na.random, na.interpolation, na.replace

  • added na.remove function for removing all NAs from a time series

2.7 by Steffen Moritz, 9 months ago


https://github.com/SteffenMoritz/imputeTS


Report a bug at https://github.com/SteffenMoritz/imputeTS/issues


Browse source code at https://github.com/cran/imputeTS


Authors: Steffen Moritz [aut, cre, cph]



Task views: Time Series Analysis, Missing Data


GPL-3 license


Imports stats, stinepack, graphics, grDevices, forecast, magrittr, Rcpp

Suggests testthat, utils, zoo, timeSeries, tis, xts

Linking to Rcpp


Imported by EventDetectR, gimme, hpiR, imputeTestbench.

Suggested by baytrends, epimdr, naniar.

