Data Transformation and Labelled Data Utility Functions

A collection of miscellaneous utility functions, intended especially for people coming from other statistical software packages like 'SPSS' and/or who are new to R. It supports the following common tasks when working with data: 1) Reading and writing data between R and other statistical software packages like 'SPSS', 'SAS' or 'Stata', and working with labelled data; this includes easy ways to get and set label attributes, to convert labelled vectors into factors (and vice versa), and to deal with multiple declared missing values. 2) Data transformation tasks like recoding, dichotomizing or grouping variables, and setting and replacing missing values. The data transformation functions also support labelled data.

This package contains utility functions that are useful when carrying out data analysis, performing common recode and data transformation tasks or working with labelled data (especially intended for people coming from 'SPSS', 'SAS' or 'Stata' and/or who are new to R).

Basically, this package covers three domains of functionality:

  • reading and writing data between other statistical packages (like 'SPSS') and R, based on the haven and foreign packages
  • functions that make working with the resulting labelled data easier
  • frequently applied recoding and variable transformation tasks, also with support for labelled data
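To make the notion of "labelled data" concrete, the following minimal sketch builds a labelled vector with the haven package (which the list above names as a basis for data import) and converts it into a factor. It uses haven's own labelled() and as_factor() rather than any sjmisc function, and the vector and its labels are made up for illustration.

```r
library(haven)  # assumes the haven package is installed

# Hypothetical labelled vector, similar to what an SPSS import produces
x <- labelled(c(1, 2, 2, 1), labels = c(yes = 1, no = 2))

# Value labels are stored as an attribute of the vector
value_labels <- attr(x, "labels")
print(value_labels)

# Convert the labelled vector into a factor, using the labels as levels
f <- as_factor(x)
print(levels(f))
```

The values stay numeric until they are explicitly converted, which is why helper functions for getting, setting and converting labels are useful when moving data between R and other packages.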

To install the latest development snapshot (see latest changes below), type the following commands into the R console:

To install the latest stable release from CRAN, type install.packages('sjmisc') into the R console.

If you want or have to cite this package, please use citation('sjmisc') for citation information.


sjmisc 2.2.0

  • zap_inf() to "clean" vectors from NaN and infinite values.
  • descr() to provide basic descriptive statistics (similar to describe() in the psych-package), but including variable labels and usable in pipe-workflows. Also works with grouped data frames.
  • rec(), split_var() and dicho() get an argument suffix, to append a suffix to variable (column) names, if applied on a data frame.
  • Value labels in rec() can now directly be assigned inside the recodes-syntax (see 'Details' in ?rec).
  • find_var() gets an as.df-argument, to return a data frame with matching variables, instead of their column indices only.
  • find_var() gets an as.varlab-argument, to return a "summary" data frame with column number, variable name and variable label.
  • flat_table() now also accepts grouped data frames.
  • flat_table() gets a show.values-argument, to add values to associated labels in output.
  • frq() now also accepts grouped data frames.
  • frq() gets an argument to weight frequencies.
  • set_na() can now also find values by their value labels and replace them with NA.
  • set_na() now removes unused value labels from values that have been replaced with NA.
  • The as.tag-argument in set_na() now defaults to FALSE.
  • get_labels() now always returns labels in sorted order of the associated values.
  • get_labels() gets a drop.unused-argument, to automatically drop labels from values that don't occur in the vector.
  • For a named vector as labels-argument, set_labels() now always sorts labels by their associated values.
  • is_empty() gets a first.only-argument, to evaluate either first or all elements of a character vector.
  • set_na() did not work on vectors of class Date when argument as.tag = TRUE.
  • flat_table() did not show values that had no value labels. Now all categories are shown in the frequency table.
  • rec() did not properly copy labels of tagged NA values when not all recoded values appeared in the vector.
  • frq() did not show correct values when the value labels of a vector were not sorted according to their values.
  • set_labels() did not set labels properly for ordered factors.
  • remove_labels() returned NA-values for value labels (instead of no value labels) when the last value label of a vector was removed.
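As a small illustration of one of the functions listed for sjmisc 2.2.0, the sketch below applies zap_inf() to a plain numeric vector. It assumes that sjmisc (2.2.0 or later) is installed; the example vector is hypothetical.

```r
library(sjmisc)  # assumes sjmisc (>= 2.2.0) is installed

# Hypothetical vector containing NaN and infinite values
x <- c(1, NaN, 5, Inf, -Inf, 2)

# zap_inf() "cleans" the vector: NaN and infinite values become NA
cleaned <- zap_inf(x)

print(cleaned)
print(is.na(cleaned))
```

The regular values are left untouched, so the result can be passed on to functions that handle NA but not NaN or Inf.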

sjmisc 2.1.0

  • find_var() to find variables in data frames by name or label.
  • var_labels() as "tidyversed" alternative to set_label() to set variable labels.
  • var_rename() to rename variables.
  • The following functions now get an ellipsis argument ..., to apply the function only to selected variables but return the complete data frame (thus overwriting existing variables in a data frame, if requested): to_factor(), to_value(), to_label(), to_character(), to_dummy(), zap_labels(), zap_unlabelled(), zap_na_tags().
  • Fixed bug with copying attributes from tibbles for merge_df().
  • Fixed wrong argument-description in docs of frq().

sjmisc 2.0.1

  • Removed package coin from Imports.
  • count_na() to print a frequency table of tagged NA values.
  • set_na() gets a drop.levels argument to keep or drop factor levels of values that have been replaced with NA.
  • set_na() gets an as.tag argument to set NA values as regular or tagged NA.

sjmisc 2.0.0

  • sjmisc now supports tagged NA values, a new structure for labelled missing values introduced by the haven-package. This means that functions or arguments that are no longer useful have been removed, while other functions dealing with NA values have been largely revised.
  • All statistical functions have been removed and are now in a separate package, sjstats.
  • Removed some S3-methods for labelled-class, as these are now provided by the haven-package.
  • Functions no longer check input for type matrix, to avoid conflicts with scaled vectors (that were recognized as matrix and hence treated as data frame).
  • table(*, exclude = NULL) was changed to table(*, useNA = "always"), because of planned changes in upcoming R version 3.4.
  • More functions (like trim() or frq()) now also have data frame- or list-methods.
  • zap_na_tags() to turn tagged NA values into regular NA values.
  • spread_coef() to spread coefficients of multiple fitted models in nested data frames into columns.
  • merge_imputations() to find the most likely imputed value for a missing value.
  • flat_table() to print flat (proportional) tables of labelled variables.
  • Added to_character() method.
  • big_mark() to format large numbers with big marks.
  • empty_cols() and empty_rows() to find variables or observations with exclusively NA values in a data frame.
  • remove_empty_cols() and remove_empty_rows() to remove variables or observations with exclusively NA values from a data frame.
  • str_contains() gets a switch argument to switch the role of x and pattern.
  • word_wrap() coerces vectors to character if necessary.
  • to_label() gets var.label and drop.levels arguments, and now preserves variable labels by default.
  • Argument def.value in get_label() now also applies to data frame arguments.
  • If factor levels are numeric and factor has value labels, these are used in to_value() by default.
  • to_factor() no longer generates NA or NaN-levels when converting input into factors.
  • rec() did not recode values, when these were the first element of a multi-line string of the recodes argument.
  • is_empty() returned NA instead of TRUE for empty character vectors.
  • Fixed bug with erroneous assignment of value labels to subset data when using copy_labels() (#20)
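The table() change mentioned in the sjmisc 2.0.0 list above concerns how missing values are counted. The following base R sketch, using a made-up vector, shows the difference between the default call and useNA = "always"; it does not reproduce any sjmisc internals.

```r
# Hypothetical example vector with one missing value
x <- c(1, 2, 2, NA)

# By default, table() drops NA values from the result
default_tab <- table(x)

# useNA = "always" adds an NA category, even if its count is zero
na_tab <- table(x, useNA = "always")

print(default_tab)
print(na_tab)
```

With useNA = "always", the result always contains a category for missing values, so frequency tables have a consistent shape whether or not a variable actually contains NA.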


2.3.0 by Daniel Lüdecke, 18 days ago

Authors: Daniel Lüdecke

GPL-3 license

Imports broom, dplyr, haven, psych, purrr, stringdist, stringr, tibble, tidyr

Depends on stats, utils

Suggests Hmisc, mice, sjPlot, sjstats, knitr, rmarkdown

Imported by miceadds, sjPlot, sjstats, tadaatoolbox.
