Collection of miscellaneous utility functions (especially intended for people coming from other statistical software packages like 'SPSS', and/or who are new to R), supporting the following common tasks when working with data: 1) Reading and writing data between R and other statistical software packages like 'SPSS', 'SAS' or 'Stata', and working with labelled data; this includes easy ways to get and set label attributes, to convert labelled vectors into factors (and vice versa), or to deal with multiple declared missing values. 2) Data transformation tasks like recoding, dichotomizing or grouping variables, and setting and replacing missing values. The data transformation functions also support labelled data, and all integrate seamlessly into a 'tidyverse' workflow.
This package contains utility functions that are useful when carrying out data analysis, performing common recode and data transformation tasks or working with labelled data (especially intended for people coming from 'SPSS', 'SAS' or 'Stata' and/or who are new to R).
Basically, this package covers three domains of functionality:
To install the latest development snapshot (see latest changes below), type the following commands into the R console:
To install the latest stable release from CRAN, type the following command into the R console:
In case you want or have to cite my package, please use citation('sjmisc') for citation information.
zap_inf() to "clean" vectors from NaN and infinite values.
descr() to provide basic descriptive statistics (similar to describe() in the psych-package), but including variable labels and usable in pipe-workflows. Also works with grouped data frames.
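As a rough illustration of what "cleaning" a vector from NaN and infinite values can mean, here is a minimal base R sketch. The helper name clean_non_finite() is hypothetical and this is not the package's implementation; it simply assumes that "cleaning" means replacing such values with NA.

```r
# Hypothetical helper (not part of sjmisc): replace NaN and infinite
# values in a numeric vector with NA, using base R only.
clean_non_finite <- function(x) {
  x[is.nan(x) | is.infinite(x)] <- NA
  x
}

x <- c(1, 2, NaN, Inf, -Inf, 5)
x_clean <- clean_non_finite(x)

# Finite values are untouched; NaN, Inf and -Inf are now regular NA.
print(x_clean)
```

Replacing rather than dropping the values keeps the vector length unchanged, so the result can still be used as a data frame column.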
dicho() get an argument suffix, to append a suffix to variable (column) names, if applied on a data frame.
rec() can now directly be assigned inside the recodes-syntax (see 'Details' in
as.df-argument, to return a data frame with matching variables, instead of their column indices only.
as.varlab-argument, to return a "summary" data frame with column number, variable name and variable label.
flat_table() now also accepts grouped data frames.
show.values-argument, to add values to associated labels in output.
frq() now also accepts grouped data frames.
weight.by-argument to weight frequencies.
set_na() can now also find values by their value labels and replace them with NA.
set_na() now removes unused value labels from values that have been replaced with NA.
set_na() now defaults to
get_labels() now always returns labels in sorted order of the associated values.
drop.unused-argument, to automatically drop labels from values that don't occur in the vector.
set_labels() now always sorts labels in the order of their associated values.
first.only-argument, to evaluate either first or all elements of a character vector.
set_na() did not work on vectors of class
as.tag = TRUE.
flat_table() did not show values that had no value labels. Now all categories are shown in the frequency table.
rec() did not properly copy labels of tagged NA values when not all recoded values appeared in the vector.
frq() did not show correct values when the value labels of a vector were not sorted according to their values.
set_labels() did not set labels properly for ordered factors.
remove_labels() returned NA-values for value labels (instead of no value labels) when the last value label of a vector was removed.
find_var() to find variables in data frames by name or label.
var_labels() as "tidyversed" alternative to set_label() to set variable labels.
var_rename() to rename variables.
..., to apply function only to selected variables, but return the complete data frame (thus, overwriting existing variables in a data frame, if requested):
count_na() to print a frequency table of tagged NA values.
drop.levels argument to keep or drop factor levels of values that have been replaced with NA.
as.tag argument to set NA values as regular or tagged NA.
NA values, a new structure for labelled missing values introduced by the haven-package. This means that functions or arguments that are no longer useful have been removed, while other functions dealing with NA values have been largely revised.
labelled-class, as these are now provided by the haven-package.
matrix, to avoid conflicts with scaled vectors (that were recognized as matrix and hence treated as data frame).
table(*, exclude = NULL) was changed to table(*, useNA = "always"), because of planned changes in upcoming R version 3.4.
frq()) now also have data frame- or list-methods.
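For the change from table(*, exclude = NULL) to table(*, useNA = "always") noted above, a small base R illustration may help. The vectors below are made-up example data, not taken from the package. The sketch only shows how table() reports missing values with useNA = "always".

```r
# Made-up example data with one missing value
x <- c(1, 2, 2, NA, 3)

# useNA = "always" adds an NA category to the table
tab_with_na <- table(x, useNA = "always")
print(tab_with_na)

# The NA category is also reported when the vector has no missing values
y <- c(1, 2, 2)
tab_without_na <- table(y, useNA = "always")
print(tab_without_na)
```

Asking for the NA category explicitly keeps the shape of the frequency table predictable, whether or not a vector actually contains missing values.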
zap_na_tags() to turn tagged NA values into regular NA values.
spread_coef() to spread coefficients of multiple fitted models in nested data frames into columns.
merge_imputations() to find the most likely imputed value for a missing value.
flat_table() to print flat (proportional) tables of labelled variables.
big_mark() to format large numbers with big marks.
empty_rows() to find variables or observations with exclusively NA values in a data frame.
remove_empty_rows() to remove variables or observations with exclusively NA values from a data frame.
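To make "variables or observations with exclusively NA values" concrete, the following base R sketch finds and drops such rows and columns from a small made-up data frame. It illustrates the idea only and is not the package's implementation. All object names are assumptions chosen for the example.

```r
# Made-up example data: row 2 and column "b" contain only NA values
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(df)

# Indices of rows (observations) and columns (variables) that are entirely NA
empty_row_idx <- which(rowSums(na_matrix) == ncol(df))
empty_col_idx <- which(colSums(na_matrix) == nrow(df))

# Keep rows and columns that have at least one non-missing value.
# Logical indexing is used so that "nothing to drop" is handled correctly.
keep_rows <- rowSums(na_matrix) < ncol(df)
keep_cols <- colSums(na_matrix) < nrow(df)
df_clean <- df[keep_rows, keep_cols, drop = FALSE]

print(df_clean)
```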
switch argument to switch the role of
word_wrap() coerces vectors to character if necessary.
drop.levels argument, and now preserves variable labels by default.
get_label() now also applies to data frame arguments.
to_factor() no longer generates NaN-levels when converting input into factors.
rec() did not recode values, when these were the first element of a multi-line string of the
TRUE for empty character vectors.