Functions and S3 classes for the following methods of encoding categorical features as numerics: aggregate, dummy, frequency, label, leave-one-out, mean, median, and one-hot.
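To make two of these encodings concrete, here is a minimal base-R sketch of frequency and one-hot encoding. It is an illustration of the techniques, not cattonum's implementation. The helpers `freq_encode()` and `onehot_encode()` are hypothetical, and cattonum's output column names, ordering, and handling of unseen levels may differ.

```r
# Hypothetical helpers (not part of cattonum), written in base R.

# Frequency encoding: replace each category with the number of times it occurs.
freq_encode <- function(x) {
  counts <- table(x)                    # category -> count
  as.integer(counts[as.character(x)])   # look up each row's count by name
}

# One-hot encoding: one 0/1 indicator column per category (factor level).
onehot_encode <- function(x) {
  f <- factor(x)
  m <- outer(as.character(f), levels(f), `==`) * 1L  # logical -> integer matrix
  colnames(m) <- levels(f)
  m
}

colors <- c("red", "blue", "red", "green", "red")
freq_encode(colors)    # 3 1 3 1 3
onehot_encode(colors)  # 5 x 3 matrix with columns blue, green, red
```

Dummy encoding is closely related: by the usual convention it keeps only k - 1 of the k indicator columns, treating the dropped level as the reference.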
cattonum (cat to num) provides different ways to encode categorical features as numerics. It includes the following encodings:

- aggregate
- dummy
- frequency
- label
- leave-one-out
- mean
- median
- one-hot
There are many existing packages with which to encode categorical features, including (among others):
The goal of cattonum is to be a one-stop shop for all categorical encoding needs. Nothing more, nothing less.
The development version of cattonum can be installed from GitHub.

The latest official release of cattonum can be installed from CRAN.
```r
library(cattonum)
data(iris)
head(catto_loo(iris, response = Sepal.Length))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
#> 1          5.1         3.5          1.4         0.2 5.004082
#> 2          4.9         3.0          1.4         0.2 5.008163
#> 3          4.7         3.2          1.3         0.2 5.012245
#> 4          4.6         3.1          1.5         0.2 5.014286
#> 5          5.0         3.6          1.4         0.2 5.006122
#> 6          5.4         3.9          1.7         0.4 4.997959
```
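The `Species` values in the `catto_loo()` example above can be reproduced by hand, which shows what leave-one-out encoding computes: each row's category is replaced by the mean response over all *other* rows in that category, so a row's own response never leaks into its encoding. This is a base-R sketch with a hypothetical helper `loo_encode()`, not cattonum's code; how cattonum treats single-row categories is not covered here.

```r
# Hypothetical helper (not part of cattonum): leave-one-out mean encoding.
loo_encode <- function(x, y) {
  group_sum <- ave(y, x, FUN = sum)     # per-row sum of y within its category
  group_n   <- ave(y, x, FUN = length)  # per-row size of its category
  (group_sum - y) / (group_n - 1)       # exclude the row itself; NaN for singletons
}

data(iris)
enc <- loo_encode(iris$Species, iris$Sepal.Length)
round(head(enc), 6)
```

For the first row, the setosa sepal lengths sum to 250.3 over 50 flowers, so the encoding is (250.3 - 5.1) / 49 = 5.004082, matching the first row of the output above.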
catto_label can now encode different columns with different orderings, and can encode columns with user-specified orderings.
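As a rough illustration of label encoding with per-column and user-specified orderings, the base-R sketch below maps each category to its position in an ordering. The helper `label_encode()`, its sorted-order default, and the example data are hypothetical; cattonum's own default ordering and argument names may differ.

```r
# Hypothetical helper (not part of cattonum): label encoding in base R.
# Each category becomes its integer position in `ordering`; categories
# missing from `ordering` become NA.
label_encode <- function(x, ordering = sort(unique(x))) {
  match(x, ordering)
}

sizes <- c("small", "large", "medium", "small")
label_encode(sizes)                                            # 3 1 2 3
label_encode(sizes, ordering = c("small", "medium", "large"))  # 1 3 2 1

# Different user-specified orderings for different columns (hypothetical data).
df <- data.frame(size = sizes, grade = c("B", "A", "C", "A"))
orderings <- list(size  = c("small", "medium", "large"),
                  grade = c("C", "B", "A"))
df[names(orderings)] <- Map(label_encode, df[names(orderings)], orderings)
df
```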
catto_median has been added, thanks to Mark Roepke in #10.
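Median encoding replaces each category with the median response in that category. Compared with mean encoding, it is less sensitive to outliers. A minimal base-R sketch, assuming a hypothetical helper `median_encode()` rather than cattonum's actual implementation:

```r
# Hypothetical helper (not part of cattonum): median encoding in base R.
median_encode <- function(x, y) {
  ave(y, x, FUN = median)  # each row gets the median of y within its category
}

x <- c("a", "a", "a", "b", "b")
y <- c(1, 2, 10, 4, 6)
median_encode(x, y)    # 2 2 2 5 5
ave(y, x, FUN = mean)  # mean encoding for comparison: a -> 13/3, b -> 5
```

The outlier `10` pulls the mean for category `a` up to about 4.33, while the median stays at 2.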
catto_onehot now both return a tibble when a tibble is passed.
This is the first release of cattonum. It currently includes the following encodings: