Encode Categorical Features

Functions for dummy encoding, frequency encoding, label encoding, leave-one-out encoding, mean encoding, median encoding, and one-hot encoding.



Summary

cattonum (cat to num) provides different ways to encode categorical features as numerics. It includes the following:

  • dummy encoding: catto_dummy
  • feature hashing (future)
  • frequency encoding: catto_freq
  • label encoding: catto_label
  • leave-one-out encoding: catto_loo
  • mean encoding: catto_mean
  • median encoding: catto_median
  • one-hot encoding: catto_onehot
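
To make the encodings listed above concrete, here is a rough base R sketch of what several of them compute on the iris data. It does not call cattonum and is not the package's implementation. The variable names are illustrative, and it assumes that frequency encoding replaces each level with its count, that mean and median encoding replace each level with the per-level mean or median of a response, and that dummy encoding differs from one-hot encoding by dropping one level.

```r
data(iris)

# Frequency encoding (assumed definition): each level becomes its count.
freq_enc <- ave(seq_along(iris$Species), iris$Species, FUN = length)

# Mean and median encoding (assumed definitions): each level becomes the
# mean or median of the response (here Sepal.Length) within that level.
mean_enc <- ave(iris$Sepal.Length, iris$Species, FUN = mean)
median_enc <- ave(iris$Sepal.Length, iris$Species, FUN = median)

# One-hot encoding: one indicator column per level.
onehot_enc <- model.matrix(~ Species - 1, data = iris)

# Dummy encoding: indicator columns with one reference level dropped.
dummy_enc <- model.matrix(~ Species, data = iris)[, -1]

ncol(onehot_enc)  # 3 columns, one per species
ncol(dummy_enc)   # 2 columns, first species is the reference level
```

The cattonum functions wrap this kind of computation behind a single interface, so the sketch is only meant to show what the resulting numeric columns represent.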

There are many existing packages with which to encode categorical features.

The goal of cattonum is to be a one-stop shop for all categorical encoding needs. Nothing more, nothing less.

Installation

The development version of cattonum can be installed from GitHub.

remotes::install_github("bfgray3/cattonum")

The latest official release of cattonum can be installed from CRAN.

install.packages("cattonum")

Usage

library(cattonum)
data(iris)
head(catto_loo(iris, response = Sepal.Length))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
#> 1          5.1         3.5          1.4         0.2 5.004082
#> 2          4.9         3.0          1.4         0.2 5.008163
#> 3          4.7         3.2          1.3         0.2 5.012245
#> 4          4.6         3.1          1.5         0.2 5.014286
#> 5          5.0         3.6          1.4         0.2 5.006122
#> 6          5.4         3.9          1.7         0.4 4.997959
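
The values in the Species column above can be reproduced by hand, which shows what leave-one-out encoding does: each row gets the mean of the response over all other rows that share its level. The following is a minimal base R sketch under that assumption, not cattonum's own code, and the variable names are illustrative.

```r
data(iris)

# Sum and count of the response within each species, repeated per row.
grp_sum <- ave(iris$Sepal.Length, iris$Species, FUN = sum)
grp_n <- ave(iris$Sepal.Length, iris$Species, FUN = length)

# Leave-one-out mean: remove the row's own value before averaging.
loo <- (grp_sum - iris$Sepal.Length) / (grp_n - 1)

head(round(loo, 6))
#> [1] 5.004082 5.008163 5.012245 5.014286 5.006122 4.997959
```

For the first row, the 50 setosa values of Sepal.Length sum to 250.3, so the encoded value is (250.3 - 5.1) / 49 = 5.004082, matching the output of catto_loo.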

News

cattonum 0.0.2

  • catto_label can now encode different columns with different orderings and encode columns with user-specified orderings.
  • catto_median has been added, thanks to Mark Roepke in #10.
  • catto_dummy and catto_onehot now both return a tibble when one is passed.

cattonum 0.0.1

  • This is the first release of cattonum. It currently includes the following encodings:

    • dummy encoding: catto_dummy
    • frequency encoding: catto_freq
    • label encoding: catto_label
    • leave-one-out encoding: catto_loo
    • mean encoding: catto_mean
    • one-hot encoding: catto_onehot


0.0.2 by Bernie Gray, 10 months ago


https://github.com/bfgray3/cattonum


Report a bug at https://github.com/bfgray3/cattonum/issues


Browse source code at https://github.com/cran/cattonum


Authors: Bernie Gray [aut, cre], Mark Roepke [ctb]




License: MIT + file LICENSE


Imports: dplyr, stats, tidyselect

Suggests: covr, knitr, nycflights13, ranger, rmarkdown, testthat, tibble

