Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, which is an implementation of fast Byte Pair Encoding (BPE) <https://www.aclweb.org/anthology/P16-1162>.
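The sketch below shows the core BPE merge-learning loop described in the cited paper (Sennrich et al.), written in base R. It starts from characters plus an end-of-word marker `</w>`, repeatedly counts adjacent symbol pairs weighted by word frequency, and merges the most frequent pair. It is a minimal illustration only: `learn_bpe` and `merge_pair` are hypothetical helper names, not part of the tokenizers.bpe API, and this naive loop is not the optimized algorithm used by YouTokenToMe. Breaking ties by the first pair in sorted order is also an assumption made here for determinism.

```r
# Replace every adjacent occurrence of symbols (a, b) in s with the merged symbol "ab"
merge_pair <- function(s, a, b) {
  out <- character(0)
  i <- 1
  while (i <= length(s)) {
    if (i < length(s) && s[i] == a && s[i + 1] == b) {
      out <- c(out, paste0(a, b))
      i <- i + 2
    } else {
      out <- c(out, s[i])
      i <- i + 1
    }
  }
  out
}

# Learn up to num_merges BPE merge operations from a character vector of words
learn_bpe <- function(words, num_merges) {
  freq <- table(words)
  # Each distinct word becomes a vector of single characters plus an end-of-word marker
  vocab <- lapply(names(freq), function(w) c(strsplit(w, "")[[1]], "</w>"))
  counts <- as.integer(freq)
  merges <- character(0)
  for (step in seq_len(num_merges)) {
    pairs <- character(0)
    weights <- integer(0)
    for (j in seq_along(vocab)) {
      s <- vocab[[j]]
      if (length(s) < 2) next
      # Adjacent pairs, encoded as "left right" (symbols never contain spaces)
      p <- paste(head(s, -1), tail(s, -1))
      pairs <- c(pairs, p)
      weights <- c(weights, rep(counts[j], length(p)))
    }
    if (length(pairs) == 0) break
    # Total frequency per pair; which.max picks the first maximum in sorted order
    totals <- tapply(weights, pairs, sum)
    best <- names(totals)[which.max(totals)]
    merges <- c(merges, best)
    ab <- strsplit(best, " ", fixed = TRUE)[[1]]
    vocab <- lapply(vocab, merge_pair, a = ab[1], b = ab[2])
  }
  merges
}
```

On the toy corpus from the paper (`low` ×5, `lower` ×2, `newest` ×6, `widest` ×3), `learn_bpe(words, 4)` returns the merges `"e s"`, `"es t"`, `"est </w>"` and `"l o"`, so frequent suffixes such as `est</w>` become single subword units.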



install.packages("tokenizers.bpe")

0.1.0 by Jan Wijffels, 23 days ago


https://github.com/bnosac/tokenizers.bpe


Browse source code at https://github.com/cran/tokenizers.bpe


Authors: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), VK.com [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))




MPL-2.0 license


Imports Rcpp

Linking to Rcpp

