Data for Wordpiece-Style Tokenization

Provides data to be used by the wordpiece algorithm to tokenize text into meaningful sub-word chunks. The included vocabularies were retrieved from <> and <> and parsed into an R-friendly format.
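To illustrate how such a vocabulary is consumed, here is a minimal sketch of greedy longest-match-first wordpiece tokenization. This is not this package's API (the package only ships vocabulary data, and its interface is R); the tiny vocabulary and function name below are hypothetical, shown in Python purely for illustration.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Greedily match the longest vocabulary entry at each position.

    Continuation pieces (not at the start of the word) carry the
    "##" prefix, following the BERT wordpiece convention.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate substring until it appears in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word maps to the unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary, NOT the shipped BERT vocabularies.
vocab = {"token", "##ization", "un", "##related"}
print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
```

In practice the vocabularies provided by this package would play the role of `vocab`, with tens of thousands of entries.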



1.0.2 by Jon Harmon, 2 months ago


Authors: Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph], Google, Inc. [cph] (original BERT vocabularies)

Documentation: PDF Manual

License: Apache License (>= 2)

Suggests testthat

Imported by wordpiece.
