R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' tokenization to input text, given an appropriate vocabulary. The 'BERT' tokenization conventions are used by default.
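Based on the description above, usage might look like the following sketch. The function names `wordpiece_tokenize()` and the default vocabulary behavior are assumptions inferred from this summary, not confirmed by it; consult the reference manual for the exact API.

```r
# Minimal usage sketch (hypothetical names; verify against the manual).
library(wordpiece)

# Tokenize a string with the default BERT-style vocabulary, which the
# 'wordpiece.data' dependency is presumably expected to supply.
tokens <- wordpiece_tokenize("I like tacos!")

# Under BERT conventions, out-of-vocabulary words are split into subword
# pieces, with continuation pieces prefixed by "##".
print(tokens)
```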


Reference manual



Version 2.0.1, by Jonathan Bratt


Report a bug at https://github.com/macmillancontentscience/wordpiece/issues

Browse source code at https://github.com/cran/wordpiece

Authors: Jonathan Bratt [aut, cre] , Jon Harmon [aut] , Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]

Documentation: PDF Manual

License: Apache License (>= 2)

Imports dlr, piecemaker, purrr, rlang, stringi, wordpiece.data

Suggests covr, knitr, rmarkdown, testthat
