Customizable Lists of Stopwords in 53 Languages

Functions to generate stopword lists in 53 languages, in a way that is consistent across all supported languages. The generated lists are based on the morphological tagset of the Universal Dependencies.


Authors: Silvie Cinková*, Maciej Eder
License: GPL-3

An R package containing customizable lists of stopwords in multiple languages; it attempts to follow tidy data principles.

The idea behind this package is to give the user control over the stopword selection. The core generate_stoplist() function relies on multilingual_stopwords(), a large data frame derived from the current release of the Universal Dependencies Treebanks. The package includes every language whose corpora total more than 10,000 tokens, which is large enough to cover all common closed-class words, such as prepositions, conjunctions, and auxiliary verbs. The data are encoded in UTF-8.
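The selection principle described above can be illustrated with a small, self-contained sketch. It does not use the package's own data or functions: the toy table, its column names (language, form, upos), and the helper make_stoplist() are all hypothetical and serve only to show how filtering by Universal Dependencies part-of-speech tags can yield a stopword list. The tag names themselves (ADP, AUX, CCONJ, and so on) are the standard UD universal POS tags.

```r
library(dplyr)

# Toy token table. The column names are hypothetical and are NOT
# the package's actual data layout.
tokens <- data.frame(
  language = c("English", "English", "English", "English",
               "Czech", "Czech", "Czech"),
  form     = c("of", "and", "house", "is", "na", "a", "hrad"),
  upos     = c("ADP", "CCONJ", "NOUN", "AUX", "ADP", "CCONJ", "NOUN"),
  stringsAsFactors = FALSE
)

# Closed-class UD universal POS tags used as the default filter.
closed_class <- c("ADP", "AUX", "CCONJ", "SCONJ", "DET", "PRON", "PART")

# Hypothetical helper: keep the word forms of one language whose
# POS tag is in the requested set, without duplicates.
make_stoplist <- function(data, lang, pos = closed_class) {
  data %>%
    filter(language == lang, upos %in% pos) %>%
    distinct(form) %>%
    pull(form)
}

# Default selection: all closed-class words for English.
english_stops <- make_stoplist(tokens, "English")

# Customized selection: prepositions (adpositions) only.
narrow_stops <- make_stoplist(tokens, "English", pos = "ADP")

print(english_stops)
print(narrow_stops)
```

Because the selection is driven by the same tagset in every language, the same call with lang = "Czech" would apply identical criteria to the Czech rows, which is the kind of cross-language consistency the package aims for.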

Installation

Install the package directly from the GitHub repository:

# install.packages("devtools")  # run first if devtools is not yet installed
library(devtools)
install_github("computationalstylistics/stopwoRds", build_vignettes = TRUE)


To install the released version from CRAN:

install.packages("tidystopwords")

0.9.0 by Maciej Eder, 5 months ago


Browse source code at https://github.com/cran/tidystopwords






GPL (>= 3) license


Imports dplyr, stringr

Suggests knitr, rmarkdown

