High-Performance Stemmer, Tokenizer, and Spell Checker

Low-level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, LaTeX, HTML or XML documents. For a more user-friendly interface, use the 'spelling' package, which builds on this package to automate checking of files, documentation and vignettes in all common formats.


High-Performance Stemmer, Tokenizer, and Spell Checker for R

Low-level spell checker and morphological analyzer based on the famous hunspell library https://hunspell.github.io. The package can analyze or check individual words as well as tokenize text, LaTeX, HTML or XML documents. For a more user-friendly interface, use the 'spelling' package, which builds on this package with utilities to automate checking of files, documentation and vignettes in all common formats.
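As a rough illustration of document parsing, the sketch below passes a small LaTeX fragment to hunspell() with format = "latex", so that markup commands are skipped while tokenizing. The sample string and its misspelling are made up for this example.

```r
library(hunspell)

# A made-up LaTeX fragment; 'documant' is deliberately misspelled
tex <- "\\section{Introduction} This documant uses \\textbf{bold} text."

# format = "latex" asks the parser to skip LaTeX commands
bad <- hunspell(tex, format = "latex")
print(bad[[1]])

# hunspell_parse() returns the tokens without spell checking them
tokens <- hunspell_parse(tex, format = "latex")
print(tokens[[1]])
```

The same format argument also accepts "text", "html" and "xml"; plain text is the default.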

Installation

This package includes a bundled version of libhunspell and no longer depends on external system libraries:

install.packages("hunspell")

Documentation

About the R package:

Hello World

library(hunspell)

# Check individual words
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)
 
# Find suggestions for incorrect words
hunspell_suggest(words[!correct])
 
# Extract incorrect words from a piece of text
bad <- hunspell("spell checkers are not neccessairy for langauge ninja's")
print(bad[[1]])
hunspell_suggest(bad[[1]])
 
# Stemming
words <- c("love", "loving", "lovingly", "loved", "lover", "lovely", "love")
hunspell_stem(words)
hunspell_analyze(words)

The spelling package uses this package to spell check R package documentation:

# Spell check a package
library(spelling)
spell_check_package("~/mypackage")

News

2.9

  • BREAKING: Bundled en_US / en_GB dictionaries now support apostrophe
  • Parsers now use the dictionary WORDCHARS for tokenizing text
  • Removed numbers from en_US / en_GB WORDCHARS to prevent false positives
  • Fix regression introduced in the caching fix in 2.8
  • Refactored internal dictionary caching
  • Add 'AUTHORS' file and update DESCRIPTION as requested by CRAN

2.8

  • Fix caching bug for 'ignore' argument in hunspell()
  • Rename class 'dictionary' to 'hunspell_dictionary' to avoid collisions
  • Remove setwd() from examples as requested by CRAN

2.7

  • Update bundled en_US / en_GB dictionaries from libreoffice extensions
  • Use Rcpp symbol registration / visibility
  • Properly pass down missing values
  • Added workaround for issue #29 (case sensitivity in custom wordlist)

2.6

  • Compile libhunspell with '__attribute__((visibility("hidden")))' to solve a symbol conflict in RStudio (which also has libhunspell)

2.5

  • Add parameter 'add_words' to dictionary()
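
A minimal sketch of how add_words might be used, assuming the bundled en_US dictionary. The word "hunspellr" is invented purely for illustration and is not part of any dictionary.

```r
library(hunspell)

# Custom dictionary: en_US plus one extra accepted word.
# "hunspellr" is an invented word used only for this example.
custom <- dictionary("en_US", add_words = "hunspellr")

# Check against the default dictionary, then against the extended one
default_ok <- hunspell_check("hunspellr")
custom_ok <- hunspell_check("hunspellr", dict = custom)
print(c(default = default_ok, custom = custom_ok))
```

Adding words to a dictionary object keeps the extra vocabulary in one place, so the same object can be passed to hunspell(), hunspell_check() and hunspell_suggest().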

2.4

  • Update libhunspell to upstream v1.6.1
  • Update maintainer info
  • Add mandatory symbol registration

2.3

  • Update libhunspell to upstream v1.5.4
  • Change intro.rmd vignette to clean up downloaded files

2.2

  • Tweak code to make it build on old compilers (CentOS6 / gcc 4.4.7)

2.1

  • Update upstream to a6d32ee
  • Rebuild vignettes to fix CMD check timestamp warning

2.0

  • Added a beautiful intro vignette
  • Dictionaries are now their own class and get cached automatically via memoise
  • Make sure UTF-8 return values are marked properly. Fixes #16
  • Update libhunspell to upstream 4b43843

1.4.3

  • Fix UBSAN bug
  • Remove unused 'config.h' file (see upstream 2ccf840)

1.4.2

  • Switch to R's iconv wrapper which is more portable (thnx BDR)

1.4.1

  • Change license to cover libhunspell (per CRAN request).

1.4

  • Switch to bundled libhunspell because their API keeps breaking
  • Include libhunspell 1.5-pre (b13e62a)
  • Add parsers for HTML/XML formats

1.2

  • (Breaking) Rename 'hunspell_find' to 'hunspell'
  • Add support for other dictionaries
  • Use iconv() to convert encoding before checking
  • Use the 'en_stats' dict as default ignore list
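
The dictionary and ignore-list options can be sketched as follows, assuming the bundled en_GB dictionary is available. The sample sentence is made up, and "wiskey" stands in for a project-specific term that should not be reported.

```r
library(hunspell)

# Check a British spelling against the bundled en_GB dictionary
gb_ok <- hunspell_check("colour", dict = dictionary("en_GB"))
print(gb_ok)

# Extend the default ignore list (en_stats) with one extra term
res <- hunspell("the model was fitted after a glass of wiskey",
                ignore = c(en_stats, "wiskey"))
print(res[[1]])
```

Passing c(en_stats, ...) rather than a bare vector keeps the default statistical terms on the ignore list while adding new ones.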

1.1

  • Switch to hunspell parsers (replaced 'delim' with 'format' parameter)

1.0

  • Initial CRAN release


2.9 by Jeroen Ooms, 5 months ago


https://github.com/ropensci/hunspell#readme (devel) https://hunspell.github.io (upstream)


Report a bug at https://github.com/ropensci/hunspell/issues


Browse source code at https://github.com/cran/hunspell


Authors: Jeroen Ooms [aut, cre], Authors of libhunspell [cph] (see AUTHORS file)




Task views: Natural Language Processing


GPL-2 | LGPL-2.1 | MPL-1.1 license


Imports Rcpp, digest

Suggests spelling, testthat, pdftools, janeaustenr, wordcloud2, knitr, rmarkdown

Linking to Rcpp


Imported by BrailleR, TeXCheckR, msgtools, ptstem, spelling, textstem, tidytext.

Suggested by SpaDES.core, SpaDES.tools, devtools, fakemake, fivethirtyeight, hrbrthemes, quickPlot, reproducible.

