High-Performance Stemmer, Tokenizer, and Spell Checker

Low-level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, LaTeX, HTML or XML documents. For a more user-friendly interface, use the 'spelling' package, which builds on this package to automate checking of files, documentation and vignettes in all common formats.


Hunspell is designed for languages with rich morphology and complex word compounding or character encoding. The package can check and analyze individual words as well as search for incorrect words within a text, LaTeX, HTML or XML document. Use the 'devtools' package to spell check R documentation with 'hunspell'.
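
The document formats mentioned above are selected through the `format` argument. The following is a minimal sketch, assuming the package is installed and the default English dictionary is used; the HTML string is made up for illustration, and the exact words flagged depend on the dictionary.

```r
library(hunspell)

# Made-up HTML snippet containing one deliberate misspelling
html <- "<p>This sentence has a <b>mispeled</b> word.</p>"

# format = "html" asks the parser to skip markup and check only the text
bad_html <- hunspell(html, format = "html")
print(bad_html[[1]])

# hunspell_parse() only extracts the words, without spell checking them
words_html <- hunspell_parse(html, format = "html")
print(words_html[[1]])
```

The other formats work the same way: pass `format = "latex"` or `format = "xml"` instead.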

This package includes a bundled version of libhunspell and no longer depends on external system libraries:

install.packages("hunspell")

About the R package:

library(hunspell)

# Check individual words
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)

# Find suggestions for incorrect words
hunspell_suggest(words[!correct])

# Extract incorrect words from a piece of text
bad <- hunspell("spell checkers are not neccessairy for langauge ninja's")
print(bad[[1]])
hunspell_suggest(bad[[1]])

# Stemming and morphological analysis
words <- c("love", "loving", "lovingly", "loved", "lover", "lovely", "love")
hunspell_stem(words)
hunspell_analyze(words)
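
The News section notes that dictionaries are their own class and that other dictionaries are supported. The following is a hedged sketch of how that might look, assuming a package version in which `dictionary()` accepts `add_words`; the added word is only an illustration.

```r
library(hunspell)

# List the dictionaries hunspell can find on this system
print(list_dictionaries())

# Load the default US English dictionary explicitly
dict <- dictionary("en_US")
print(hunspell_check(c("whiskey", "wiskey"), dict = dict))

# Load the same dictionary with a custom word added
# (assumes `add_words` is available in the installed version)
dict_custom <- dictionary("en_US", add_words = "wiskey")
print(hunspell_check("wiskey", dict = dict_custom))
```

The same `dict` argument is accepted by `hunspell()`, `hunspell_suggest()`, `hunspell_stem()` and `hunspell_analyze()`, so a custom dictionary can be used throughout.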

The devtools package uses this package to spell check R package documentation:

# Spell check a package
library(devtools)
spell_check("~/mypackage")

News

2.2

  • Tweak code to make it build on old compilers (CentOS6 / gcc 4.4.7)

2.1

  • Update upstream to a6d32ee
  • Rebuild vignettes to fix CMD check timestamp warning

2.0

  • Added a beautiful intro vignette
  • Dictionaries are now their own class and get cached automatically via memoise
  • Make sure UTF-8 return values are marked properly. Fixes #16
  • Update libhunspell to upstream 4b43843

1.4.3

  • Fix UBSAN bug
  • Remove unused 'config.h' file (see upstream 2ccf840)

1.4.2

  • Switch to R's iconv wrapper which is more portable (thnx BDR)

1.4.1

  • Change license to cover libhunspell (per CRAN request).

1.4

  • Switch to bundled libhunspell because their API keeps breaking
  • Include libhunspell 1.5-pre (b13e62a)
  • Add parsers for HTML/XML formats

1.2

  • (Breaking) Rename 'hunspell_find' to 'hunspell'
  • Add support for other dictionaries
  • Use iconv() to convert encoding before checking
  • Use the 'en_stats' dict as default ignore list

1.1

  • Switch to hunspell parsers (replaced 'delim' with 'format' parameter)

1.0

  • Initial CRAN release


2.9 by Jeroen Ooms, a month ago


https://github.com/ropensci/hunspell#readme (devel) https://hunspell.github.io (upstream)


Report a bug at https://github.com/ropensci/hunspell/issues


Browse source code at https://github.com/cran/hunspell


Authors: Jeroen Ooms [aut, cre], Authors of libhunspell [cph] (see AUTHORS file)




Task views: Natural Language Processing


GPL-2 | LGPL-2.1 | MPL-1.1 license


Imports Rcpp, digest

Suggests spelling, testthat, pdftools, janeaustenr, wordcloud2, knitr, rmarkdown

Linking to Rcpp


Imported by BrailleR, TeXCheckR, hrbrthemes, msgtools, ptstem, spelling, textstem, tidytext.

Suggested by SpaDES.core, SpaDES.tools, devtools, fakemake, fivethirtyeight, quickPlot, reproducible.

