Korean NLP Package

POS tagger and morphological analyzer for Korean text-based research. It provides tools for corpus linguistics research such as a keystroke converter, Hangul automata, concordance, and mutual information. It also provides a convenient interface for users to selectively apply, edit, and add morphological dictionaries.


This package lets you do text mining with a Korean morphological analyzer in R.

  • Interfaces with the open-source Hannanum analyzer.
  • Some tweaks are applied to the Hannanum analyzer to allow a bigger or more flexible user dictionary for the Sejong project.
  • Many other functions for Korean text analysis, such as keystroke conversion, is.jamo, is.hangul, and Hangul automata.
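To give a feel for what Hangul checks and Hangul automata involve, here is a minimal base R sketch built on the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3). It is not KoNLP's implementation; the function names `is_hangul_syllables`, `compose_syllable`, and `decompose_syllable` are hypothetical and exist only for this illustration.

```r
# Illustrative sketch only: these helpers are NOT part of KoNLP.
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
# 19 initial consonants x 21 vowels x 28 finals (27 consonants + "none").
SYLLABLE_BASE <- 0xAC00
SYLLABLE_LAST <- 0xD7A3
N_VOWELS <- 21
N_FINALS <- 28

# TRUE for each string that consists only of precomposed Hangul syllables.
is_hangul_syllables <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))
    length(cp) > 0 && all(cp >= SYLLABLE_BASE & cp <= SYLLABLE_LAST)
  }, logical(1), USE.NAMES = FALSE)
}

# Compose one syllable from zero-based jamo indices (final = 0 means no final).
compose_syllable <- function(initial, vowel, final = 0) {
  intToUtf8(SYLLABLE_BASE + (initial * N_VOWELS + vowel) * N_FINALS + final)
}

# Split one syllable back into its zero-based jamo indices.
decompose_syllable <- function(syllable) {
  offset <- utf8ToInt(syllable) - SYLLABLE_BASE
  c(initial = offset %/% (N_VOWELS * N_FINALS),
    vowel   = (offset %/% N_FINALS) %% N_VOWELS,
    final   = offset %% N_FINALS)
}

# Initial index 18, vowel index 0, final index 4 give code point U+D55C.
utf8ToInt(compose_syllable(18, 0, 4)) == 0xD55C
```

A real automaton also has to decide where one syllable ends and the next begins in a stream of jamo or keystrokes; the arithmetic above covers only the final composition step.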

Some Korean tutorials are on my blog; English pages are mainly on the wiki.

To install from CRAN, use

install.packages('KoNLP')

To install from GitHub, use

install.packages('devtools')
library(devtools)
install_github('haven-jeon/KoNLP')
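Once the package is installed, a first session might look like the sketch below. It assumes a working Java installation and uses only functions named in this document (`useSejongDic()`, `extractNoun()`, `SimplePos09()`); the sample sentences are made up for illustration, and the `requireNamespace()` guard is there so the snippet does nothing harmful when KoNLP is missing.

```r
# Sample input: a character vector with one sentence per element
# (sentences are illustrative, not taken from the package documentation).
sentences <- c("아버지가 방에 들어가신다.", "한국어 텍스트를 분석한다.")

if (requireNamespace("KoNLP", quietly = TRUE)) {
  library(KoNLP)
  useSejongDic()                    # switch to the Sejong dictionary
  nouns <- extractNoun(sentences)   # nouns found in each sentence
  tags  <- SimplePos09(sentences)   # coarse-grained POS tags
  print(nouns)
  print(tags)
} else {
  message("KoNLP is not installed; skipping the example.")
}
```

The exact output depends on the dictionary in use, so none is shown here.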

News

  • fix duplicated morphological and POS results caused by duplicated words in dictionaries.
  • autoSpacing for input sentences (using a unigram HMM).
  • show information when extractNoun, SimplePos22, SimplePos09, or MorphAnalyzer cannot process a sentence, instead of stopping with an error.
  • extractNoun, SimplePos22, SimplePos09, and MorphAnalyzer can now take multiple sentences as a character vector.
  • more efficient memory management for big dictionaries.
  • up to one million words can be used for text analysis with the new NIADic package.
  • added 'buildDictionary()', 'useNIADic()' for additional dictionaries.
  • SQLite for dictionary management.
  • apply new sentence segmentation plugins.
  • support Scala for HanNanum plugin development.
  • disable warning for long Eojeol sentence inputs.
  • fix infinite wait on abnormal input sentences.
  • fix 'concordance_str()', thanks Taekyung Kim.
  • deprecated 'mergeUserDic()'
  • new functions added (concordance_*, mutualinformation).
  • fix OutOfMemoryError on Mac OS X with R 3.0.x.
  • added references to the manual.
  • suppress one NOTE.
  • remove deprecated functions.
  • decrease package size dramatically.
  • set -Xmx to NULL to use the JVM's default optimal parameters.
  • added statDic() to view a summary of the dictionary.
  • fix bugs in SimplePosXX().
  • check more cases on raw input sentences.
  • secondary JVM option to default.
  • fix paths in a platform-independent way.
  • make mergeUserDic() easier to use.
  • is.*() functions now support vectors.
  • interface with the HanNanum analyzer for direct access to zipped dictionaries.
  • added useSejongDic(), useSystemDic() functions.
  • no need to explicitly use UTF-8 strings with 'is.~' functions.
  • fix Windows path problem.
  • added is.jaeum, is.moeum, is.ascii functions.
  • extract all dictionaries out to the Sejong package.
  • downgrade some errors to warnings so processing can continue.
  • add messages in is.hangul and is.jamo if the input is not UTF-8.
  • set -Xmx512m when the user's system is low on memory.
  • fix issue "Continuous "[:space:]" in sentence can make infinite wait."
  • new dictionary added from the Sejong project.
  • added example
  • set "dontrun" on example code because of Windows encoding problems.
  • a warning message may appear if the system lacks Hangul encoding support (no warning on UTF-8).
  • added user dictionary management functions.
  • raise to -Xmx1024m for big dictionary sizes
  • supports JRE 1.5
  • add is.jamo() and fix is.hangul()
  • add tag name converter for future use
  • add edit distance cost table for Hangul for future use
  • fix documentation
  • fix bug in HangulAutomata function
  • added more test cases
  • added input string encoding detection function (can only detect UTF-8, UTF-16, and UTF-32)
  • added Hangul automata logic; jamo sequences can now be converted to Hangul syllables.
  • added user dic reloading function(for future use)
  • full support for jamo and keystroke conversion
  • add link to github wiki for examples using Hangul
  • improved performance more than 100-fold in functions related to the Hannanum analyzer.
  • set -Xmx512m for Java VM.
  • Java sources are added.
  • fix encoding problems on Windows.

0.80.1 by Heewon Jeon, 10 months ago


https://github.com/haven-jeon/KoNLP


Report a bug at https://github.com/haven-jeon/KoNLP/issues


Browse source code at https://github.com/cran/KoNLP


Authors: Heewon Jeon [aut, cre], Taekyung Kim [ctb]


Documentation:   PDF Manual  


Task views: Natural Language Processing


GPL-3 license


Imports rJava, utils, stringr, hash, tau, Sejong, RSQLite, devtools

Suggests knitr, rmarkdown

System requirements: Java (>= 1.6)


Depended on by Rtextrankr.

