An 'Rcpp' Interface for Eunjeon Project

An 'Rcpp' interface for the Eunjeon project <http://eunjeon.blogspot.com/>. 'mecab-ko' and 'mecab-ko-dic' are based on a C++ library, and part-of-speech tagging with them is useful when the spacing of the source Korean text is incorrect. This package provides part-of-speech tagging and tokenization functions for Korean text.


The goal of RmecabKo is to parse Korean phrases with mecab-ko (Eunjeon project) and to provide helper functions for analyzing Korean documents. On Mac OSX and Linux, RmecabKo wraps mecab-ko through Rcpp; on Windows, it wraps a binary build of mecab-ko-msvc through system commands and file I/O.

Installation

Mac OSX, Linux

First, install mecab-ko from the Bitbucket repository.

You can download the mecab-ko source from the Download page.

In a Mac OSX terminal:

$ tar zxfv mecab-ko-XX.tar.gz
$ cd mecab-ko-XX
$ ./configure 
$ make
$ make check
$ sudo make install

In Linux:

$ tar zxfv mecab-ko-XX.tar.gz
$ cd mecab-ko-XX
$ ./configure 
$ make
$ make check
$ su
# make install

After installing mecab-ko, you can install RmecabKo from GitHub with:

# install.packages("devtools")
devtools::install_github("junhewk/RmecabKo")

You also need to install mecab-ko-dic; refer to its Bitbucket page. The installation procedure is the same as for mecab-ko.

Windows

On Windows, the package provides the install_mecab function. Call it after installing the package:

# install.packages("devtools")
devtools::install_github("junhewk/RmecabKo")
install_mecab()

Example

The basic usage is to pass a character vector as the phrase parameter of pos(phrase) or nouns(phrase). The loop over phrases runs in the C++ binary, so many phrases can be analyzed quickly.

pos("Hello. This is R wrapper of Korean morpheme analyzer mecab-ko.")

The output of pos is a list. Each element of the list contains the classified morphemes and their inferred parts of speech (POS), separated by "/". The name of each element is the original phrase.

The output of nouns is also a list. Each element of the list contains the extracted nouns. The name of each element is the original phrase.
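
As a minimal sketch of working with the "morpheme/POS" format described above, the following base R code splits tagged strings into a two-column data frame. The res object is a hand-written stand-in for a pos() result, not actual output: its tokens and tags are placeholders, and it assumes that each list element is a character vector of "morpheme/POS" strings. The helper split_tagged is hypothetical and not part of RmecabKo.

```r
# Hypothetical helper (not part of RmecabKo): split "morpheme/POS" strings.
# Splitting at the LAST "/" keeps a morpheme that itself contains "/" intact.
split_tagged <- function(tagged) {
  pos_tag  <- sub("^.*/", "", tagged)      # text after the last "/"
  morpheme <- sub("/[^/]*$", "", tagged)   # text before the last "/"
  data.frame(morpheme = morpheme, pos = pos_tag, stringsAsFactors = FALSE)
}

# Hand-written stand-in for a pos() result; tokens and tags are placeholders.
res <- list("example phrase" = c("example/TAG1", "phrase/TAG2"))

# One data frame per phrase, still named by the original phrase.
parsed <- lapply(res, split_tagged)
print(parsed[["example phrase"]])
```

Because lapply keeps the list names, each parsed data frame can still be looked up by its original phrase.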

More examples will be provided on the GitHub wiki.

Author

Junhewk Kim

Acknowledgments and Contributors

  • Eunjeon project: fork of the Japanese morpheme analyzer mecab for Korean.
  • Wonsup Yoon: VC++ binary builds of mecab-ko-msvc and mecab-ko-dic-msvc.

install.packages("RmecabKo")

0.1.6.2 by Junhewk Kim, 8 months ago


Browse source code at https://github.com/cran/RmecabKo


Authors: Junhewk Kim


GPL (>= 2) license


Imports Rcpp, stringr

Linking to Rcpp