Text Processing for Small or Big Data Files

The package processes big text data files efficiently in batches. For this purpose, it offers functions for splitting, parsing and tokenizing text and for creating a vocabulary. Moreover, it includes functions for building either a document-term matrix or a term-document matrix and for extracting information from those matrices (term associations, most frequent terms). Lastly, it includes functions for calculating token statistics (collocations, look-up tables, string dissimilarities) and for working with sparse matrices. The source code is based on 'C++11' and exposed to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.
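To make the batch-processing idea concrete, here is a minimal base-R sketch of the general technique: read a file a fixed number of lines at a time, tokenize each batch, grow a vocabulary, and accumulate a sparse document-term matrix. It does not use textTinyR's own API. The helper `build_dtm_in_batches()` is hypothetical, the tokenizer (lower-casing and splitting on non-alphanumeric characters) is an assumption, and each line of the input file is treated as one document. It relies only on base R and `Matrix::sparseMatrix()` from the Matrix package, which textTinyR depends on.

```r
library(Matrix)  # provides sparseMatrix(); Matrix is a dependency of textTinyR

# Hypothetical helper (not part of textTinyR): read `path` in batches of
# `batch_size` lines, treat each line as one document, and return a sparse
# document-term matrix of raw term counts.
build_dtm_in_batches <- function(path, batch_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  vocab <- character(0)   # terms seen so far, in order of first appearance
  rows <- integer(0)      # document index of each token occurrence
  cols <- integer(0)      # vocabulary index of each token occurrence
  doc_offset <- 0L        # number of documents read in previous batches

  repeat {
    lines <- readLines(con, n = batch_size, warn = FALSE)
    if (length(lines) == 0L) break  # end of file

    # Assumed tokenizer: lower-case, split on runs of non-alphanumeric characters
    tokens <- strsplit(tolower(lines), "[^[:alnum:]]+")
    tokens <- lapply(tokens, function(t) t[nzchar(t)])

    # Extend the vocabulary with terms not seen in earlier batches
    new_terms <- setdiff(unique(unlist(tokens)), vocab)
    vocab <- c(vocab, new_terms)

    # Record one (document, term) pair per token occurrence
    for (k in seq_along(tokens)) {
      if (length(tokens[[k]]) == 0L) next
      rows <- c(rows, rep(doc_offset + k, length(tokens[[k]])))
      cols <- c(cols, match(tokens[[k]], vocab))
    }
    doc_offset <- doc_offset + length(lines)
  }

  # sparseMatrix() adds up duplicated (i, j) pairs, which yields term counts
  sparseMatrix(i = rows, j = cols, x = 1,
               dims = c(doc_offset, length(vocab)),
               dimnames = list(NULL, vocab))
}
```

Only one batch of lines is held in memory at a time, although the triplet vectors still grow with the corpus; transposing the result with `t()` gives the corresponding term-document matrix. The package itself implements its processing in C++11 rather than in R loops like these.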




install.packages("textTinyR")

1.0.3 by Lampros Mouselimis, 2 months ago


https://github.com/mlampros/textTinyR


Report a bug at https://github.com/mlampros/textTinyR/issues


Browse source code at https://github.com/cran/textTinyR


Authors: Lampros Mouselimis <mouselimislampros@gmail.com>




GPL-3 license


Imports: Rcpp, R6, data.table, utils

Depends: Matrix

Suggests: testthat, covr, knitr, rmarkdown

LinkingTo: Rcpp, RcppArmadillo, BH

System requirements: The package requires two components: a C++11 compiler and, on a Unix OS, the boost-locale headers and libraries (boost >= 1.55.0, www.boost.org).
Debian/Ubuntu: libboost-locale-dev
Fedora: yum install boost-devel
OSX/brew: detailed installation instructions can be found in the README file

