Diagnostics to Assess the Effects of Text Preprocessing Decisions

Functions to assess how different text preprocessing decisions affect the inferences drawn from the resulting document-term matrices.


An R package to assess the consequences of text preprocessing decisions.

To get started, see the [getting started with preText vignette].

The paper detailing the procedure can be found at the link below:

  • Matthew J. Denny and Arthur Spirling (2017). "Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It". [ssrn.com/abstract=2849145]

Installation

The easiest way to install the package is from CRAN, using the standard install.packages command:

install.packages("preText")

To get the latest version from GitHub, first review the "Requirements for using C++ code with R" section of the tutorial "Using C++ and R code Together with Rcpp". Because preText makes use of C++ code, you will likely need to install Xcode (on a Mac) or Rtools (on Windows) before you can install the package from GitHub. You will also need the devtools package, which you can install as follows:

install.packages("devtools")

Now you can install preText from GitHub using the following line:

devtools::install_github("matthewjdenny/preText")

Once the preText package is installed, you can access its functionality as you would any other package by calling:

library(preText)

If all went well, you should be able to replicate the steps in vignette("getting_started").

Basic Usage

The basic functionality of this package is detailed in a vignette, which is [available here]. Beyond this, the package includes a number of additional utility and analysis functions for exploring and comparing multiple document-term matrices.
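
For orientation, a typical workflow can be sketched as follows. This is a minimal sketch, not a substitute for the vignette: it assumes a recent quanteda release (for as.character() on a corpus), uses quanteda's bundled data_corpus_inaugural as example data, and the argument values shown are illustrative choices only.

```r
# Sketch of a typical preText workflow; argument values are illustrative.
library(preText)
library(quanteda)

# Assumption: quanteda's bundled inaugural-address corpus is used as example data.
# Convert it to a plain character vector and keep ten speeches to limit runtime.
documents <- as.character(data_corpus_inaugural)[1:10]

# Build one document-term matrix per combination of preprocessing steps.
# use_ngrams = FALSE leaves out the n-gram step, which reduces the number
# of combinations that have to be generated.
preprocessed_documents <- factorial_preprocess(
    documents,
    use_ngrams = FALSE,
    infrequent_term_threshold = 0.2,
    verbose = FALSE)

# Compute preText scores for each preprocessing specification.
preText_results <- preText(
    preprocessed_documents,
    dataset_name = "Inaugural Speeches",
    distance_method = "cosine",
    num_comparisons = 20,
    verbose = FALSE)

# Plot the scores and the regression coefficients for each preprocessing step.
preText_score_plot(preText_results)
regression_coefficient_plot(preText_results, remove_intercept = TRUE)
```

Running the factorial preprocessing step on a small subset of documents first keeps the run time manageable while you check that everything is installed correctly.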

Bug Reporting

Please report any bugs or errors to [email protected].


0.6.2 by Matthew J. Denny, 5 months ago


Browse source code at https://github.com/cran/preText


Authors: Matthew J. Denny <[email protected]>, Arthur Spirling <[email protected]>




GPL-3 license


Imports quanteda, ggplot2, vegan, grid, parallel, topicmodels, cowplot, ecodist, proxy, reshape2

Suggests testthat, knitr, rmarkdown

