Diagnostics to Assess the Effects of Text Preprocessing Decisions

Functions to assess how different text preprocessing decisions affect the inferences drawn from the resulting document-term matrices.

An R package to assess the consequences of text preprocessing decisions.

To get started, see the "getting started with preText" vignette.

The paper detailing the procedure can be found at the link below:

  • Matthew J. Denny and Arthur Spirling (2016). "Assessing the Consequences of Text Preprocessing Decisions". [ssrn.com/abstract=2849145]

We are currently working on getting a version of the package up on CRAN. As soon as it is available, we will update the installation instructions.

If you want to get the latest version from GitHub, first read the "Requirements for using C++ code with R" section of the tutorial "Using C++ and R code Together with Rcpp". Because preText makes use of C++ code, you will likely need to install Xcode (on a Mac) or Rtools (on Windows) before you can install the package from GitHub.


Once those requirements are in place, the package can be installed from GitHub from within R.
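
A minimal sketch of the GitHub installation follows. It assumes the devtools package is used for the install and that the development repository is matthewjdenny/preText; check both against the project page before running it.

```r
# Install devtools first if it is not already available.
# (Assumption: devtools is the helper used to install from GitHub.)
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumption: the development version lives at matthewjdenny/preText.
# This step compiles C++ code, so Xcode or Rtools must already be installed.
devtools::install_github("matthewjdenny/preText")
```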


Once the preText package is installed, you may access its functionality as you would any other package by calling library(preText).


If all went well, you should be able to replicate the steps in the getting started vignette, vignette("getting_started").

The basic functionality of this package is detailed in the getting started vignette. Beyond this basic functionality, the package includes a number of additional utility and analysis functions for exploring and comparing multiple document-term matrices.
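
As a rough illustration of that basic functionality, the sketch below runs a small analysis from start to finish. The function names and arguments (factorial_preprocessing, preText, preText_score_plot) are given as recalled from the package documentation, and the choice of ten inaugural addresses from quanteda's bundled corpus is an assumption made only to keep the example small. Treat the vignette as the authoritative reference.

```r
library(preText)
library(quanteda)

# Assumption: a small sample of ten documents from quanteda's bundled
# inaugural address corpus, used here only to keep the run time short.
documents <- as.character(data_corpus_inaugural)[1:10]

# Build one document-term matrix per combination of preprocessing steps.
preprocessed_documents <- factorial_preprocessing(
  documents,
  use_ngrams = TRUE,
  infrequent_term_threshold = 0.2,
  verbose = FALSE
)

# Compute preText scores for each preprocessing specification.
preText_results <- preText(
  preprocessed_documents,
  dataset_name = "Inaugural Speeches",
  distance_method = "cosine",
  num_comparisons = 20,
  verbose = FALSE
)

# Plot the scores for each specification.
preText_score_plot(preText_results)
```

Even on ten documents this can take a while to run, because every combination of preprocessing steps produces its own document-term matrix.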



0.4.4 by Matthew J. Denny, 5 months ago

Browse source code at https://github.com/cran/preText

Authors: Matthew J. Denny <mdenny@psu.edu>, Arthur Spirling <as9934@nyu.edu>


GPL-3 license

Imports quanteda, gridExtra, ggplot2, vegan, grid, parallel, topicmodels, cowplot, ecodist, proxy, reshape2

Suggests testthat, knitr, rmarkdown

