Graphical Integrated Text Mining Solution

An 'R Commander' plug-in providing an integrated solution for a series of text mining tasks, such as importing and cleaning a corpus, and for analyses such as term and document counts, vocabulary tables, term co-occurrence and document similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
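To make the counting analyses named above concrete, here is a minimal sketch in Python rather than R. It is not the package's code: the function names `document_term_counts` and `cooccurrence` are hypothetical, and splitting on whitespace is a deliberately naive stand-in for real corpus cleaning.

```python
from collections import Counter

def document_term_counts(docs):
    """Return (sorted vocabulary, one Counter of term counts per document).

    Tokenization is a naive lowercase whitespace split (an assumption made
    for illustration only).
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    return vocabulary, counts

def cooccurrence(counts, term_a, term_b):
    """Number of documents that contain both terms at least once."""
    return sum(1 for c in counts if c[term_a] > 0 and c[term_b] > 0)

docs = ["the cat sat", "the dog sat", "a cat"]
vocabulary, counts = document_term_counts(docs)
both = cooccurrence(counts, "cat", "sat")
```

Each `Counter` plays the role of one row of a document-term matrix, and `cooccurrence` counts the documents in which two terms appear together.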


Version 0.7.10 - 2018-06-22
* Fix terms dictionary when importing it from CSV.
* Fix number of documents limitation when showing CA on aggregate DTM.
* Fix error when showing CA results multiple times.
* Fix RJournal test with tm 0.7-2.
* Use hypergeometric distribution rather than Chi2 distribution for term co-occurrences.
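The changelog does not say exactly how the hypergeometric distribution is applied to term co-occurrences, so what follows is only a sketch of one common formulation, written in Python rather than R. Under independence, the number of documents containing both terms follows a hypergeometric distribution, and the upper tail gives the probability of seeing at least the observed overlap. The function name and parameters are hypothetical.

```python
from math import comb

def cooccurrence_p_value(n_docs, n_with_a, n_with_b, n_with_both):
    """Upper-tail hypergeometric probability P(X >= n_with_both).

    X is the number of documents containing term A among the n_with_b
    documents containing term B, when n_with_a of the n_docs documents
    contain term A (illustrative formulation, not the package's code).
    """
    upper = min(n_with_a, n_with_b)
    total = comb(n_docs, n_with_b)
    tail = sum(
        comb(n_with_a, k) * comb(n_docs - n_with_a, n_with_b - k)
        for k in range(n_with_both, upper + 1)
    )
    return tail / total

# 10 documents, each term in 5 of them, both terms together in all 5.
p = cooccurrence_p_value(10, 5, 5, 5)
```

Unlike a chi-squared approximation, this tail is computed exactly, which matters when the counts are small, as they often are for rare terms.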

Version 0.7.9 - 2017-07-07
* Fix various bugs when importing corpora with tm 0.7.

Version 0.7.8 - 2016-11-02
* Fix installing ROpenOffice by updating Omegahat URL.

Version 0.7.7 - 2016-07-08
* Improve speed when showing contents of large corpora.
* Fix bug when restricting and then restoring corpus.
* Support R versions before 3.3 again.

Version 0.7.6 - 2016-07-04
* Improve speed when showing CA results for large corpora.
* Skip variables with more than 100 levels from CA to improve output.
* Sort frequent terms by number of occurrences rather than by specificity.
* Fix order of levels in legend for horizontal bar charts.
* Make it easier to enter a custom number in fields.
* Stop adding pseudo-shadows by repeating white text in CA plots.
* Fix harmless error about removing 'lengths' object on import.

Version 0.7.5 - 2016-03-02
* Update Omegahat domain name (for ROpenOffice suggested package).

Version 0.7.4 - 2015-08-19
* Fix warning when loading package with Rcmdr 2.2.

Version 0.7.3 - 2015-01-23
* Fix computing correspondence analysis when corpus has no variables set.
* Improve encoding detection by reading more characters from files.

Version 0.7.2 - 2014-09-06
* Fix importing corpus from spreadsheet-like files.

Version 0.7.1 - 2014-09-06
* Minor release to support Rcmdr 2.1 and R2HTML 2.3.
* Fix custom stemming when used from the Windows desktop shortcut.
* Fix error message when sparsity field is empty.

Version 0.7 - 2014-06-01
* Port to tm 0.6. No changes should be visible to the user, except that saved workspaces and generated code will not work with this version.
* Support LexisNexis, Europresse and Alceste importation.
* Allow customizing the stemming dictionary before processing texts.
* Automatically detect character encoding when importing corpora.
* Replace sliders with spin boxes to allow more precise selection of parameters.
* Add Select All and Copy menu and key bindings to text results window.
* Allow manual selection of the column containing text when importing from spreadsheet-like files.
* Improve performance when importing many Factiva, Europresse or LexisNexis files.
* Allow selecting a language whose encoding is not supported by OS on Windows.
* Allow choosing an occurrence threshold when subsetting corpus.
* Print errors in dialogs rather than in the Messages pane.
* Fix off-by-one number of occurrences in term frequencies.
* Various fixes and improvements.

Version 0.6.2 - 2013-08-27
* Use system font for frame titles, as Rcmdr 2.0-0 does.
* Move several packages from Depends to Imports to avoid attaching too many packages (and use :: where necessary).
* Fix new R CMD check WARNINGS.

Version 0.6.1 - 2013-06-05
* Use new SnowballC package instead of Snowball or Rstem for stemming, to fix issues with rJava and Rstem installation.
* Adapt to Twitter API changes by supporting/requiring OAuth authentication (now needed to access tweets).
* Allow extracting Company, Industry, InfoCode and InfoDesc meta-data from Factiva files (requires tm.plugin.factiva 1.3).
* Plot dendrogram before creating clusters a second time.
* Improve usability of import dialogs (combo boxes, size).
* Show more information about chosen clustering settings in the clustering window.
* Fix time series analysis when custom time format is specified.
* Fix levels order and hide NA value when plotting time series.
* Fix error message when loading a package fails.

Version 0.6 - 2013-02-10
* Initial CRAN release.


0.7.10 by Milan Bouchet-Valat, 2 years ago

Authors: Milan Bouchet-Valat [aut, cre] , Gilles Bastin [aut]

Documentation:   PDF Manual  

Task views: Natural Language Processing

GPL (>= 2) license

Imports Rcmdr, tcltk, tcltk2, utils, ca, R2HTML, RColorBrewer, latticeExtra, stringi

Depends on methods, tm, NLP, slam, zoo, lattice

Suggests SnowballC, ROpenOffice, RODBC, tm.plugin.factiva, tm.plugin.lexisnexis, tm.plugin.europresse, tm.plugin.alceste, twitteR
