An 'R Commander' plug-in providing an integrated solution to perform a series of text mining tasks such as importing and cleaning a corpus, and analyses like term and document counts, vocabulary tables, term co-occurrences, document similarity measures, time series analysis, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, 'Twitter' queries, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.


Version 0.7.10 - 2018-06-22
* Fix terms dictionary when importing it from CSV.
* Fix limit on the number of documents when showing CA on an aggregated DTM.
* Fix error when showing CA results multiple times.
* Fix RJournal test with tm 0.7-2.
* Use hypergeometric distribution rather than Chi2 distribution for term co-occurrences.
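The switch from a Chi2 to a hypergeometric distribution for term co-occurrences can be illustrated with a minimal sketch. It is written in Python for illustration and is not the package's R implementation. It assumes one plausible formulation: the documents containing a reference term form a subset, and a term's occurrences in that subset are compared with its occurrences in the whole corpus via an exact upper-tail hypergeometric probability. The helper names (`hypergeom_sf`, `cooccurrence_pvalue`) and the toy corpus are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: successes in the population, n: number of draws.
    Computed exactly with integer binomial coefficients (no approximation).
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def cooccurrence_pvalue(dtm, ref, term):
    """Hypothetical helper: is `term` over-represented in documents containing `ref`?

    dtm maps document ids to {term: count} dictionaries (a toy structure,
    not tm's DocumentTermMatrix).
    """
    subset = [counts for counts in dtm.values() if counts.get(ref, 0) > 0]
    N = sum(sum(c.values()) for c in dtm.values())  # all term occurrences in the corpus
    n = sum(sum(c.values()) for c in subset)        # occurrences in documents containing ref
    K = sum(c.get(term, 0) for c in dtm.values())   # occurrences of term in the corpus
    k = sum(c.get(term, 0) for c in subset)         # occurrences of term in that subset
    return hypergeom_sf(k, N, K, n)


corpus = {
    "d1": {"tax": 2, "budget": 2},
    "d2": {"tax": 1, "budget": 1, "vote": 1},
    "d3": {"vote": 3, "party": 2},
    "d4": {"party": 2, "vote": 1},
}

print(cooccurrence_pvalue(corpus, "tax", "budget"))  # 495/6435 = 1/13, about 0.0769
print(cooccurrence_pvalue(corpus, "tax", "vote"))    # about 0.981
```

A small probability means the term occurs with the reference term more often than chance would predict. Because the probability is computed exactly, it does not rely on the Chi2 approximation, which is known to be unreliable when expected counts are small, as they often are for rare terms.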

Version 0.7.9 - 2017-07-07
* Fix various bugs when importing corpora with tm 0.7.

Version 0.7.8 - 2016-11-02
* Fix installation of ROpenOffice by updating the Omegahat URL.

Version 0.7.7 - 2016-07-08
* Improve speed when showing contents of large corpora.
* Fix bug when restricting and then restoring corpus.
* Support R versions before 3.3 again.

Version 0.7.6 - 2016-07-04
* Improve speed when showing CA results for large corpora.
* Exclude variables with more than 100 levels from CA to improve output.
* Sort frequent terms by number of occurrences rather than by specificity.
* Fix order of levels in legend for horizontal bar charts.
* Make it easier to enter a custom number in fields.
* Stop adding pseudo-shadows by repeating white text in CA plots.
* Fix harmless error about removing 'lengths' object on import.

Version 0.7.5 - 2016-03-02
* Update Omegahat domain name (for ROpenOffice suggested package).

Version 0.7.4 - 2015-08-19
* Fix warning when loading package with Rcmdr 2.2.

Version 0.7.3 - 2015-01-23
* Fix computing correspondence analysis when corpus has no variables set.
* Improve encoding detection by reading more characters from files.

Version 0.7.2 - 2014-09-06
* Fix importing corpus from spreadsheet-like files.

Version 0.7.1 - 2014-09-06
* Minor release to support Rcmdr 2.1 and R2HTML 2.3.
* Fix custom stemming when used from the Windows desktop shortcut.
* Fix error message when sparsity field is empty.

Version 0.7 - 2014-06-01
* Port to tm 0.6. No changes should be visible to the user, except that saved workspaces and generated code will not work with this version.
* Support LexisNexis, Europresse and Alceste import.
* Allow customizing the stemming dictionary before processing texts.
* Automatically detect character encoding when importing corpora.
* Replace sliders with spin boxes to allow more precise selection of parameters.
* Add Select All and Copy menu and key bindings to text results window.
* Allow manual selection of the column containing text when importing from spreadsheet-like files.
* Improve performance when importing many Factiva, Europresse or LexisNexis files.
* Allow selecting a language whose encoding is not supported by the OS on Windows.
* Allow choosing an occurrence threshold when subsetting corpus.
* Print errors in dialogs rather than in the Messages pane.
* Fix off-by-one error in the number of occurrences in term frequencies.
* Various fixes and improvements.

Version 0.6.2 - 2013-08-27
* Use system font for frame titles, as Rcmdr 2.0-0 does.
* Move several packages from Depends to Imports to avoid attaching too many packages (and use :: where necessary).
* Fix new R CMD check WARNINGS.

Version 0.6.1 - 2013-06-05
* Use new SnowballC package instead of Snowball or Rstem for stemming, to fix issues with rJava and Rstem installation.
* Adapt to Twitter API changes by supporting/requiring OAuth authentication (now needed to access tweets).
* Allow extracting Company, Industry, InfoCode and InfoDesc meta-data from Factiva files (requires tm.plugin.factiva 1.3).
* Plot dendrogram before creating clusters a second time.
* Improve usability of import dialogs (combo boxes, size).
* Show more information about chosen clustering settings in the clustering window.
* Fix time series analysis when custom time format is specified.
* Fix levels order and hide NA value when plotting time series.
* Fix error message when loading a package fails.

Version 0.6 - 2013-02-10
* Initial CRAN release.

0.7.10 by Milan Bouchet-Valat

Authors: Milan Bouchet-Valat [aut, cre], Gilles Bastin [aut]

Task views: Natural Language Processing

GPL (>= 2) license

Imports Rcmdr, tcltk, tcltk2, utils, ca, R2HTML, RColorBrewer, latticeExtra, stringi

Depends on methods, tm, NLP, slam, zoo, lattice

Suggests SnowballC, ROpenOffice, RODBC, tm.plugin.factiva, tm.plugin.lexisnexis, tm.plugin.europresse, tm.plugin.alceste, twitteR

