Import Articles from 'Europresse' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It reads both the text content and the metadata of each article (including source, date, title, author and pages).
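As a rough illustration of the workflow described above, the sketch below wraps the import in a small helper. The helper name `import_europresse`, the file name in the usage note and the `language = "fr"` default are assumptions made for this example. The call to `EuropresseSource()` reflects the source constructor this package is understood to export; check the reference manual for the exact signature before relying on it.

```r
# Hypothetical helper (not part of the package): build a 'tm' corpus
# from one HTML file exported from Europresse.
import_europresse <- function(path, language = "fr") {
  # Fail early with a clear error if the export file is missing.
  stopifnot(is.character(path), length(path) == 1L, file.exists(path))

  # Assumption: the package exports EuropresseSource() as its 'tm' Source.
  src <- tm.plugin.europresse::EuropresseSource(path)

  # VCorpus() uses the source's default reader; only the language is set here.
  tm::VCorpus(src, readerControl = list(language = language))
}
```

With both packages installed, `corpus <- import_europresse("europresse_export.html")` should return a corpus with one document per article, and `tm::meta(corpus[[1]])` should list the metadata of the first article.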


News

Version 1.4 - 2016-08-23 * Fix failures with a new variant of the Europresse HTML file format (reported by Tristan Guerra).

Version 1.3 - 2014-07-25 * Support recently-introduced Europresse HTML file format.

Version 1.2 - 2014-03-26 * Fix removal of search terms highlighted in red (reported by Patrick Lâm Lê).

Version 1.1 - 2014-05-25 * Adapt to tm 0.6. * Change all tags to lowercase (for consistency with tm). * Stop truncating document IDs.

Version 1.0.1 - 2014-02-11 * Fix small bug when parsing dates on Mac OS X.

Version 1.0 - 2014-02-10 * Initial release with support for HTML files.

install.packages("tm.plugin.europresse")

1.4 by Milan Bouchet-Valat, 4 years ago


https://r-forge.r-project.org/projects/r-temis/


Report a bug at https://r-forge.r-project.org/tracker/?group_id=1437


Browse source code at https://github.com/cran/tm.plugin.europresse


Authors: Milan Bouchet-Valat [aut, cre]


Documentation:   PDF Manual  


Task views: Natural Language Processing


GPL (>= 2) license


Imports utils, NLP, tm, XML


Imported by R.temis.

Suggested by RcmdrPlugin.temis.

