Provides a 'tm' Source to create corpora from articles exported from the 'Europresse' content provider as HTML files. It reads both the text content and the metadata of each article (including source, date, title, author and pages).
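A minimal usage sketch, assuming the package is installed as 'tm.plugin.europresse' and exposes its Source as EuropresseSource(). The file name below is hypothetical and stands for any HTML export downloaded from Europresse; the guard simply skips the example when that file is absent.

```r
library(tm)
library(tm.plugin.europresse)

# Hypothetical path to an HTML file exported from Europresse
path <- "europresse_export.html"

if (file.exists(path)) {
  # Build a corpus with one document per exported article
  corpus <- Corpus(EuropresseSource(path))

  # Text content of the first article
  print(content(corpus[[1]]))

  # Metadata read from the export (source, date, title, author, pages)
  print(meta(corpus[[1]]))
}
```

The resulting object is a regular 'tm' corpus, so it should be usable with the usual 'tm' tools such as tm_map() and DocumentTermMatrix().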
Version 1.4 - 2016-08-23 * Fix failures with a new variant of the Europresse HTML file format (reported by Tristan Guerra).
Version 1.3 - 2014-07-25 * Support recently-introduced Europresse HTML file format.
Version 1.2 - 2014-03-26 * Fix removal of search terms highlighted in red (reported by Patrick Lâm Lê).
Version 1.1 - 2014-05-25 * Adapt to tm 0.6. * Change all tags to lowercase (for consistency with tm). * Stop truncating document IDs.
Version 1.0.1 - 2014-02-11 * Fix small bug when parsing dates on Mac OS X.
Version 1.0 - 2014-02-10 * Initial release with support for HTML files.