Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It reads both the text content and the metadata (including source, date, title, author and pages). Note that the file format is highly unstable: there is no guarantee that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.


Version 1.4.0 - 2018-06-05
* Rework parsing code to make it more robust to variations in the HTML format.

Version 1.3.1 - 2017-06-30
* Fix date parsing on Mac (thanks to Simon Naitram for reporting this).

Version 1.3 - 2016-06-29
* Support more variants of the format (though many likely remain unsupported).

Version 1.2 - 2015-02-22
* Support import of English HTML files (thanks to Oriol Mirosa for sending an example file).

Version 1.1 - 2014-05-31
* Adapt to tm 0.6.
* Change all tags to lowercase (for consistency with tm).

Version 1.0 - 2014-02-10
* Initial release, supporting only HTML files saved in French.



1.4.1 by Milan Bouchet-Valat, a year ago

Authors: Milan Bouchet-Valat [aut, cre] , Tom Nicholls [ctb]

Documentation: PDF Manual

Task views: Natural Language Processing

GPL (>= 2) license

Imports utils, NLP, tm, xml2, ISOcodes

Imported by R.temis.

Suggested by RcmdrPlugin.temis.
