Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework

Provides a 'tm' Source to create corpora from articles exported from the 'LexisNexis' content provider as HTML files. It reads both the text content and the metadata (including source, date, title, author and pages). Note that the file format is highly unstable: there is no guarantee that this package will work for your corpus, and you may have to adjust the code to adapt it to your particular format.
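A minimal usage sketch follows. It assumes the package's exported `LexisNexisSource()` function and a LexisNexis HTML export saved as `articles.html`. That file name is hypothetical, so replace it with your own file. The sketch wraps the file in the source, builds a `tm` corpus with `VCorpus()`, and prints the metadata and text of the first article. Given the format caveats above, parsing may still fail on some exports.

```r
# Load tm and the LexisNexis plugin (install with install.packages("tm.plugin.lexisnexis"))
library(tm)
library(tm.plugin.lexisnexis)

# Hypothetical path to an HTML file exported from LexisNexis; replace with your own
path <- "articles.html"

# Wrap the file in a LexisNexis source; VCorpus() uses the source's default reader
corpus <- VCorpus(LexisNexisSource(path))

# Number of articles found in the file
print(length(corpus))

# Metadata of the first article (source, date, title, author, pages, ...)
print(meta(corpus[[1]]))

# Text content of the first article
print(content(corpus[[1]]))
```

`VCorpus()` is used here to make the in-memory corpus type explicit. Calling `Corpus()` on this kind of source should produce the same type.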


News

Version 1.4.0 - 2018-06-05
* Rework parsing code to make it more robust to variations in the HTML format.

Version 1.3.1 - 2017-06-30
* Fix date parsing on Mac (thanks to Simon Naitram for reporting this).

Version 1.3 - 2016-06-29
* Support more variants of the format (though many likely remain unsupported).

Version 1.2 - 2015-02-22
* Support importing English HTML files (thanks to Oriol Mirosa for sending an example file).

Version 1.1 - 2014-05-31
* Adapt to tm 0.6.
* Change all tags to lowercase (for consistency with tm).

Version 1.0 - 2014-02-10
* Initial release, supporting only HTML files saved in French.


install.packages("tm.plugin.lexisnexis")

Version 1.4.1 by Milan Bouchet-Valat


https://github.com/nalimilan/R.TeMiS


Report a bug at https://github.com/nalimilan/R.TeMiS/issues


Browse source code at https://github.com/cran/tm.plugin.lexisnexis


Authors: Milan Bouchet-Valat [aut, cre], Tom Nicholls [ctb]


Documentation: PDF Manual


Task views: Natural Language Processing


GPL (>= 2) license


Imports: utils, NLP, tm, xml2, ISOcodes


Imported by R.temis.

Suggested by RcmdrPlugin.temis.

