Interface to the Boilerpipe Java Library

Generic extraction of the main text content from HTML files, removing ads, sidebars and headers, using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The boilerpipe extraction heuristics perform robustly across a wide range of web site templates.



install.packages("boilerpipeR")
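After installation, extraction is a single function call on a string of HTML. The sketch below assumes boilerpipeR and a working Java runtime (needed by rJava) are available. It uses `DefaultExtractor()` and `ArticleExtractor()`, two of the extractor functions the package exports. The HTML page is a made-up example and is not taken from the package documentation.

```r
# Minimal sketch: assumes boilerpipeR is installed and rJava can start a JVM.
library(boilerpipeR)

# Hypothetical HTML page: navigation, one article paragraph, and a footer.
html <- paste0(
  "<html><head><title>Example</title></head><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='content'><p>",
  paste(rep("This sentence is part of the main article text.", 10),
        collapse = " "),
  "</p></div>",
  "<div id='footer'>Copyright notice</div>",
  "</body></html>"
)

# DefaultExtractor applies boilerpipe's generic heuristics;
# ArticleExtractor is tuned for news-style article pages.
main_text    <- DefaultExtractor(html)
article_text <- ArticleExtractor(html)

# Print the text boilerpipe kept as main content.
cat(main_text, "\n")
```

To process a live page, its HTML can first be downloaded as a string, for example with `RCurl::getURL(url)` (RCurl is listed under Suggests), and then passed to either extractor.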

Version 1.3 by Mario Annau, 5 years ago


https://github.com/mannau/boilerpipeR


Report a bug at https://github.com/mannau/boilerpipeR/issues


Browse source code at https://github.com/cran/boilerpipeR


Authors: See AUTHORS file.


Documentation: PDF Manual


Task views: Natural Language Processing, Web Technologies and Services


License: Apache License (== 2.0)


Imports: rJava

Suggests: RCurl


Imported by: tm.plugin.webmining

