Analyse Text Documents Using Ecological Tools

A set of functions to analyse and compare texts, using classical text mining functions, as well as those from theoretical ecology.


Purpose of the package

The inpdfr package analyses and compares PDF and/or TXT documents using both classical text mining tools and tools from theoretical ecology. In the latter, words are treated as species and documents as communities, which allows analysis at the community and metacommunity levels.
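To make the ecological analogy concrete, here is a minimal base R sketch that does not use any inpdfr function. The table `wordOccu`, its counts, and the helper names `shannon` and `richness` are made up for illustration. Each document is treated as a community and described with two classical community-level indices: word richness and Shannon diversity.

```r
# Hypothetical word-occurrence table: rows are words ("species"),
# columns are documents ("communities"). Counts are invented for illustration.
wordOccu <- data.frame(
  word = c("model", "species", "data", "text"),
  docA = c(4, 0, 3, 1),
  docB = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Shannon diversity H = -sum(p * log(p)) over the words present in a document.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Word richness: number of distinct words used in a document.
richness <- function(counts) sum(counts > 0)

counts <- wordOccu[, c("docA", "docB")]
diversity <- data.frame(
  document = names(counts),
  richness = vapply(counts, richness, numeric(1)),
  shannon  = vapply(counts, shannon, numeric(1)),
  row.names = NULL
)
print(diversity)
```

In this toy table, docB uses its four words equally often, so its Shannon diversity is the maximum possible for four words, log(4); docA is dominated by two words and scores lower.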

How to use the package

Gather some PDF and/or TXT files in a folder and point the working directory to this folder. The inpdfr package then extracts the text and produces a word-occurrence data.frame, which is used to analyse and compare the documents. An easy way to start is to use the RGtk2 GUI through the loadGUI function (only available in the GitHub version, not on CRAN).
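The shape of a word-occurrence data.frame can be sketched in base R as follows. This is only an illustration of the data structure, not the package's implementation: the texts, the `tokenize` helper, and the column names are assumptions, and the sketch skips the stop-word removal and stemming that a real text mining pipeline would normally apply.

```r
# Two made-up documents standing in for text extracted from PDF or TXT files.
texts <- c(
  doc1 = "Species richness and species turnover",
  doc2 = "Text mining of species names"
)

# Lower-case, split on anything that is not a letter, drop empty tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

tokens <- lapply(texts, tokenize)
vocab  <- sort(unique(unlist(tokens, use.names = FALSE)))

# One row per word, one count column per document.
wordOccu <- data.frame(word = vocab, stringsAsFactors = FALSE)
for (doc in names(tokens)) {
  wordOccu[[doc]] <- as.integer(table(factor(tokens[[doc]], levels = vocab)))
}
print(wordOccu)
```

Using a factor with the full vocabulary as levels keeps a zero count for words that are absent from a document, so every document column has the same length and order.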

Installation instructions

The package uses XPDF (http://www.foolabs.com/xpdf/download.html) for PDF to text extraction. You need to install XPDF before using the inpdfr package. Depending on your operating system, you may need to restart your computer after installing XPDF. If you do not want to use XPDF, you can extract the content of your PDF files with the method of your choice and then store the content in TXT files. The only function that makes use of XPDF is getPDF, which can be substituted with the getTXT function.

install.packages("inpdfr")
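A quick way to check whether XPDF is reachable from R is sketched below. It assumes that the executable needed for PDF extraction is `pdftotext`, the command-line tool shipped with XPDF, and that it should be on the system PATH; the variable names are illustrative only.

```r
# Look for the pdftotext executable (shipped with XPDF) on the system PATH.
# Assumption: this is the XPDF tool required for PDF to text extraction.
pdftotext_path <- unname(Sys.which("pdftotext"))
has_xpdf <- nzchar(pdftotext_path)

if (has_xpdf) {
  message("pdftotext found at: ", pdftotext_path)
} else {
  message("pdftotext not found; install XPDF or convert your PDFs to TXT files.")
}

# Install from CRAN only if the package is not already available.
# (Left commented out so the check above runs without network access.)
# if (!requireNamespace("inpdfr", quietly = TRUE)) install.packages("inpdfr")
```

If the check fails right after installing XPDF, restarting the R session or the computer may be needed for the updated PATH to be picked up.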

Overview

The inpdfr package provides three categories of functions:

  • functions to extract and process text into a word-occurrence data.frame,
  • functions to analyse the word-occurrence data.frame with standard and ecological tools, and
  • functions to use inpdfr through a GTK2 Graphical User Interface.

Further instructions and a complete example are provided in the vignette.

News

package inpdfr v0.1.8

  • fix calling if() with a vector of length 2 or more

package inpdfr v0.1.7

  • fix calling if() with a vector of length 2 or more

package inpdfr v0.1.6 (bug-fix release)

  • fix WARN "working directory change"
  • GUI using RGtk2 no longer part of the package

package inpdfr v0.1.5

  • fix bug in doMetacomMetacom(wordF = wordOccuDF)

package inpdfr v0.1.4

  • identifyStructure function included in the package

package inpdfr v0.1.3

  • renamed arguments for R package metacom, following new release

package inpdfr v0.1.2

  • DESCRIPTION file allows package parallel (>= 3.1.3) to pass "r-oldrel-windows-ix86+x86_64" test from "CRAN Package Check Results"
  • R.devices package used to export output files
  • code syntax improvements
  • examples improvements

package inpdfr v0.1.1

  • URL fixed to download GTK+ in man/loadGUI.Rd
  • NOTE "Strong dependencies not in mainstream repositories: Rstem" fixed. Word stemming is now performed with the SnowballC package.
  • missing @export roxygen field added for function getwordOccuDF
  • missing @export roxygen field added for function getAllAnalysis

package inpdfr v0.1.0

  • initial release

0.1.8 by Rebaudo Francois, 6 months ago


https://github.com/frareb/inpdfr/


Report a bug at https://github.com/frareb/inpdfr/issues


Browse source code at https://github.com/cran/inpdfr


Authors: Rebaudo Francois (IRD, UMR EGCE, Univ. ParisSud-CNRS-IRD-Univ. ParisSaclay)


GPL-2 license


Imports wordcloud, RColorBrewer, tm, SnowballC, ca, cluster, entropart, metacom, parallel, stringi, R.devices

Suggests knitr, rmarkdown, testthat

System requirements: XPDF (http://www.foolabs.com/xpdf/download.html)

