Programmatic Conversion of PDF Tables

Allows the user to convert PDF tables to formats more amenable to analysis ('.csv', '.xml', or '.xlsx') by wrapping the PDFTables API. In order to use the package, the user needs to sign up for an API account on the PDFTables website (<https://pdftables.com/pdf-to-excel-api>). The package works by taking a PDF file as input, uploading it to PDFTables, and returning a file with the extracted data.


pdftables

The pdftables package allows the user to convert PDF tables to formats more amenable to analysis (CSV, XML, or XLSX) by wrapping the PDFTables API.

The package can be installed from either CRAN or GitHub (development version):

# From CRAN
install.packages("pdftables")
 
# From GitHub (requires the devtools package)
library(devtools)
install_github("expersso/pdftables")
 
library(pdftables)

To use the package, the user first needs to sign up for the PDFTables API to get an API key (they offer a free plan that allows up to 50 pages).

In the following example, we first write the first 20 rows of the iris dataset to a .csv file. We then open that file and print it as a .pdf file. Using the convert_pdf function, we then upload that PDF to the PDFTables API, which parses it and returns the converted file as test2.csv.

(Note: All functions in the package require that you provide your API key in the api_key argument. By default this is read from an environment variable called pdftable_api, but you can also provide the key directly.)
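
One way to make the key available is to set the pdftable_api environment variable for the current R session before calling any package function. The sketch below uses only base R; the key value is a placeholder, not a real key.

```r
# Assumption: "my-secret-key" is a placeholder. Substitute your own key.
Sys.setenv(pdftable_api = "my-secret-key")

# Check that the variable is visible to the current session.
key_is_set <- nzchar(Sys.getenv("pdftable_api"))
key_is_set
```

To keep the key across sessions, a line of the form pdftable_api=my-secret-key can instead be added to the user's .Renviron file, which R reads at startup.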

write.csv(head(iris, 20), file = "test.csv", row.names = FALSE)
 
# Open test.csv and print as PDF to "test.pdf"
 
convert_pdf("test.pdf", "test2.csv")
# Converted test.pdf to test2.csv

If the output_file argument is omitted, the name of the output file will be the same as the input file, but with the right file extension.
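
The naming rule described above can be illustrated with a small helper. This is only a sketch of the idea, not the package's own code: default_output_name is a hypothetical function, and it relies on file_path_sans_ext from the tools package.

```r
# Hypothetical helper, not part of pdftables: build an output file name by
# replacing the extension of the input file with the target format.
default_output_name <- function(input_file, format = "csv") {
  paste0(tools::file_path_sans_ext(input_file), ".", format)
}

default_output_name("test.pdf")
default_output_name("reports/q1.pdf", "xlsx")
```

Under this rule, "test.pdf" maps to "test.csv", and the directory part of the path is preserved.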

The package (and API) supports converting PDFs to .csv, .xml, and .xlsx.

Note that the conversion sometimes fails to recover the underlying data exactly, so you may have to open the retrieved file and make some manual corrections.
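
When the original data is available, as in the iris example, the cells that need correction can be located programmatically rather than by eye. The following is a minimal sketch under that assumption; find_mismatches is a hypothetical helper, not a function of the package, and the "converted" data frame is simulated here instead of being read from test2.csv.

```r
# Hypothetical helper, not part of pdftables: list the cells in which two
# data frames of identical shape disagree, comparing values as text.
find_mismatches <- function(original, converted) {
  stopifnot(identical(dim(original), dim(converted)))
  to_text <- function(df) {
    m <- do.call(cbind, lapply(df, as.character))
    m[is.na(m)] <- "NA"  # make missing values comparable as text
    m
  }
  a <- to_text(original)
  b <- to_text(converted)
  diff_at <- which(a != b, arr.ind = TRUE)
  data.frame(
    row = diff_at[, "row"],
    column = names(original)[diff_at[, "col"]],
    original = a[diff_at],
    converted = b[diff_at],
    stringsAsFactors = FALSE
  )
}

original <- head(iris, 20)

# Simulated conversion result; in practice this would come from
# read.csv("test2.csv").
converted <- original
converted$Species <- as.character(converted$Species)
converted$Sepal.Length[3] <- 47  # simulate a misread cell (originally 4.7)

find_mismatches(original, converted)
```

The result has one row per differing cell, giving its row number, column name, and both values.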

The get_remaining function shows you how many pages you have remaining.

get_remaining()


0.1 by Eric Persson, 2 years ago


https://www.github.com/expersso/pdftables, https://pdftables.com


Report a bug at https://www.github.com/expersso/pdftables/issues


Browse source code at https://github.com/cran/pdftables


Authors: Eric Persson [aut, cre]




Task views: Web Technologies and Services


CC0 license


Imports httr, tools

Suggests knitr, rmarkdown

