Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' for extracting text, fonts, attachments and metadata from a PDF file. Also supports high-quality rendering of PDF documents to PNG, JPEG or TIFF files, or into raw bitmap vectors for further processing in R.
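As a rough illustration of the extraction side, here is a minimal sketch, assuming pdftools is installed and linked against a working libpoppler. The input PDF is generated with base R graphics purely so the example is self-contained. pdf_info() and pdf_fonts() are not named on this page; they are the package's metadata and font listing functions.

```r
library(pdftools)

# Build a small two-page PDF with base R graphics (illustrative input only).
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
for (label in c("First page", "Second page")) {
  plot.new()                 # each call starts a new page on the pdf device
  text(0.5, 0.5, label)
}
invisible(dev.off())

page_text <- pdf_text(pdf_file)    # character vector, one element per page
doc_info  <- pdf_info(pdf_file)    # list of document metadata, including the page count
font_tbl  <- pdf_fonts(pdf_file)   # data frame describing the fonts used

# Render the first page into a raw bitmap array for further processing in R.
bitmap <- pdf_render_page(pdf_file, page = 1, dpi = 72)
```

The bitmap returned by pdf_render_page() can be written to disk with the suggested png, jpeg or webp packages, provided they are installed.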



  • Import new qpdf package with pdf transformation tools
  • Enable pdf_data() in poppler 0.71 now that Debian has backported the encoding patch
  • Document new PPA for Ubuntu 16.04 and 18.04 with poppler 0.74


  • Windows / MacOS: update poppler to 0.73.0
  • Remove code that used the 'unstable' xpdf api
  • Use unique_ptr objects to fix memory leaks


  • Windows, MacOS: update poppler to 0.72.1 (backported UTF-8 fix)
  • Enable pdf_data() on systems with 0.72.1 or newer
  • Several encoding related fixes in text/metadata extraction
  • Add a 'tibble' class to data frames for pretty printing


  • Run configure script in bash


  • Change autobrew script to avoid dependency on xQuartz


  • pdf_render_page() and pdf_convert() gain argument to control 'antialias'
  • Small tweaks in pdf_text() for dealing with malformed pdf files


  • On Windows and MacOS we now bundle poppler-data to support non-latin text
  • Windows: Upgrade libpoppler to 0.61.0 from rwinlib
  • Windows: patch libpoppler bug that would cause pdf_convert() to generate corrupt files
  • PDF rendering errors are relayed via message() instead of warning()


  • Hide symbols on supported platforms
  • Upgrade libpoppler on Windows


  • Improve support for reading password-protected and encrypted pdf files (+ unit tests)
  • Support direct conversion from pdf to png, jpeg, tiff (+ unit tests)
  • Switch to Rcpp automatic symbol registration
  • Tweak autobrew script for legacy Mavericks builds


  • Fix autobrew for OSX Mavericks


  • Extract autobrew script to separate repo


  • Add workaround for poppler landscape truncation bug (fixes #7)


  • Rebuild poppler on Windows to support PDF rendering


  • Update Homebrew URL in configure script.
  • Fix autobrew (rename libopenjpeg -> libopenjp2)
  • Update libpoppler 0.46 for Windows


  • Update libpoppler 0.42 for Windows
  • Use the COMPILED_BY variable on Windows to support R 3.3


  • Switch pdf_render_page to 1 based indexing
  • Fix for red/blue channel mixup in pdf_render_page
  • Update example to use local PDF file


  • Initial release
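Several entries above concern pdf_convert(), the 'antialias' argument and the 1-based page indexing. The following is a hedged sketch of how these fit together, assuming pdftools is installed and the bundled poppler was built with PNG support. The input PDF and the output path are illustrative only.

```r
library(pdftools)

# Create a one-page PDF to convert (illustrative input only).
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot.new()
text(0.5, 0.5, "Render me")
invisible(dev.off())

# Convert page 1 (pages are counted from 1) directly to a PNG file.
png_file <- tempfile(fileext = ".png")
written <- pdf_convert(
  pdf_file,
  format    = "png",
  pages     = 1,
  filenames = png_file,
  dpi       = 72,
  antialias = TRUE      # set to FALSE to turn antialiasing off
)
```

pdf_convert() returns the names of the files it wrote, so the result can be passed straight to further file handling.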


2.3 by Jeroen Ooms, 3 months ago

https://docs.ropensci.org/pdftools (website) https://github.com/ropensci/pdftools#readme (devel) https://poppler.freedesktop.org (upstream)

Report a bug at https://github.com/ropensci/pdftools/issues

Browse source code at https://github.com/cran/pdftools

Authors: Jeroen Ooms [aut, cre]

Documentation: PDF Manual

MIT + file LICENSE license

Imports Rcpp, qpdf

Suggests jpeg, png, webp, tesseract, testthat

Linking to Rcpp

System requirements: Poppler C++ API: libpoppler-cpp-dev (deb) or poppler-cpp-devel (rpm). The unit tests also require the 'poppler-data' package (rpm/deb)

Imported by TextForecast, crminer, findR, fulltext, pdfsearch, rcoreoa, readtext, speech, tesseract, textreadr.

Suggested by LexisNexisTools, goldi, gridGraphics, hunspell, magick, pagedown, slickR, spelling, staplr, texPreview, tm.
