Access to Abbyy Optical Character Recognition (OCR) API

Get text from images using the Abbyy Cloud Optical Character Recognition (OCR) API. Easily OCR images, barcodes, forms, and documents with machine-readable zones, e.g. passports. Get the results in a variety of formats, including plain text and XML. To learn more about the Abbyy OCR API, see Abbyy's documentation.

Easily OCR images, barcodes, forms, documents with machine readable zones, e.g. passports, right from R. Get the results in a wide variety of formats, from text files to detailed XMLs with information about bounding boxes, etc.

The package provides access to the Abbyy Cloud OCR SDK API. Details about the results of calls to the API are available in the Abbyy Cloud OCR SDK documentation.
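
As a rough sketch of what a session might look like: the wrapper below assumes the package exports `setapp()` for registering credentials, alongside the `processImage()` and `getResults()` functions named in the changelog. The helper name `ocr_one_image()`, its arguments, and the credentials and file path in the usage line are hypothetical placeholders, and the argument shapes shown are assumptions to check against the reference manual.

```r
# Hypothetical helper: submit one image, then fetch results of finished tasks.
# Assumes abbyyR is installed and that you hold an Abbyy Cloud OCR
# application ID and password.
ocr_one_image <- function(file_path, app_id, app_password) {
  stopifnot(file.exists(file_path))

  # Register credentials for this session (assumed form: a character
  # vector holding the application ID and the password)
  abbyyR::setapp(c(app_id, app_password))

  # Submit the image; the OCR task itself runs on Abbyy's servers
  abbyyR::processImage(file_path = file_path)

  # Download whatever tasks have finished. A task submitted a moment ago
  # may not be done yet, so this call may need to be repeated later.
  abbyyR::getResults()
}

# Usage (placeholders, not real credentials):
# ocr_one_image("scan.png", "my_app_id", "my_app_password")
```

Namespacing every call with `abbyyR::` keeps the sketch self-contained and makes it obvious which functions come from the package.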


To get the latest version on CRAN:

install.packages("abbyyR")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)

Using abbyyR

To get acquainted with some of the important functions, read the vignettes:

# Overview of the package
vignette("introduction", package = "abbyyR")
# Examples of some functions, along with their output
vignette("example", package = "abbyyR")
# how to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")

The quality of the final output varies with the complexity of the layout, the image resolution, the font face, and so on. To measure OCR quality, you can compute the edit distance between the OCR output and a 'gold standard' human-coded sample using the recognize package. For quick edit-distance-based search and replace to fix messy data, you can use turbo search and replace.
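
To make the edit-distance idea concrete, here is a minimal sketch that uses base R's `utils::adist()` (Levenshtein distance) rather than the recognize package, whose interface is not shown here. The function name `ocr_error_rate()` and the sample strings are made up for illustration.

```r
# Normalized edit distance between OCR output and a reference transcription.
# Uses utils::adist() from base R, not the recognize package.
ocr_error_rate <- function(ocr_text, gold_text) {
  # Number of single-character insertions, deletions, and substitutions
  # needed to turn the OCR output into the reference text
  distance <- as.integer(utils::adist(ocr_text, gold_text))
  # Scale by the reference length so that 0 means a perfect match
  distance / nchar(gold_text)
}

gold <- "The quick brown fox"   # human transcription (19 characters)
ocr  <- "The qu1ck brovvn fox"  # typical OCR confusions: i -> 1, w -> vv

ocr_error_rate(ocr, gold)  # 3 edits over 19 characters, about 0.158
```

Dividing by the length of the reference makes scores comparable across documents of different sizes; other normalizations, such as dividing by the longer of the two strings, are equally defensible.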


Scripts are released under the MIT License.

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.


abbyyR 0.5.4

  • fixed bug in processTextField: region was not being passed to the query list. See #8.

abbyyR 0.5.3

  • add region argument to processImage()

abbyyR 0.5.2

  • extensive linting. passes expect_lint_free

abbyyR 0.5.1

  • moved to ldply for coercing list to data.frame
  • improved documentation
  • moved to match.arg
  • making all returns visible

abbyyR 0.5.0

  • Pass more arguments (dots)
  • stringsAsFactors issues for getResults fixed

abbyyR 0.4.0

  • compareText moved to R package recognize
  • getAppInfo returns a data.frame
  • Minor improvements to documentation
  • listTasks checks date format, provides more examples
  • Unique rownames for listFinishedTasks df
  • Added progress bar for getResults()

abbyyR 0.3.0

  • getResults returns a data frame carrying local file paths after writing to disk
  • Simpler coercion to data frame for all lists of length 1, more standardized 'cats' for process functions
  • httr upgrade issues fixed
  • getResults accounts for the case when there are no finished tasks

abbyyR 0.2.3

  • check if file exists
  • fixed bug in getResults()
  • fixed checking env. tokens
  • More unit tests, better coverage
  • compareText has been deprecated. Part of another package (recognize) on GitHub.
  • getResults allows saving to memory

abbyyR 0.2.2

  • added basic test
  • processPhotoID is not completely supported by Abbyy. Adjusted for that and changed the documentation.
  • getTaskStatus had a bug; it is fixed now
  • Storing keys in environment variables rather than options
  • Took out the https link causing pandoc to break

abbyyR 0.2.1

  • Better error handling
  • Better internal organization of functions
  • Added Readme
  • Vignettes via knitr
  • runs pdf via qpdf

abbyyR 0.2.0

  • Improved how authentication information is transmitted
  • New Vignette
  • Download files via curl::curl_download rather than download.file
  • A function that measures OCR quality by comparing a human transcription to the OCR output using string distance
  • Convenience functions to OCR a file


0.5.5 by Gaurav Sood, a year ago

Authors: Gaurav Sood [aut, cre]

Task views: Web Technologies and Services

License: MIT + file LICENSE

Imports: httr, XML, curl, readr, plyr, progress

Suggests: testthat, rmarkdown, knitr, lintr
