Access to Abbyy Optical Character Recognition (OCR) API

Get text from images of text using the Abbyy Cloud Optical Character Recognition (OCR) API. Easily OCR images, barcodes, forms, documents with machine readable zones, e.g. passports. Get the results in a variety of formats including plain text and XML. To learn more about the Abbyy OCR API, see <http://ocrsdk.com/>.


Easily OCR images, barcodes, forms, documents with machine readable zones, e.g. passports, right from R. Get the results in a wide variety of formats, from text files to detailed XMLs with information about bounding boxes, etc.

The package provides access to the Abbyy Cloud OCR SDK API. Details about results of calls to the API can be found here.

To get the latest version on CRAN:

install.packages("abbyyR")

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)

To get acquainted with some of the important functions, read the vignettes:

# Overview of the package
vignette("introduction", package = "abbyyR")
# Examples of some functions along with their output
vignette("example", package = "abbyyR")
# how to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")

The quality of the final output varies with the complexity of the layout, the image resolution, the font face, and so on. To measure OCR quality, you can compute the edit distance between the OCR output and a 'gold standard' hand-coded sample using recognize. For quick edit-distance-based search and replace to fix messy data, you can use turbo search and replace.
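As a rough illustration of the edit-distance idea, here is a minimal sketch that uses base R's utils::adist instead of the recognize package, whose interface is not shown here. The two strings and the helper name normalized_edit_distance are made up for the example.

```r
# Hypothetical data: a hand-typed "gold standard" line and the OCR output for it.
gold <- "The quick brown fox"
ocr  <- "The qu1ck brovvn fox"

# Hypothetical helper (not part of abbyyR or recognize): Levenshtein distance
# divided by the length of the longer string, so 0 means identical text and
# values near 1 mean almost nothing matches. The trailing 1 in max() avoids
# dividing by zero when both strings are empty.
normalized_edit_distance <- function(truth, observed) {
  d <- utils::adist(truth, observed)[1, 1]
  d / max(nchar(truth), nchar(observed), 1)
}

# Raw number of single-character insertions, deletions, and substitutions.
raw_distance <- utils::adist(gold, ocr)[1, 1]

# Length-normalized score, easier to compare across lines of different length.
score <- normalized_edit_distance(gold, ocr)

print(raw_distance)
print(score)
```

Lower scores mean the OCR output is closer to the hand-coded text. Scaling by length is one possible design choice that keeps long and short lines comparable.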

Scripts are released under the MIT License.

News

abbyyR .4

  • compareText moved to R package recognize
  • getAppInfo returns a data.frame
  • Minor improvements to documentation
  • listTasks checks date format, provides more examples
  • Unique rownames for listFinishedTasks df
  • Added progress bar for getResults()

abbyyR .3

  • getResults returns a data frame carrying local file paths after writing to disk
  • Simpler coercion to data frame for all lists of length 1, more standardized 'cats' for process functions
  • httr upgrade issues fixed
  • getResults accounts for the case when there are no finished tasks

abbyyR .2.3

  • check if file exists
  • fixed bug in getResults()
  • fixed checking env. tokens
  • More unit tests, better coverage
  • compareText has been deprecated. Part of another package (recognize) on GitHub.
  • getResults allows saving to memory

abbyyR .2.2

  • added basic test
  • processPhotoID is not completely supported by abbyy. Adjusted for that. Changed documentation.
  • getTaskStatus had a bug -- it is fixed now
  • Storing keys in the environment rather than in options
  • Took out the https link causing pandoc to break

abbyyR .2.1

  • Better error handling
  • Better internal organization of functions
  • Added Readme
  • Vignettes via knitr
  • runs pdf via qpdf

abbyyR .2

  • Improved How Authentication Information is Transmitted.
  • New Vignette
  • Download files via curl::curl_download rather than download.file
  • A function that measures the quality of OCR by comparing a human transcription to the OCR output using string distance.
  • Convenient functions to ocr a file.

0.5.1 by Gaurav Sood, 7 months ago


http://github.com/soodoku/abbyyR


Report a bug at http://github.com/soodoku/abbyyR/issues


Browse source code at https://github.com/cran/abbyyR


Authors: Gaurav Sood [aut, cre]



Task views: Web Technologies and Services


MIT + file LICENSE license


Imports httr, XML, curl, readr, plyr, progress

Suggests testthat, rmarkdown, knitr

