Get text from images of text using the Abbyy Cloud Optical Character Recognition (OCR) API. Easily OCR images, barcodes, forms, and documents with machine-readable zones (e.g., passports), right from R. Get the results in a wide variety of formats, from plain-text files to detailed XML with information about bounding boxes and more. To learn more about the Abbyy OCR API, see <http://ocrsdk.com/>.
To get the latest version on CRAN:
install.packages("abbyyR")
To get the current development version from GitHub:
# install.packages("devtools")
devtools::install_github("soodoku/abbyyR", build_vignettes = TRUE)
To get acquainted with some of the important functions, read the vignettes:
# Overview of the package
vignette("introduction", package = "abbyyR")
# Some functions used, along with their output
vignette("example", package = "abbyyR")
# How to scrape text from a folder of images
vignette("wiscads", package = "abbyyR")
The quality of the final output varies with the complexity of the layout, the image resolution, the font face, and so on. To measure OCR quality, you can compute the edit distance between the output and a hand-coded 'gold standard' sample using recognize. To quickly fix messy data with edit-distance-based search and replace, you can use turbo search and replace.
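As a rough illustration of both ideas, here is a minimal base-R sketch. It does not use recognize or turbo search and replace; it swaps in `utils::adist()`, base R's generalized Levenshtein (edit) distance. The helper names `ocr_cer()` and `fix_words()` and the default `max_dist = 1` threshold are hypothetical choices made for this example.

```r
# Hypothetical helper: character error rate (CER) of OCR output against a
# hand-coded gold standard, i.e. edit distance divided by gold-standard length.
ocr_cer <- function(ocr_text, gold_text) {
  stopifnot(length(ocr_text) == length(gold_text))
  # Levenshtein distance for each (ocr, gold) pair
  d <- mapply(function(a, b) utils::adist(a, b)[1, 1],
              ocr_text, gold_text, USE.NAMES = FALSE)
  d / nchar(gold_text)
}

# Hypothetical helper: edit-distance-based search and replace. Each word is
# replaced by its closest vocabulary entry if that entry is within max_dist
# edits; otherwise the word is left unchanged.
fix_words <- function(words, vocabulary, max_dist = 1) {
  d <- utils::adist(words, vocabulary)  # rows = words, columns = vocabulary
  vapply(seq_along(words), function(i) {
    j <- which.min(d[i, ])
    if (d[i, j] <= max_dist) vocabulary[j] else words[i]
  }, character(1))
}

# Usage
ocr_cer("passp0rt", "passport")  # 0.125 (one substitution in 8 characters)
fix_words(c("Wiscons1n", "advertisment", "the"),
          vocabulary = c("Wisconsin", "advertisement", "the"))
# "Wisconsin" "advertisement" "the"
```

A CER of 0 means the output matches the gold standard exactly. Keeping `max_dist` small makes the corrections conservative; raising it fixes more OCR errors but risks replacing correct words that merely resemble a vocabulary entry.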
Scripts are released under the MIT License.