Get text from images of text using the Captricity Optical Character Recognition (OCR) API. Captricity allows you to get text from handwritten forms (think surveys) and other structured paper documents. It can output the data as a delimited file, keeping field information intact. For more information, read https://shreddr.captricity.com/developer/overview/.
OCR text and handwritten forms using Captricity. Captricity's big advantage over Abbyy Cloud OCR is that it lets users easily specify the position of the text blocks they want to OCR through a simple web-based UI. The quality of the OCR can be checked using compare_txt from recognize.
To get the latest version on CRAN:
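A minimal sketch, assuming the package is published on CRAN under the name captr (the same name as the GitHub repository):

```r
# Install the released version from CRAN
# (assumes the package is listed there as "captr").
install.packages("captr")
```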
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)
Read the vignette:
vignette("using_captr", package = "captr")
or follow the overview below.
Start by getting an application token and setting it using:
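As a sketch, assuming the package's `set_token()` function takes the token as its first argument; the token string here is a placeholder, not a real credential:

```r
library(captr)

# "your_application_token" is a placeholder; use the token
# issued to your Captricity account.
set_token("your_application_token")
```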
Then, create a batch using:
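For instance, assuming `create_batch()` accepts the batch name as its first argument (the name "survey_forms" is made up for illustration):

```r
library(captr)

# Create a batch and keep the returned object, which should
# carry the id of the new batch.
batch <- create_batch("survey_forms")
str(batch)
```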
Once you have created a batch, you need a template ID. Captricity requires a template, which tells it what data to pull from where. Templates can be created using the Web UI.
Next, assign the template ID to a batch:
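A hedged sketch: the function name `set_batch_template()` and its argument order (batch id, then template id) are assumptions about the package's interface, so check the package documentation before relying on them. Both ids are placeholders.

```r
library(captr)

# Assumed signature: set_batch_template(batch_id, template_id).
# Replace both placeholder strings with your own ids.
set_batch_template("batch_id", "template_id")
```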
Next, upload image(s) to a batch:
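A sketch of the upload step, assuming `upload_image()` takes the batch id followed by the path to a local image file; both values below are placeholders:

```r
library(captr)

# Upload one scanned form to the batch. Call this once per
# image if the batch holds several files.
upload_image("batch_id", "path/to/form_scan.png")
```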
Next, check whether the batch is ready to be processed:
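Something along these lines should work, assuming `test_readiness()` takes the batch id as its first argument:

```r
library(captr)

# Ask Captricity whether the batch has everything it needs
# (template, images) before submission.
test_readiness("batch_id")
```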
You may also want to find out how much processing the batch will cost:
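A sketch, assuming `batch_price()` takes the batch id as its first argument:

```r
library(captr)

# Look up the quoted price for processing the batch
# before committing to it.
batch_price("batch_id")
```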
Once you are ready, submit the batch:
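A sketch, assuming `submit_batch()` takes the batch id as its first argument. Storing the result makes it easy to inspect the returned list afterwards:

```r
library(captr)

# Submit the batch for processing and keep the returned list.
job <- submit_batch("batch_id")

# Inspect the structure of the response.
str(job)
```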
Captricity excels in nomenclature confusion: once a batch is submitted, it is called a job. The id for the job can be obtained from the list that is returned from submit_batch. The field name is
To track progress of a job, use:
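A sketch, assuming `track_progress()` takes the job id as its first argument; "job_id" is a placeholder:

```r
library(captr)

# Poll the status of a submitted job.
track_progress("job_id")
```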
List all forms (instance sets) associated with a job:
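A sketch, assuming `list_instance_sets()` takes the job id as its first argument:

```r
library(captr)

# Fetch the instance sets (forms) that belong to the job.
forms <- list_instance_sets("job_id")
str(forms)
```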
If you want to download data from a particular form, use list_instance_sets to get the form (instance_set) id and run:
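A sketch, assuming `get_instance_set()` takes the instance set id as its first argument; the id below is a placeholder:

```r
library(captr)

# Download the data extracted from a single form.
form_data <- get_instance_set("instance_set_id")
str(form_data)
```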
Get csv of all your results from a job:
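A sketch, assuming `get_all()` takes the job id as its first argument; where the CSV is written, or whether it is returned as an object, depends on the package version:

```r
library(captr)

# Retrieve the results for every form in the job as CSV.
get_all("job_id")
```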
Scripts are released under the MIT License.