This documentation refers to an earlier version of Islandora. The current documentation is at https://wiki.duraspace.org/display/ISLANDORA/Start.

Overview

The Islandora OCR module integrates Tesseract into the Islandora Paged Content module. It allows OCR and HOCR derivatives to be created and attached to a page as datastreams. For specifics on how to create OCR derivatives, check the instructions for the OCR-compatible module you wish to use.

Dependencies

Downloads

Release Notes and Downloads

Configuration

Configuration options for the Islandora OCR module can be found at http://path.to.your.site/admin/islandora/ocr, and include the following options:

  • Tesseract: The path to your Tesseract binary. Islandora OCR requires this path, and Tesseract version 3.02.02 or higher, to function correctly.
  • Languages available for OCR: Islandora can look for any additional OCR languages you have installed; the language is then chosen from a drop-down menu at the time of ingest or derivative creation.

See the Tesseract page for more information on these options.
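
Before entering the path on the configuration page, it can help to confirm from the command line that the binary meets the version requirement and to see which languages it reports. The following is a minimal sketch in Python, not part of the Islandora OCR module. It assumes the binary accepts the `--version` and `--list-langs` options and prints a version line such as `tesseract 3.02.02`; the function names and the default binary name `tesseract` are illustrative only.

```python
import re
import shutil
import subprocess

# Minimum version stated in the configuration notes above (3.02.02).
MIN_VERSION = (3, 2, 2)


def parse_tesseract_version(output):
    """Return a (major, minor, patch) tuple from version text, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    # A missing patch component (e.g. "3.03") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version, minimum=MIN_VERSION):
    """True when a parsed version is at least the required minimum."""
    return version is not None and version >= minimum


def parse_languages(output):
    """Return language codes from `--list-langs` text, skipping the header."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def inspect_tesseract(binary="tesseract"):
    """Locate the binary and report (path, version, languages), or None."""
    path = shutil.which(binary)
    if path is None:
        return None
    # Some Tesseract releases write this information to stderr rather than
    # stdout, so both streams are read.
    ver = subprocess.run([path, "--version"], capture_output=True, text=True)
    version = parse_tesseract_version(ver.stdout + ver.stderr)
    langs = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    languages = parse_languages(langs.stdout + langs.stderr)
    return path, version, languages


if __name__ == "__main__":
    report = inspect_tesseract()
    if report is None:
        print("Tesseract binary not found on PATH")
    else:
        path, version, languages = report
        print("Path:", path)
        print("Version OK:", meets_minimum(version), version)
        print("Languages:", ", ".join(languages))
```

Running the script on the server prints the resolved path, which is the value the Tesseract option expects, along with the languages the binary reports as installed.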
