This documentation refers to an earlier version of Islandora.



The Islandora OCR module integrates Tesseract into the Islandora Paged Content module. It creates OCR and HOCR derivatives, which can be added to a page as datastreams. For specifics on how to create OCR derivatives, check the instructions for the OCR-compatible module you wish to use.
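To make the two derivative types concrete, here is a minimal sketch of how a plain-text OCR run and an HOCR run differ on the Tesseract command line. It is not the module's own code: the function name `build_tesseract_command`, the file names, and the default language are hypothetical, and it assumes the standard `tesseract imagename outputbase [options] [configfile]` invocation, where the `hocr` config file switches the output to HOCR.

```python
def build_tesseract_command(image, output_base, language="eng", hocr=False,
                            binary="tesseract"):
    """Build the argument list for one Tesseract run (hypothetical helper).

    image       -- path to the page image
    output_base -- output path without extension; Tesseract adds its own
    language    -- language code passed to Tesseract's -l option
    hocr        -- if True, append the 'hocr' config file to request HOCR
    binary      -- path to the Tesseract binary
    """
    command = [binary, image, output_base, "-l", language]
    if hocr:
        # 'hocr' is a config file name, so it goes after the options.
        command.append("hocr")
    return command
```

The returned list can be passed to `subprocess.run`. For example, `build_tesseract_command("page.tif", "page", hocr=True)` describes an HOCR run over a hypothetical `page.tif`.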



Release Notes and Downloads


Configuration options for the Islandora OCR module include the following:

  • Tesseract: Islandora OCR requires the path to your Tesseract binary, and Tesseract must be version 3.02.02 or higher.
  • Languages available for OCR: Islandora can look for any additional OCR languages you have installed; these are chosen from a drop-down menu at time of ingest or derivative creation.
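Both options can be checked by hand before configuring the module. The sketch below is an illustration only, not how Islandora performs these checks: the helper names (`parse_version`, `parse_languages`, `run_tesseract`) are hypothetical, and it assumes the binary supports the `--version` and `--list-langs` flags. Some Tesseract releases print this information to stderr rather than stdout, so the sketch reads both streams.

```python
import re
import subprocess

MINIMUM_VERSION = (3, 2, 2)  # 3.02.02, the minimum stated above


def parse_version(output):
    """Extract (major, minor, patch) from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no Tesseract version found in output")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def parse_languages(output):
    """Return the language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. 'List of available languages (2):'.
    return [line for line in lines if not line.lower().startswith("list of")]


def run_tesseract(binary, flag):
    """Run the binary with a single flag and return stdout plus stderr."""
    result = subprocess.run([binary, flag], capture_output=True, text=True)
    return result.stdout + result.stderr
```

With Tesseract installed, `parse_version(run_tesseract("/usr/bin/tesseract", "--version")) >= MINIMUM_VERSION` reports whether the binary is new enough, and `parse_languages(run_tesseract("/usr/bin/tesseract", "--list-langs"))` lists the installed languages. The binary path here is an assumption; substitute your own.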

See the Tesseract page for more information on these options.
