This documentation refers to an earlier version of Islandora. The current documentation is at https://wiki.duraspace.org/display/ISLANDORA/Start.

Documentation page for the Book Batch module.

Usage

In a collection that accepts BookCModel objects

Navigate to "Manage" and click "Collection," then "Book Batch."

Via Drush

The base ZIP/directory preprocessor can be called as a Drush command. The parameter that names the ZIP file or directory to scan depends on the Drush version.

As of Drush 7, target is a reserved parameter name, so the module's parameter has been renamed scan_target. The original target parameter is preserved for backwards compatibility with Drush 6 and below.

Drush 7 and above:

drush -v --user=admin --uri=http://localhost islandora_book_batch_preprocess --type=zip --scan_target=/path/to/archive.zip

Drush 6 and below: 

drush -v --user=admin --uri=http://localhost islandora_book_batch_preprocess --type=zip --target=/path/to/archive.zip

This will populate the queue (stored in the Drupal database) with base entries.
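The choice between the two parameter names can be sketched as a small shell helper. This is only an illustration: the function names book_batch_target_option and book_batch_preprocess_cmd are hypothetical and not part of the module, the Drush version is passed in as a plain string such as 7.4.0 rather than detected, and the command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): print the parameter name
# to use for a given Drush version string, e.g. "7.4.0" or "6.7.0".
book_batch_target_option() {
  major=${1%%.*}                      # keep everything before the first dot
  case $major in
    ''|*[!0-9]*)
      echo "unrecognized Drush version: $1" >&2
      return 1
      ;;
  esac
  if [ "$major" -ge 7 ]; then
    echo "scan_target"                # Drush 7 and above
  else
    echo "target"                     # Drush 6 and below
  fi
}

# Hypothetical helper: print (do not execute) the preprocess command for
# a Drush version ($1) and a ZIP path ($2), using the values shown above.
book_batch_preprocess_cmd() {
  opt=$(book_batch_target_option "$1") || return 1
  echo "drush -v --user=admin --uri=http://localhost islandora_book_batch_preprocess --type=zip --$opt=$2"
}
```

For example, book_batch_preprocess_cmd 6.7.0 /path/to/archive.zip prints the "Drush 6 and below" form of the command shown above.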

See drush help islandora_book_batch_preprocess for a full list of parameters.

Setting up directory structure for processing

No matter which method you choose, books must be broken up into separate directories, so that each directory at the top level of the target directory or ZIP file represents one book. Within each book directory, each page is its own directory.

Files are assigned to object datastreams based on their basename, so a folder structure like:

  • my_cool_book/
    • MODS.xml
    • 1/
      • OBJ.tif
      • MODS.xml
      • OCR.txt
    • 2/
      • OBJ.tif
      • MODS.xml
      • OCR.txt

would result in a two-page book with individual MODS records and text files.

Each page directory name will be used as the sequence number of the page created.
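As a rough sketch of how such a layout might be prepared, the shell function below copies a flat directory of page scans into numbered page directories. The function name make_book_dirs and the assumption that the scans are .tif files whose names sort in page order (for example page_001.tif, page_002.tif) are illustrative only and are not part of the module.

```shell
#!/bin/sh
# Hypothetical helper: copy each *.tif in the source directory ($1) into
# <book directory ($2)>/<n>/OBJ.tif, where n counts up from 1 in the
# order the shell expands the glob (sorted by file name).
# Prints the number of page directories created.
make_book_dirs() {
  src=$1
  book=$2
  n=0
  for f in "$src"/*.tif; do
    [ -e "$f" ] || continue           # no matches: the pattern is left unexpanded
    n=$((n + 1))
    mkdir -p "$book/$n"
    cp "$f" "$book/$n/OBJ.tif"
  done
  echo "$n"
}
```

Calling make_book_dirs scans my_cool_book on a directory holding two scans would produce my_cool_book/1/OBJ.tif and my_cool_book/2/OBJ.tif. Zero-padded file names matter here, because the page sequence follows the sort order of the names.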

A file named --METADATA--.xml can contain MODS, DC, or MARCXML, which is used to fill in the MODS or DC datastreams if they are not provided explicitly. Similarly, --METADATA--.mrc (containing binary MARC) will be transformed to MODS and then possibly to DC, if neither is provided explicitly.

If no MODS is provided at the book level (either directly as MODS.xml, or transformed from DC.xml or the --METADATA-- file discussed above), the directory name will be used as the title.
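Before preprocessing, it can be useful to see which books would fall back to the directory name as their title. The sketch below lists book directories that contain none of the book-level metadata files named above. The function name list_books_without_metadata is hypothetical, and the check looks only at file names, not at whether the files are valid.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every book directory directly
# under the target directory ($1) that has none of MODS.xml, DC.xml,
# --METADATA--.xml, or --METADATA--.mrc at the book level.
list_books_without_metadata() {
  for book in "$1"/*/; do
    [ -d "$book" ] || continue        # no subdirectories: pattern left unexpanded
    found=no
    for name in MODS.xml DC.xml --METADATA--.xml --METADATA--.mrc; do
      if [ -f "$book$name" ]; then    # $book already ends with a slash
        found=yes
        break
      fi
    done
    [ "$found" = yes ] || basename "$book"
  done
}
```

Running list_books_without_metadata /path/to/target prints one directory name per line. Each name printed is a book whose title would be taken from its directory name.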

Web options

Options available for processing include:

  • Page progression (left-to-right or right-to-left)
  • Create PDF
  • Namespace (default is the repository's default namespace)
  • Generate OCR (run Tesseract on individual pages)
  • Aggregate OCR (combine the individually generated OCR pages into one text file, whose datastream will be listed with the book object itself. This is helpful for searching and allows repository managers to disable individual page listings in Solr output.)
  • Notify admin after ingest (if Rules have been set)
  • Ingest immediately (ingest now, or leave the items in the queue for later processing)

The queue of preprocessed items can then be processed:

drush -v --user=admin --uri=http://localhost islandora_batch_ingest
