...
- Go to Manage > Newspaper Batch
- Zip file - Upload the ZIP file for batch ingest.
- Create PDFs? - Checking this box creates a PDF derivative that contains all the pages associated with a newspaper issue.
- Namespace for created objects - Set the namespace for the issue and page objects created for this batch ingest.
- Generate OCR? - Checking this box causes OCR to be generated for each Page object. OCR will be attached as a datastream to each page. If checked, another option appears below it, "Aggregate OCR?".
- Generate HOCR? - Checking this box causes HOCR to be generated for each Page object. HOCR is used to highlight matching text after a full-text search. HOCR will be attached as a datastream to each page.
- Aggregate OCR? - Check this box to create an OCR datastream in the issue object that aggregates the OCR datastreams from all of the page-level objects in that issue.
- Notify admin after ingest? - Check this box to send an email to the site admin (user 1) that a newspaper batch ingest has completed. This requires the Drupal Rules module and a rule for newspaper batch notifications.
- Ingest immediately? - Checking this box runs both steps of the ingest (pre-processing and the actual ingest) immediately.
- If you do not check "Ingest immediately?", the files are only pre-processed and added to the Islandora batch queue for an administrator to approve.
- To approve the batch, go to Administration > Reports > Islandora Batch Sets and select "View Items in Set" next to an unprocessed set. To process the set, click "Process Set" and process all items.
...
This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper title object, not an issue object or a collection object.
The drush command accepts the following options, as listed in its help output:
drush help islandora_newspaper_batch_preprocess
Preprocessed newspaper issues into database entries.
Options:
--aggregate_ocr  A flag to cause OCR to be aggregated to issues, if OCR is also being generated per-page.
--content_models  A comma-separated list of content models to assign to the objects. Only applies to the "newspaper issue" level object.
--create_pdfs  A flag to cause PDFs to be created in newspaper issues. Page PDF creation is dependant on the configuration within Drupal proper.
--directory_dedup  A flag to indicate that we should avoid repreprocessing newspaper issues which are located in directories.
--do_not_generate_ocr  A flag to allow for conditional OCR generation.
--email_admin  A flag to notify the site admin when the newspaper issue is fully ingested (depends on Rules being enabled).
--namespace  The namespace for objects created by this command. Defaults to namespace set in Fedora config.
--parent  The collection to which the generated items should be added. Only applies to the "newspaper issue" level object. If "directory" and the directory containing the newspaper issue description is a valid PID, it will be set as the parent. If this is specified and itself is a PID, all newspapers issue will be related to the given PID. Required.
--parent_relationship_pred  The predicate of the relationship to the parent. Defaults to "isMemberOf".
--parent_relationship_uri  The namespace URI of the relationship to the parent. Defaults to "info:fedora/fedora-system:def/relations-external#".
--target  The target to directory or zip file to scan. Required.
--type  Either "directory" or "zip". Required.
--wait_for_metadata  A flag to indicate that we should hold off on trying to ingest newspaper issues until we have metadata available for them at the newspaper issue level.
Aliases: inbp
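As a rough illustration of how the required and optional options above fit together, the following sketch assembles a preprocess command line and prints it without running it. The helper name build_preprocess_cmd and every value passed to it (the ZIP path, the parent PID "newspapers:1", the namespace "newspapers") are hypothetical placeholders, not values from this documentation. Any site-specific Drush global options are left out.

```shell
#!/bin/sh
# Sketch only: prints a preprocess command built from options listed in the
# help text above. It does NOT execute drush. The function name and all
# example values are hypothetical.
build_preprocess_cmd() {
  batch_type="$1"   # value for --type
  target="$2"       # value for --target (directory or ZIP file to scan)
  parent="$3"       # value for --parent (PID of a newspaper title object)
  namespace="$4"    # value for --namespace (may be empty)

  # --type accepts only "directory" or "zip".
  case "$batch_type" in
    directory|zip) ;;
    *) echo "error: --type must be \"directory\" or \"zip\"" >&2; return 1 ;;
  esac

  # --target and --parent are marked Required in the help text.
  if [ -z "$target" ] || [ -z "$parent" ]; then
    echo "error: --target and --parent are required" >&2
    return 1
  fi

  cmd="drush islandora_newspaper_batch_preprocess --type=$batch_type --target=$target --parent=$parent"

  # --namespace is optional; when omitted, the Fedora config default applies.
  if [ -n "$namespace" ]; then
    cmd="$cmd --namespace=$namespace"
  fi

  # Values containing spaces would need quoting before this line is pasted
  # into a shell.
  printf '%s\n' "$cmd"
}

# Example call with placeholder values.
build_preprocess_cmd zip /tmp/issues.zip newspapers:1 newspapers
```

Flags such as --create_pdfs or --aggregate_ocr could be appended to the printed line in the same way. Printing the command first makes it easy to confirm that --parent points at a newspaper title object before anything is added to the queue.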
Third, process all items in the batch queue:
...