...

  • Go to Manage > Newspaper Batch
  • Zip file - Upload the ZIP file for batch ingest.
  • Create PDFs? - Checking this box creates a PDF derivative that contains all the pages associated with a newspaper issue.
  • Namespace for created objects - Set the namespace for the issue and page objects created for this batch ingest.
  • Generate OCR? - Checking this box causes OCR to be generated for each Page object. OCR will be attached as a datastream to each page. If checked, another option appears below it, "Aggregate OCR?".
  • Generate HOCR? - Checking this box causes HOCR to be generated for each Page object. HOCR is what enables text highlighting after a full-text search. HOCR will be attached as a datastream to each page.
  • Aggregate OCR? - Check this box to create an OCR datastream in the issue object that aggregates the OCR datastreams from all of the page-level objects in that issue.
  • Notify admin after ingest? - Check this box to send an email to the site admin (user 1) that a newspaper batch ingest has completed. This requires the Drupal Rules module and a rule for newspaper batch notifications.
  • Ingest immediately? - Checking this box will cause the batch to go through both steps of the ingest (pre-processing and actual ingest) immediately.
    • If you do not check "Ingest immediately?", the files are only pre-processed and added to the Islandora batch queue for an administrator to approve.
    • To approve the batch, go to Administration > Reports > Islandora Batch Sets and select "View Items in Set" next to an unprocessed set. To process the set, click "Process Set" and process all items.
      Pre-processed newspaper batch items in the batch queue

...

Here are the options in the drush command:
```
drush help islandora_newspaper_batch_preprocess

Preprocessed newspaper issues into database entries.

Options:
 --aggregate_ocr                           A flag to cause OCR to be aggregated to issues, if OCR is also being generated per-page.
 --content_models                          A comma-separated list of content models to assign to the objects. Only applies to the "newspaper issue"
                                           level object.
 --create_pdfs                             A flag to cause PDFs to be created in newspaper issues. Page PDF creation is dependant on the configuration
                                           within Drupal proper.
 --directory_dedup                         A flag to indicate that we should avoid repreprocessing newspaper issues which are located in directories.
 --do_not_generate_ocr                     A flag to allow for conditional OCR generation.
 --email_admin                             A flag to notify the site admin when the newspaper issue is fully ingested (depends on Rules being enabled).
 --namespace                               The namespace for objects created by this command.  Defaults to namespace set in Fedora config.
 --parent                                  The collection to which the generated items should be added.  Only applies to the "newspaper issue" level
                                           object. If "directory" and the directory containing the newspaper issue description is a valid PID, it will
                                           be set as the parent. If this is specified and itself is a PID, all newspapers issue will be related to the
                                           given PID. Required.
 --parent_relationship_pred                The predicate of the relationship to the parent. Defaults to "isMemberOf".
 --parent_relationship_uri                 The namespace URI of the relationship to the parent. Defaults to
                                           "info:fedora/fedora-system:def/relations-external#".
 --target                                  The target to directory or zip file to scan. Required.
 --type                                    Either "directory" or "zip". Required.
 --wait_for_metadata                       A flag to indicate that we should hold off on trying to ingest newspaper issues until we have metadata
                                           available for them at the newspaper issue level.

Aliases: inbp
```


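As an illustration of how the required options fit together, here is a minimal dry-run sketch that only prints a preprocess command and does not run drush. The ZIP path, parent PID, and namespace are hypothetical placeholders. `-v` and `--user=1` are assumed Drush global options, and the name of the scan option is assumed to be `target`, following the option listing above; confirm all of these with `drush help` on your own installation.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints a preprocess command; it does not run drush.

# Name of the option that points at the ZIP file or directory to scan.
# "target" is an assumption taken from the option listing; it is kept
# configurable in case your installation names it differently.
TARGET_OPT="${TARGET_OPT:-target}"

build_inbp_command() {
  scan_path="$1"    # hypothetical path to the ZIP file of newspaper issues
  parent_pid="$2"   # hypothetical PID of the parent newspaper object
  namespace="$3"    # hypothetical namespace for the new issue and page objects

  # --type and --parent are marked "Required" in the option listing.
  printf 'drush -v --user=1 islandora_newspaper_batch_preprocess'
  printf ' --type=zip --%s=%s' "$TARGET_OPT" "$scan_path"
  printf ' --parent=%s --namespace=%s' "$parent_pid" "$namespace"
  # Aggregate page-level OCR into the issue object.
  printf ' --aggregate_ocr\n'
}

build_inbp_command /tmp/issues.zip newspapers:1 newspapers
```

Printing the command first makes it easy to review the option values before pasting the line into a shell on the Drupal server.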
Third, process all items in the batch queue:

...