...

The Islandora Newspaper Batch module uses the Islandora batch framework to provide a command-line (Drush) and a GUI (Drupal interface) option for adding a batch of newspaper issues and pages to an existing Islandora Newspaper object.

Batch-loading newspapers is a two-step process.

  1. Preprocessing: Drupal creates entries in the database for each object (issue and page) that will be added.
  2. Ingest: The data is ingested and derivatives are generated as part of the Islandora batch functions.

...

  • Newspaper Batch can only be used with an existing Newspaper object (islandora:newspaperCModel). Make sure the collection you're ingesting into has the Newspaper, Newspaper Issue, and Newspaper Page content models selected in the Manage > Collection tab. To create a Newspaper object:
    • Go to http://localhost:8000 and log in.
    • Go to Navigation > Islandora Repository.
    • Click the Newspaper Collection.
    • Click the Manage tab.
    • Click Add an object to this Collection.
    • Use the default content model, Islandora Newspaper Content Model.
    • You can upload a MARCXML file to fill in the last page, or click Next (a MARCXML file is not required at this step).
    • Title is the only required field at this stage.
    • Click Ingest; a confirmation of the ingest should appear.
  • Newspaper Batch uses the value in the MODS dateIssued field of each issue to populate the issue browsing display for the newspaper. The value must be formatted as YYYY-MM-DD. If only YYYY is entered, the interface will use the current month and day for the issue.
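
The YYYY-MM-DD requirement can be checked before a batch is assembled. The following is a minimal sketch, not part of the module: the function name is_valid_date_issued is hypothetical, and the check assumes that a value should also be a real calendar date.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_ISSUED_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date_issued(value: str) -> bool:
    """Return True only for a real calendar date written as YYYY-MM-DD.

    Hypothetical helper for pre-ingest checks; it is not part of
    Islandora Newspaper Batch.
    """
    if not DATE_ISSUED_PATTERN.fullmatch(value):
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)  # rejects impossible dates such as 02-30
    except ValueError:
        return False
    return True
```

A year-only value such as 1938 fails this check, which flags the issues that would otherwise pick up the current month and day.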

...

Sample single-issue batch folder hierarchy

batch.zip
└── issue1
    ├── 001
    │   └── OBJ.tif
    ├── 002
    │   └── OBJ.tif
    └── MODS.xml - this becomes the MODS record for the issue-level object

...

 

Other files, with file names corresponding to datastream IDs, can be included in each page subfolder, such as JP2.jp2, OCR.txt, and TN.jpg. If derivatives are included, Islandora will not generate new derivatives, which speeds up ingest.

Sample batch folder hierarchy with derivatives

batch02.zip
--issue01
------MODS.xml - this becomes the MODS record for the issue-level object
------001
----------OBJ.tiff
----------JP2.jp2
----------OCR.txt
----------TN.jpg
----------MODS.xml
------002
----------OBJ.tiff
----------JP2.jp2
----------OCR.txt
----------TN.jpg

...

└── issue1
    ├── 1
    │   ├── JP2.jp2
    │   ├── JPG.jpeg
    │   ├── OBJ.tif
    │   ├── OCR.asc
    │   └── TN.jpeg
    ├── 2
    │   ├── JP2.jp2
    │   ├── JPG.jpeg
    │   ├── OBJ.tif
    │   ├── OCR.asc
    │   └── TN.jpeg
    └── MODS.xml
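
Before zipping a batch, the folder layout shown in the samples can be checked with a short script. This is a sketch under stated assumptions: the function name check_issue_folder is hypothetical, every page subfolder is assumed to need an OBJ file (as in the samples), and a missing issue-level MODS.xml is only reported as a note because other metadata formats may be supplied instead.

```python
from pathlib import Path


def check_issue_folder(issue_dir: Path) -> list[str]:
    """Return a list of layout problems found in one issue folder.

    Hypothetical pre-ingest check based on the sample hierarchies;
    it is not part of Islandora Newspaper Batch.
    """
    problems = []
    if not (issue_dir / "MODS.xml").is_file():
        # Only a note: another supported metadata format may be present.
        problems.append(f"{issue_dir.name}: no MODS.xml found")
    page_dirs = sorted(p for p in issue_dir.iterdir() if p.is_dir())
    if not page_dirs:
        problems.append(f"{issue_dir.name}: no page subfolders")
    for page_dir in page_dirs:
        # Accept any extension (OBJ.tif, OBJ.tiff, ...) for the page image.
        has_obj = any(
            f.is_file() and f.stem == "OBJ" for f in page_dir.iterdir()
        )
        if not has_obj:
            problems.append(f"{issue_dir.name}/{page_dir.name}: missing OBJ file")
    return problems
```

An empty list means the issue folder matches the sample layout as far as this check goes; derivative files such as JP2.jp2 are optional and are not inspected.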

Descriptive Metadata

If MODS metadata is not available for issue or page objects, the following formats can be supplied and will be automatically transformed to general MODS and DC.

...

Newspaper Batch Ingest options

  • Go to Manage > Newspaper Batch.

  • Zip file - Upload the ZIP file for batch ingest.
  • Create PDFs? - Checking this box creates a PDF derivative that contains all the pages associated with a newspaper issue.
  • Namespace for created objects - Set the namespace for the issue and page objects created for this batch ingest.
  • Generate OCR? - Checking this box causes OCR to be generated for each Page object. OCR will be attached as a datastream to each page.
  • Notify admin after ingest? - Check this box to send an email to the site admin (user 1) that a newspaper batch ingest has completed. This requires the Drupal Rules module and a rule for newspaper batch notifications.
  • Ingest immediately? - Checking this box will cause the batch to go through both steps of the ingest (pre-processing and actual ingest) immediately.
    • If you do not check "Ingest immediately?", the files will only be pre-processed and added to the Islandora batch queue for an administrator to approve.
    • To approve the batch, go to Administration > Reports > Islandora Batch Sets and select "View Items in Set" next to an unprocessed set. To process the set, click "Process Set" and process all items.

Using Newspaper Batch from the command line (Drush)

If you have many ZIP files to ingest, or if the ZIP files are too large to ingest through the interface, you can also batch ingest newspapers from the Drupal command line with Drush.

To use the ZIP pre-processor from Drush (see drush help islandora_newspaper_batch_preprocess for additional parameters):

drush -v --user=adminu --uri=http://localhost islandora_newspaper_batch_preprocess --type=directory --target=/path/to/issues --namespace=dailyplanet --parent=islandora:dailyplanet

This will populate the queue (stored in the Drupal database) with PID entries. Note that the --parent parameter must be a newspaper object, not a collection object.
You can then process all items in the batch queue:

drush -v --user=adminu --uri=http://localhost islandora_batch_ingest
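
When several target directories need to be pre-processed, the two commands above can be generated from a script. The sketch below only builds the command lines, using exactly the flags shown in this section; the function names and default values are hypothetical, and running the result (for example with subprocess.run) is left to the caller.

```python
import shlex


def build_preprocess_command(target, namespace, parent,
                             user="adminu", uri="http://localhost"):
    """Build the argument list for the pre-processing step.

    Hypothetical helper; the flags mirror the drush command shown above.
    The parent must be a newspaper object, not a collection object.
    """
    return [
        "drush", "-v", f"--user={user}", f"--uri={uri}",
        "islandora_newspaper_batch_preprocess",
        "--type=directory",
        f"--target={target}",
        f"--namespace={namespace}",
        f"--parent={parent}",
    ]


def build_ingest_command(user="adminu", uri="http://localhost"):
    """Build the argument list that processes the whole batch queue."""
    return ["drush", "-v", f"--user={user}", f"--uri={uri}",
            "islandora_batch_ingest"]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for target in ["/path/to/issues"]:
        command = build_preprocess_command(
            target, "dailyplanet", "islandora:dailyplanet")
        print(shlex.join(command))
    print(shlex.join(build_ingest_command()))
```

Returning an argument list rather than a single string avoids shell quoting problems if a target path contains spaces.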

Troubleshooting

You may get a warning such as "Failed to get issued date from MODS for dailyplanet:1".

After the ingest, everything looks normal, but the issue you ingested is missing from the newspaper display. To fix this:

  • Click Manage > Newspaper.
  • Enter the date the issue was published. A date picker is provided to help you.
  • A confirmation message appears.
  • The issue now appears with its date on the Newspaper page.
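
To catch this situation before ingest, the issue-level MODS record can be inspected for a usable date. This sketch assumes the warning is caused by a missing or malformed dateIssued element and that the element sits under originInfo in the standard MODS v3 namespace; the function names are hypothetical, and the module's own lookup may differ.

```python
import re
import xml.etree.ElementTree as ET

# Standard MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def find_date_issued(mods_xml: bytes):
    """Return the text of the first originInfo/dateIssued element, or None.

    Hypothetical helper; it is not the lookup used by the module itself.
    """
    root = ET.fromstring(mods_xml)
    element = root.find(".//mods:originInfo/mods:dateIssued", MODS_NS)
    if element is None or element.text is None:
        return None
    return element.text.strip()


def has_usable_issue_date(mods_xml: bytes) -> bool:
    """Return True when dateIssued is present and shaped like YYYY-MM-DD."""
    value = find_date_issued(mods_xml)
    if value is None:
        return False
    return re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is not None
```

For example, the bytes of an issue's MODS.xml can be passed in with Path("issue1/MODS.xml").read_bytes(); a False result suggests the date will need to be set by hand as described above.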

Additional Documentation

Further documentation for this module is available at the Islandora Newspaper Batch GitHub repository.