...

Note

Before you can batch ingest objects, you will need to have downloaded and installed the Islandora Importer module (web interface) or the Islandora Batch module (command line). If you want to batch ingest books, you will need to have downloaded and installed the Islandora Book Batch module; if you want to batch ingest newspaper issues, you will need to have downloaded and installed the Islandora Newspaper Batch module. You are also strongly encouraged to review the mods_to_dc.xsl within the Islandora Book Batch module if you plan to ingest MODS metadata, so that you understand what type of Dublin Core it will produce. For example, the mods_to_dc.xsl does not produce clean Dublin Core subject tags: all individual MODS subject tags are expressed as one Dublin Core subject tag. It also does not map name tags if no roleTerm with a type attribute has been specified. The mods_to_dc.xsl is a Library of Congress XSLT, and the Islandora community does not make modifications to this file. You are encouraged to make your own edits to the mods_to_dc.xsl if you need to modify the XSLT.
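
The name-mapping caveat in the note above can be checked before ingest. The following is a minimal Python sketch, not part of Islandora: the function name names_without_typed_role and the sample record are hypothetical, and it assumes your MODS files use the standard MODS v3 namespace. It lists every name element that has no roleTerm carrying a type attribute.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace (assumed to be the one your records use).
MODS_URI = "http://www.loc.gov/mods/v3"
MODS_NS = {"mods": MODS_URI}

# Hypothetical sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <name><namePart>Doe, Jane</namePart>
    <role><roleTerm type="text">author</roleTerm></role></name>
  <name><namePart>Roe, Richard</namePart>
    <role><roleTerm>editor</roleTerm></role></name>
  <name><namePart>Poe, Pat</namePart></name>
</mods>"""


def names_without_typed_role(mods_xml):
    """Return the name text of each mods:name lacking a roleTerm with a type attribute."""
    root = ET.fromstring(mods_xml)
    flagged = []
    for name in root.iter("{%s}name" % MODS_URI):
        # [@type] matches only roleTerm elements that carry a type attribute.
        typed_roles = name.findall("mods:role/mods:roleTerm[@type]", MODS_NS)
        if not typed_roles:
            parts = [p.text or "" for p in name.findall("mods:namePart", MODS_NS)]
            flagged.append(" ".join(parts).strip())
    return flagged


if __name__ == "__main__":
    print(names_without_typed_role(SAMPLE))  # ['Roe, Richard', 'Poe, Pat']
```

Names reported by a check like this are candidates for adding a typed roleTerm, or for handling in your own edited copy of the XSLT.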

...

This page will run through the specifics of each one. In these examples, we will be batch-ingesting PDF files into a collection with the 'PDF Solution Pack' content model applied to its collection policy.

...

  1. Browse to the .zip file you would like to upload, and then click the 'Upload' button. It may take a while to move the file to the server.
  2. Choose the content models you would like to apply to the objects.

    Note

    All checked content models are applied to each object, so you can't mix different types of objects in one .zip file.


  3. Choose the namespace to be applied to the objects. ("Islandora" is given only as an example.)

  4. Click the 'Import' button to begin the batch import process.

...

After creating a .zip archive like the one above, you can simply follow the steps from the first example to ingest the batch into the repository.

Note
Uploading multiple datastreams in the PDF content model

The PDF solution pack has an option to allow or disallow users to upload text files with PDFs for indexing in Solr. Note that this option applies to loading individual objects, not to batch processing: batch uploading PDF content model objects with multiple datastreams (such as uploading an object's PDF, XML, and full-text files through the zip importer) will effectively ignore this checkbox.


Info

Not every file in the archive needs to have corresponding datastreams. You could potentially upload an archive that contains some objects without metadata, some objects with only metadata, and some objects with both.
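
To see which of these cases each object in an archive falls into before uploading, a short script can help. The following is a minimal Python sketch, not part of Islandora. It assumes that files sharing a basename belong to the same object and that .xml files are the metadata. The function names and the demo file names are hypothetical.

```python
import io
import os
import zipfile
from collections import defaultdict

# Assumption for this sketch: .xml files are metadata, everything else is content.
METADATA_EXTENSIONS = {".xml"}


def summarize_archive(zip_file):
    """Group archive entries by basename and classify each resulting object."""
    groups = defaultdict(set)
    with zipfile.ZipFile(zip_file) as archive:
        for entry in archive.namelist():
            if entry.endswith("/"):
                continue  # skip directory entries
            base, ext = os.path.splitext(os.path.basename(entry))
            groups[base].add(ext.lower())
    summary = {}
    for base, extensions in groups.items():
        has_metadata = bool(extensions & METADATA_EXTENSIONS)
        has_content = bool(extensions - METADATA_EXTENSIONS)
        if has_metadata and has_content:
            summary[base] = "both"
        elif has_metadata:
            summary[base] = "metadata only"
        else:
            summary[base] = "without metadata"
    return summary


def build_demo_archive():
    """Build a small in-memory archive with hypothetical file names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report_a.pdf", b"%PDF-")
        archive.writestr("report_a.xml", "<mods/>")
        archive.writestr("report_b.pdf", b"%PDF-")
        archive.writestr("report_c.xml", "<mods/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, status in sorted(summarize_archive(build_demo_archive()).items()):
        print(name, "->", status)
```

Passing the path of a real .zip file to summarize_archive instead of the demo buffer gives the same report for your own batch.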

...

Files are assigned to object datastreams based on their basename. Consider a folder structure like:

  • my_cool_book/
    • MODS.xml
    • 1001/
      • OBJ.tiff
    • 2002/
      • OBJ.tiff

The above would result in a two-page book.
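
The layout above can be inspected before ingest with a short script. The following is a minimal Python sketch, not part of Islandora Book Batch. It assumes each subdirectory of the book folder is one page and that a file's basename is the datastream it would populate. The function names are hypothetical, and the sorting is only for stable display.

```python
import os
import tempfile


def describe_book(book_dir):
    """Return (book-level files, [(page directory, datastream names)]) for one book folder."""
    book_files = []
    pages = []
    for entry in sorted(os.listdir(book_dir)):
        path = os.path.join(book_dir, entry)
        if os.path.isdir(path):
            # Each subdirectory is treated as one page; its files map to
            # datastreams by basename (OBJ.tiff -> OBJ).
            datastreams = sorted(os.path.splitext(f)[0] for f in os.listdir(path))
            pages.append((entry, datastreams))
        else:
            book_files.append(entry)
    return book_files, pages


def build_demo_book(root):
    """Recreate the example layout from the text with empty placeholder files."""
    book = os.path.join(root, "my_cool_book")
    for page in ("1001", "2002"):
        os.makedirs(os.path.join(book, page))
        with open(os.path.join(book, page, "OBJ.tiff"), "wb"):
            pass
    with open(os.path.join(book, "MODS.xml"), "w"):
        pass
    return book


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        files, pages = describe_book(build_demo_book(root))
        print("book-level files:", files)
        print("page count:", len(pages))
        for page_dir, datastreams in pages:
            print(page_dir, datastreams)
```

Run against the example layout, this reports one book-level file (MODS.xml) and two pages, each with an OBJ datastream.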

...

If no MODS is provided at the book level - either directly as MODS.xml, or transformed from either a DC.xml or the "--METADATA--" file discussed above - the directory name will be used as the title.

Text files for individual pages can also be supplied to provide a plain-text representation of the materials.  For example, handwritten items can have a transcribed text file uploaded in the batch process as --TEXTFILE--.txt.

Batch Ingest Newspapers

When batch ingesting newspapers, you must already have an existing newspaper-level object. Each ingest folder contains folders that represent issues of the newspaper, and each issue directory contains folders that represent separate page images. 

For sample directory structures and configuration options, see the Newspaper Batch Ingest instructions.


Batch Ingest Cleanup

Islandora creates detailed reports for each Batch Ingest. 

These reports can be very helpful for debugging and tracking, but they also take up hard drive space on your Islandora server.

  • The easiest way to find these reports is to click on Reports.
  • Click on the link for the report you want.

  • Islandora Batch Ingest Queue
  • Note that the first row has SET ID 4.


  • Islandora Batch Ingest Sets 
  • Note that the creator of each Batch Set is identified so your site admin can prod you to clean up your stuff!
  • In this report, SET ID 4 is at the bottom.

  • The easiest way to clean out these files is to use the GUI provided on the Islandora Batch Ingest Sets report.
  • First click on the dropdown menu for the row you wish to delete. (Here, the row containing SET ID 4.)
  • Although there is an option for Delete set, the most prudent action is to click on View items in set to verify which Batch Set you have chosen.


  • Here I have chosen View items for Set 4, which is the last set in the Islandora Batch Ingest Sets report shown above.
  • This brings up the Set 4 Batch Queue with a link for Delete set.
  • Note that SET ID 4 is in the only row.
  • Click on the link for Delete set.
  • Islandora gives you one more chance to change your mind.


  • Upon clicking the Confirm button, you return to the Islandora Batch Ingest Sets page. Note the message about the deleted item.
  • Set ID 4 is no longer on the report.
  • Three of these sets belong to someone else, so I only have 9 to go.

  • Returning to the Islandora Batch Ingest Queue from Reports, the report shows the SET ID numbers in ascending order.
  • SET ID 4 no longer appears in the Islandora Batch Ingest Queue.

  • This is the entire process for deleting a single Batch Set.