For larger collections, Islandora is able to pull multiple files out of a zipped archive and ingest them into Fedora as a batch. There are a few ways that this can be done. You can upload .zip archives full of:
This page will run through the specifics of each one. In these examples, we will be batch-ingesting PDF files into a collection with the 'PDF Solution Pack' content model applied to its collection policy.
Ingest zipped content
1. Create a .zip archive with your files in it
The process for doing this will vary from operating system to operating system, but on PC, Mac and Linux at least, a .zip archive can be made in your file browser by highlighting the file or files you would like to zip, right-clicking, and finding an option similar to 'compress', 'create archive', 'create zipped folder', and so on.
In our example, opening the zipped archive shows our PDFs grouped together:
2. Navigate to the destination collection and click 'Manage'
This will take you to the collection's management page.
This will take you to that collection's options page.
4. Click on 'Batch Import Objects'
This will start the batch import process.
The Islandora Batch Importer module comes with several different modules for handling different types of content. If more than one is enabled, you will need to select the correct one.
In this case, we will be importing objects from a .zip file, so we are going to select that option.
6. Choose the correct options for your batch import
There are a few options on this screen that will need to be set up:
- Browse to the .zip file you would like to upload, and then click the 'Upload' button. It may take a while to move the file to the server.
Choose the content models you would like to apply to the objects.
Choose the namespace to be applied to the objects. ("Islandora" is given only as an example.)
- Click the 'Import' button to begin the batch import process.
This will import all the files from your zipped archive during which new Fedora objects are created and associated with the specified collection.
If you wish to ingest objects as simply metadata without a file datastream attached, you may do so by using the batch importer to import a .zip file full of XML forms.
In our example, a .zip file full of XML files has been created. Once the metadata files are fully ingested, PDF files can be added to the objects as datastreams.
In this case, we can simply follow the same steps as in the previous example to perform the batch ingest.
To create the XML files, you can either design them manually in a text editor or XML editor or use the Form Builder built into Islandora.
To use the Form Builder to create XML records, navigate to http://path.to.your.site/admin/islandora/xmlform, find the type of form you would like to fill out, and click the 'view' link beside it:
In this case, we will be creating an XML file for a PDF.
Fill out the form with the metadata values you would like, and click the 'Submit' button at the bottom of the form. This will output raw XML to your browser that you can then paste into a text editor, similar to the following:
Save this output as an XML file, and add it to your .zip archive.
Zip archives can contain both the content files to be ingested and the metadata files at the same time. This can be accomplished in the same way as the first example, with a few changes to the .zip archive itself:
In the above example, you will notice that each PDF file has a corresponding XML file, and that the filenames of the PDFs and the XML files are identical in every way – including capitalization – except for the extension. Files with matching filenames will be ingested together into the same object.
After creating a .zip archive like above, you can simply follow the steps from the first example to ingest the batch into the repository.
Batch Ingest Books
Books must be broken up into separate directories, such that each directory at the "top" level (in the target directory or Zip file) represents a book. Book pages are their own directories inside of each book directory.
Files are assigned to object datastreams based on their basename, so a folder structure like:
The above would result in a two-page book.
Each page directory name will be used as the sequence number of the page created.
A file named --METADATA--.xml can contain either MODS, DC or MARCXML which is used to fill in the MODS or DC streams (if not provided explicitly). Similarly, --METADATA--.mrc (containing binary MARC) will be transformed to MODS and then possibly to DC, if neither are provided explicitly.
If no MODS is provided at the book level - either directly as MODS.xml, or transformed from either a DC.xml or the "--METADATA--" file discussed above - the directory name will be used as the title.
Text files for individual pages can also be supplied to provide a plain-text representation of the materials. For example, handwritten items can have a transcribed text file uploaded in the batch process as --TEXTFILE--.txt.
Batch Ingest Newspapers
When batch ingesting newspapers, you must already have an existing newspaper-level object. Each ingest folder contains folders that represent issues of the newspaper, and each issue directory contains folders that represent separate page images.
For sample directory structures and configuration options, see the Newspaper Batch Ingest instructions.
Batch Ingest Cleanup
Islandora creates detailed reports for each Batch Ingest.
These reports can be very helpful for debugging and tracking, but they also take up hard drive space on your Islandora server.
- The easiest way to find these reports is to click on Reports.
- Click on the link for the report you want.
- Islandora Batch Ingest Queue
- Note that the first row has SET ID 4.
- Islandora Batch Ingest Sets
- Note that the creator of each Batch Set is identified so your site admin can prod you to clean up your stuff!
- In this report, SET ID 4 is at the bottom.
- The easiest way to clean out these files is to use the gui provided on the Islandora Batch Ingest Sets report.
- First click on the dropdown menu for the row you wish to delete. (Here, the row containing SET ID 4.)
- Although there is an option for Delete set, the most prudent action is to click on View items in set to verify which Batch Set you have chosen.
- Here I have chosen to View items for Set 4 which is the last Set in the Islandora Batch Set report shown above.
- This brings up the Set 4 Batch Queue with a link for Delete set.
- Note that the SET ID 4 is in the only row above.
- Click on the link for Delete set.
- Islandora gives you one more chance to change your mind.
- Upon clicking the Confirm Button, you return to the Islandora Batch Ingest Sets page. Note the message about the Deleted item.
- Set ID 4 is no longer on the report.
- Three of these sets belong to someone else, so I only have 9 to go.
- Returning to the Islandora Batch Ingest Queue from Reports, the report shows the SET ID numbers in ascending order.
- SET ID 4 no longer appears in the Islandora Batch Ingest Queue.
- This is the entire process for deleting a single Batch Set.