Versions Compared

Comment: cleanup of text around OAI-ORE harvesting

...

Code Block
  <link href="./static/mystyle.css" rel="stylesheet" type="text/css"/>
  <img src="./static/images/static-image.gif" alt="Static image in /static/images/ directory"/>
  <img src="./static/static-image.jpg" alt="Static image in /static/ directory"/> 

...

Harvesting Items from XMLUI via OAI-ORE or OAI-PMH

This feature allows you to harvest Items (both metadata and bitstreams) from one DSpace to another DSpace. However, both DSpace instances must be running XMLUI. In addition, the source DSpace (from which you are harvesting) should be running an OAI-PMH server (as it is used by default to harvest the metadata).

This section gives the necessary steps to set up the OAI-ORE/OAI-PMH Harvester from the XMLUI (Manakin).

Setting up a collection (Harvesting Collection Edit Screen):

  1. Login to XMLUI and create a new collection.
  2. Go to the tab named "Content Source" that now appears next to "Edit Metadata" and "Assign Roles" in the collection edit screens.
  3. The two "Content Source" options are "standard DSpace collection" (selected by default) and "collection harvests its content from an external source". Select the second option ("harvests from an external source") and click Save.
  4. A new set of menus appears to configure the harvesting settings:
    • "OAI Provider" is the URL of the OAI-PMH provider that the content of this collection should be harvested from. The OAI-PMH provider deployed with DSpace typically has the format: "http://dspace.url/oai/request". For this example, you could use the Demo DSpace OAI-PMH provider: "http://demo.dspace.org/oai/request"
    • "OAI Set Id" is the OAI-PMH setSpec of the collection you wish to harvest from. For DSpace, this Set ID has the format: hdl_<handle-prefix>_<handle-suffix>. For example, "hdl_10673_2" would refer to the Collection whose handle is "10673/2" (on the DSpace Demo Server, this is the Collection of Sample Items).
    • "Metadata format" determines the format in which the descriptive metadata will be harvested. The OAI-PMH server of the source DSpace instance may only support certain metadata formats. Select "DSpace Intermediate Metadata" if available (as this provides the richest metadata transfer) and "Simple Dublin Core" otherwise.
    • Click the "Test Settings" button to verify the settings supplied in the previous steps. This will usually let you know if anything is missing or does not validate correctly. If you receive an error, you will need to fix the settings before continuing.
  5. The list of radio buttons labeled "Content being harvested" allows you to select the harvest level. These harvesting options include:
    • Harvest Metadata Only - will only harvest item metadata from the source DSpace (or any OAI-PMH source)
    • Harvest metadata and references to bitstreams (requires ORE support) - will harvest item metadata and create links to files/bitstreams (stored remotely) from the source DSpace (requires OAI-ORE)
    • Harvest metadata and bitstreams (requires ORE support) - performs a full local replication. Harvests both item metadata and files/bitstreams (requires OAI-ORE).
  6. Select the appropriate option based on your needs, and click Save.
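
The Set ID format described in step 4 can be checked outside the UI with a few lines of Python. The following is a minimal sketch, not part of DSpace: the function names handle_to_set_id and list_records_url are hypothetical, and the "oai_dc" prefix is simply the Dublin Core format that OAI-PMH 2.0 requires every provider to support. It converts a handle into the hdl_<handle-prefix>_<handle-suffix> form and builds a standard OAI-PMH ListRecords URL for that set, which you can open in a browser to see what the provider returns. The requests the harvester itself issues may differ.

```python
from urllib.parse import urlencode


def handle_to_set_id(handle: str) -> str:
    """Turn a DSpace handle such as '10673/2' into the set ID 'hdl_10673_2'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected '<handle-prefix>/<handle-suffix>', got {handle!r}")
    return f"hdl_{prefix}_{suffix}"


def list_records_url(provider: str, handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords URL restricted to one collection's set.

    'provider' is the value you would enter as "OAI Provider";
    'metadata_prefix' is an assumption and must be a format the provider supports.
    """
    query = urlencode(
        {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": handle_to_set_id(handle),
        }
    )
    return f"{provider}?{query}"


if __name__ == "__main__":
    # Placeholder provider URL taken from the format shown in step 4.
    print(handle_to_set_id("10673/2"))
    print(list_records_url("http://dspace.url/oai/request", "10673/2"))
```

If the provider responds to that URL with an OAI-PMH error, the same provider and Set ID values are likely to fail the "Test Settings" check as well.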

At this point the settings are saved and the menu changes to provide three options:

  • "Change Settings" takes you back to the edit screen.
  • "Import Now" performs a single harvest from the remote collection into the local one. Success, notes, and errors encountered in the process will be reflected in the "Last Harvest Result" entry. More detailed information is available in the DSpace log. Note that the whole harvest cycle is executed within a single HTTP request and will time out for large collections. For this reason, it is advisable to use the automatic harvest scheduler, set up either in XMLUI or from the command line. If the scheduler is running, "Import Now" will handle the harvest task as a separate thread.
  • "Reset and Reimport Collection" will perform the same function as "Import Now", but will clear the collection of all existing items before doing so.

...

  • A new table, Harvesting, has been added under "Administrative > Control Panel" in XMLUI.
  • The panel offers the following information:
    • Available actions:
      • Start Harvester : starts the scheduler. From this point on, all properly configured collections (listed on the next line) will be harvested at regular intervals. This interval can be changed in the dspace.cfg using the "harvester.harvestFrequency" parameter.
      • Pause : the "nice" stop; waits for the active harvests to finish, saves the state/progress and pauses execution. Can be either resumed or stopped.
      • Stop : the "full stop"; waits for the current item to finish harvesting, and aborts further execution.
      • Reset Harvest Status : since stopping in the middle of a harvest is likely to result in collections getting "stuck" in the queue, the button is available to clear all states.

...