Comment: Update import/export UI instructions for 7.4

...

  1. Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is replaced with the schema's prefix.
  2. Inside the XML file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema="[prefix]".
  3. Here is an example for ETD metadata, which would be in the file metadata_etd.xml:

    Code Block
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core schema="etd">
         <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
         <dcvalue element="degree" qualifier="level">Masters</dcvalue>
         <dcvalue element="degree" qualifier="grantor">Michigan Institute of Technology</dcvalue>
    </dublin_core>
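
A file like the one above could also be generated with a short script. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name write_schema_metadata and its arguments are illustrative and not part of DSpace.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_schema_metadata(item_dir, prefix, values):
    """Write metadata_<prefix>.xml into item_dir and return its path.

    `values` is a list of (element, qualifier, text) tuples. This helper
    and its signature are hypothetical, used only for illustration.
    """
    # The schema prefix goes on the <dublin_core> root element.
    root = ET.Element("dublin_core", {"schema": prefix})
    for element, qualifier, text in values:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        dcvalue.text = text
    path = Path(item_dir) / f"metadata_{prefix}.xml"
    ET.ElementTree(root).write(str(path), encoding="UTF-8", xml_declaration=True)
    return path
```

Calling write_schema_metadata(item_dir, "etd", [("degree", "level", "Masters")]) would produce a metadata_etd.xml file with a single dcvalue entry.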


...

  • Resume. If an error occurs during an import and the import is aborted, fix the error and then use the --resume (-R) flag to resume the import where it left off.

  • Specifying the owning collection on a per-item basis from the command line administration tool

    If you omit the -c flag, which is otherwise mandatory, the ItemImporter searches for a file named "collections" in each item directory. This file should contain a list of collections, one per line, specified either by their handle or by their internal database ID. The ItemImporter will then put the item in each of the specified collections. The owning collection is the collection specified in the first line of the collections file.

    If the -c flag is specified and a collections file also exists in the item directory, the ItemImporter will ignore the collections file and will put the item in the collection specified on the command line.

    Since the collections file can differ between item directories, this gives you more fine-grained control of the process of batch adding items to collections.
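
The precedence rules above can be sketched in a few lines. This is only an illustration of the described behaviour, not the ItemImporter's actual code; the function name resolve_collections and its parameters are hypothetical.

```python
from pathlib import Path


def resolve_collections(item_dir, cli_collection=None):
    """Return (owning_collection, other_collections) for one item directory.

    Illustrative only: mirrors the rules described in the text, with a
    hypothetical `cli_collection` argument standing in for the -c flag.
    """
    if cli_collection is not None:
        # A collection given on the command line wins; the file is ignored.
        return cli_collection, []
    collections_file = Path(item_dir) / "collections"
    if not collections_file.is_file():
        raise FileNotFoundError(f"no collections file in {item_dir}")
    # One collection per line; blank lines are skipped.
    lines = [
        line.strip()
        for line in collections_file.read_text().splitlines()
        if line.strip()
    ]
    if not lines:
        raise ValueError(f"collections file in {item_dir} is empty")
    # The first line names the owning collection, the rest are additional ones.
    return lines[0], lines[1:]
```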

UI Batch Import

Info

Available in DSpace 7.4 and above.

Batch import can also take place via the Administrator's UI. The steps to follow are:

A. Prepare the data

  1. Items, i.e. the metadata and their bitstreams, must be in the Simple Archive Format described earlier in this chapter. Thus, for each item there must be a separate directory that contains the corresponding files of the specific item.
  2. Moreover, in each item directory, there can be another file that describes the collection or the collections that this item will be added to. The name of this file must be "collections" and it is optional. It has the following format:


    Each line contains the handle of the collection. The collection in the first line is the owning collection while the rest are the other collections that the item should belong to.
  3. Compress the item directories into a ZIP file. Please note that you need to zip the actual item directories and not just the directory that contains the item directories. Thus, the final ZIP file must directly contain the item directories.
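
One way to build a ZIP with the item directories at its top level is sketched below, assuming Python's standard zipfile module. The function name zip_item_directories is illustrative, and the ZIP should be written outside the directory being compressed.

```python
import zipfile
from pathlib import Path


def zip_item_directories(archive_dir, zip_path):
    """Zip the contents of archive_dir so item directories sit at the ZIP root.

    Hypothetical helper: archive_dir is the directory that contains the
    item directories; zip_path must be located outside archive_dir.
    """
    archive_dir = Path(archive_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(archive_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to archive_dir, so the containing
                # directory itself does not appear inside the ZIP.
                zf.write(path, path.relative_to(archive_dir).as_posix())
    return zip_path
```

Listing the resulting archive (for example with zipfile.ZipFile(zip_path).namelist()) should show entries that start with an item directory name rather than with the name of the parent directory.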

B. Import the items via the UI

  1. Login as an Administrator.
  2. In the side menu, select "Import" → "Batch Import (ZIP)".

  3. On the "Import Batch" page:
    1. Select the Collection you are importing into.
    2. Drag & drop the ZIP file into the drop box (or browse to it on your filesystem).
    3. Choose whether you want to "Validate Only" or not.
      1. When selected, DSpace will test the batch import process, but no content will be imported. This allows you to validate the results of the import process before doing the import.
      2. When deselected, DSpace will do the batch import.
  4. Clicking "Proceed" will start the Batch Import. This creates a new "Process" which begins the upload of the batch. Depending on the size of the batch, this process may take some time to complete. You can refresh the page to see the current status, or go back to the list of processes ("Processes" menu in sidebar) to check on its status. Once the process is COMPLETED, you will see a log of the results and a mapfile (which can be used to make later updates).
  5. All prior imports will be listed in the "Processes" menu, until their corresponding process entry is deleted. Once you are satisfied with the import and have no need to see the logs or mapfile, you may wish to delete that process entry in order to free up storage space (as your uploaded ZIP will be retained in DSpace until the process is deleted). A "process-cleaner" script can also be started from the "Processes" page which can be used to bulk delete old processes.
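
If you download the mapfile before deleting a process, it can be read back with a small script. The sketch below assumes each non-empty line holds an item directory name followed by whitespace and the assigned handle; that layout is an assumption and should be checked against an actual mapfile. The function name parse_mapfile is illustrative.

```python
def parse_mapfile(text):
    """Parse mapfile text into {item_directory: handle}.

    Assumes one "<item directory> <handle>" pair per line; this layout
    is an assumption, not something stated in the text above.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only.
        item_dir, handle = line.split(None, 1)
        mapping[item_dir] = handle.strip()
    return mapping
```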

Exporting Items

The item exporter can export a single item or a collection of items, and creates a DSpace simple archive in the aforementioned format for each exported item. The items are exported sequentially, in the order in which they are retrieved from the database. As a consequence, the sequence numbers of the item subdirectories (item_000, item_001) are not related to DSpace handles or item IDs.

Command used:

[dspace]/bin/dspace export

Java class:

org.dspace.app.itemexport.ItemExport

Arguments short and (long) forms:

Description

-t or --type

Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You will actually key in the keywords in all caps. See examples below.)

-i or --id

The ID or Handle of the Collection or Item to export.

-d or --dest

The destination path where you want the file of items to be placed.

-n or --number

Sequence number to begin with. Whatever number you give, this will be the name of the first directory created for your export. The layout of the export directory is the same as the layout used for import.

-m or --migrate

Export the item/collection for migration. This will remove the handle and any other metadata that will be re-created in the new instance of DSpace.

-x or --exclude-bitstreams

Do not export bitstreams. See the usage scenario below.

-h or --help

Brief Help.

Exporting a Collection

The CLI command to export the items of a collection:

Code Block
[dspace]/bin/dspace export --type=COLLECTION --id=collectionID_or_handle --dest=/path/to/destination --number=seq_num

Short form:

Code Block
[dspace]/bin/dspace export -t COLLECTION -i collectionID_or_handle -d /path/to/destination -n seq_num

The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply.
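
When exports are scripted, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the short flags from the table above; the function name build_export_command and its parameters are illustrative, and the resulting list would still need to be run, for example with subprocess.run(cmd, check=True).

```python
def build_export_command(dspace_dir, export_type, identifier, dest,
                         start_number, migrate=False, exclude_bitstreams=False):
    """Build the argument list for the export command.

    Hypothetical helper: dspace_dir is the [dspace] installation directory.
    """
    if export_type not in ("COLLECTION", "ITEM"):
        # The type keyword is given in all caps.
        raise ValueError("export_type must be COLLECTION or ITEM")
    cmd = [
        f"{dspace_dir}/bin/dspace", "export",
        "-t", export_type,
        "-i", str(identifier),
        "-d", str(dest),
        "-n", str(start_number),
    ]
    if migrate:
        cmd.append("-m")  # export for migration
    if exclude_bitstreams:
        cmd.append("-x")  # do not export bitstreams
    return cmd
```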

Exporting a Single Item

To export a single item use the keyword ITEM and give the item ID as an argument:

Code Block
[dspace]/bin/dspace export --type=ITEM --id=itemID_or_handle --dest=/path/to/destination --number=seq_num

Short form:

Code Block
[dspace]/bin/dspace export -t ITEM -i itemID_or_handle -d /path/to/destination -n seq_num

Each exported item will have an additional file in its directory, named "handle". This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle.
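
To check which handles an export carries, the "handle" files can be collected with a short script. This is a sketch assuming the export destination contains one subdirectory per item; the function name read_exported_handles is illustrative.

```python
from pathlib import Path


def read_exported_handles(export_dir):
    """Return {item_directory_name: handle} for an export directory.

    Hypothetical helper: item directories without a "handle" file
    (for example after a migration export) are skipped.
    """
    handles = {}
    for item_dir in sorted(Path(export_dir).iterdir()):
        handle_file = item_dir / "handle"
        if item_dir.is_dir() and handle_file.is_file():
            handles[item_dir.name] = handle_file.read_text().strip()
    return handles
```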

The -m Argument

Using the -m argument will export the item/collection and also perform the migration step. This is the same process described in the next section, Exchanging Content Between Repositories. We recommend reading that section before using this flag.

The -x Argument

Using the -x argument performs the standard export but skips the bitstreams. If you have a full SAF export without bitstreams and the original bitstream archive (which might have been imported into DSpace earlier) is available nearby, you can symlink the original archive files into the SAF directories. The result occupies almost no additional space but is otherwise identical to a full export (i.e. it can be imported into DSpace). For huge collections, -x mode may be substantially faster than a full export.


UI Batch Export

Info

Available in DSpace 7.4 and above.

Batch export can also take place via the Administrator's UI. The steps to follow are:

  1. Login as an Administrator.
  2. In the side menu, select "Export" → "Batch Export (ZIP)".
  3. Select or search for the Collection to export from.
  4. Clicking "Export" will start the Batch Export. This creates a new "Process" which begins the export process. Depending on the size of the export, this process may take some time to complete. You can refresh the page to see the current status, or go back to the list of processes ("Processes" menu in sidebar) to check on its status. Once the process is COMPLETED, you will see a log of the results and an exported ZIP file which you can download.
  5. All prior exports will be listed in the "Processes" menu, until their corresponding process entry is deleted. Once you are satisfied with the export and have downloaded the ZIP, you may wish to delete that process entry in order to free up storage space (as your exported ZIP will be retained in DSpace until the process is deleted). A "process-cleaner" script can also be started from the "Processes" page which can be used to bulk delete old processes.
