
Importing and Exporting Items via Simple Archive Format


Item Importer and Exporter

...

The basic concept behind DSpace's simple archive format is to create an archive, which is a directory full of items, with one subdirectory per item. Each item directory contains a file for the item's descriptive metadata and the files that make up the item.

Code Block

archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml   -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
        contents                -- text file containing one line per filename
        file_1.doc              -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...
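As an illustration of the layout above, here is a minimal Python sketch that creates one item directory, copies in the bitstreams, and writes a plain contents file. `build_item` is a hypothetical helper, not part of DSpace; the item_NNN naming simply mirrors the example, and dublin_core.xml still has to be written separately.

```python
import tempfile
from pathlib import Path


def build_item(archive_dir: Path, index: int, bitstreams: dict) -> Path:
    """Create archive_dir/item_NNN/ with the given bitstreams and a contents file.

    Hypothetical helper: dublin_core.xml is not written here.
    """
    item_dir = archive_dir / f"item_{index:03d}"
    item_dir.mkdir(parents=True)  # raises FileExistsError if the item already exists
    for name, data in bitstreams.items():
        (item_dir / name).write_bytes(data)
    # contents lists one bitstream file name per line
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in bitstreams), encoding="utf-8"
    )
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive_directory"
        item = build_item(archive, 0, {"file_1.doc": b"doc bytes", "file_2.pdf": b"pdf bytes"})
        print(item.name)                                # item_000
        print(sorted(p.name for p in item.iterdir()))   # ['contents', 'file_1.doc', 'file_2.pdf']
```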

...

The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tag. There are currently three attributes available on the <dcvalue> tag:

  • element - the Dublin Core element
  • qualifier - the element's qualifier
  • language - (optional) ISO language code for the element

    Code Block
    
    <dublin_core>
        <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
        <dcvalue element="date" qualifier="issued">1990</dcvalue>
        <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
    </dublin_core>
    
    

    (Note the optional language attribute, which tells the system that the alternative title is in French.)

Every metadata field used must first be registered in the metadata registry of the DSpace instance.
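The dcvalue format can be generated rather than typed by hand. This Python sketch uses the standard library's xml.etree.ElementTree; `build_dublin_core` and `write_dublin_core` are hypothetical helpers, and writing a qualifier of None as "none" follows the example above. One practical benefit is that characters such as "&" are escaped automatically.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_dublin_core(values):
    """Build a <dublin_core> tree from (element, qualifier, value, language) tuples.

    A qualifier of None is written as "none", as in the example above;
    a language of None omits the language attribute.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, text, language in values:
        dcvalue = ET.SubElement(root, "dcvalue", element=element,
                                qualifier=qualifier or "none")
        if language:
            dcvalue.set("language", language)
        dcvalue.text = text  # ElementTree escapes &, < and > when serialising
    return root


def write_dublin_core(item_dir: Path, values) -> Path:
    """Write item_dir/dublin_core.xml as UTF-8 with an XML declaration."""
    path = item_dir / "dublin_core.xml"
    ET.ElementTree(build_dublin_core(values)).write(
        path, encoding="UTF-8", xml_declaration=True)
    return path


if __name__ == "__main__":
    values = [
        ("title", None, "A Tale of Two Cities", None),
        ("date", "issued", "1990", None),
        ("title", "alternative", "J'aime les Printemps", "fr"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(write_dublin_core(Path(tmp), values).read_text(encoding="utf-8"))
```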

The contents file simply enumerates, one file per line, the bitstream file names. See the following example:

Code Block

        file_1.doc
        file_2.pdf
        license

Note that the license is optional. If you wish to include one, place the license file in the item's directory (for example, .../item_001/) and list it in the contents file.

The bitstream name may optionally be followed by any of the following:

  • \tbundle:BUNDLENAME
  • \tpermissions:PERMISSIONS
  • \tdescription:DESCRIPTION
  • \tprimary:true

Where '\t' is the tab character.

'BUNDLENAME' is the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream goes into the default bundle, ORIGINAL.

'PERMISSIONS' is text with the following format: -[r|w] 'group name', where -r grants read access and -w grants write access to the named group.

'DESCRIPTION' is the text of the file's description.

'primary:true' marks the bitstream as the item's primary bitstream.
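A minimal Python sketch of how a contents line with these options could be assembled and parsed back. `contents_line` and `parse_contents_line` are hypothetical helpers; they emit options in the order listed above, check the permissions text against the -[r|w] 'group name' pattern, and reject tabs or newlines inside values because the tab is the field separator. The group name used below is only an example.

```python
import re

PERMISSIONS = re.compile(r"-[rw] '[^']+'")  # -[r|w] 'group name'


def contents_line(filename, bundle=None, permissions=None,
                  description=None, primary=False):
    """Return one contents-file line: the file name followed by tab-separated options."""
    if permissions is not None and not PERMISSIONS.fullmatch(permissions):
        raise ValueError(f"permissions must look like -r 'group name': {permissions!r}")
    fields = [filename]
    if bundle:
        fields.append(f"bundle:{bundle}")
    if permissions:
        fields.append(f"permissions:{permissions}")
    if description:
        fields.append(f"description:{description}")
    if primary:
        fields.append("primary:true")
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("tabs and newlines are not allowed inside a field")
    return "\t".join(fields)


def parse_contents_line(line):
    """Split a contents line back into (filename, {option: value})."""
    filename, *options = line.rstrip("\n").split("\t")
    return filename, dict(opt.split(":", 1) for opt in options)
```

For example, contents_line("file_2.pdf", bundle="ORIGINAL", permissions="-r 'Anonymous'", primary=True) returns "file_2.pdf\tbundle:ORIGINAL\tpermissions:-r 'Anonymous'\tprimary:true".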

...

Configuring metadata_[prefix].xml for Different Schema

It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.

...

  1. Create a separate file for the other schema named "metadata_[prefix].xml", where [prefix] is replaced with the schema's prefix.
  2. Inside the XML file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema="[prefix]".
  3. Here is an example for ETD metadata, which would be in the file "metadata_etd.xml":

    Code Block
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core schema="etd">
         <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
         <dcvalue element="degree" qualifier="level">Masters</dcvalue>
         <dcvalue element="degree" qualifier="grantor">Texas A &amp; M</dcvalue>
    </dublin_core>
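Two easy mistakes with a metadata_[prefix].xml file are a schema attribute that does not match the file-name prefix and an unescaped "&" in a value such as "Texas A & M". Below is a hedged Python sketch of a pre-import check; `check_schema_file` is a hypothetical helper, and it does not verify that the schema or its fields are actually registered in DSpace.

```python
import re
import xml.etree.ElementTree as ET

FILE_NAME = re.compile(r"metadata_(?P<prefix>[^.]+)\.xml")


def check_schema_file(name, xml_data):
    """Return a list of problems with one metadata_[prefix].xml file (empty if none found)."""
    match = FILE_NAME.fullmatch(name)
    if match is None:
        return [f"{name}: expected a name of the form metadata_[prefix].xml"]
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as err:  # e.g. an unescaped "&" in a value
        return [f"{name}: not well-formed XML ({err})"]
    problems = []
    if root.tag != "dublin_core":
        problems.append(f"{name}: root element is <{root.tag}>, expected <dublin_core>")
    prefix = match.group("prefix")
    if root.get("schema") != prefix:
        problems.append(
            f"{name}: schema={root.get('schema')!r} does not match prefix {prefix!r}")
    return problems
```

To check a whole item directory, one could call check_schema_file(p.name, p.read_bytes()) for each p in item_dir.glob("metadata_*.xml").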

...

Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances.

...

Command used:

[dspace]/bin/dspace import

Java class:

org.dspace.app.itemimport.ItemImport

Arguments, in short and (long) forms:

  • -a or --add: Add items to DSpace ‡
  • -r or --replace: Replace items listed in the mapfile ‡
  • -d or --delete: Delete items listed in the mapfile ‡
  • -s or --source: Source of the items (directory)
  • -c or --collection: Destination collection, by Handle or database ID
  • -m or --mapfile: Location of the mapfile for the items (file name and directory)
  • -e or --eperson: Email of the eperson doing the importing
  • -w or --workflow: Send the submission through the collection's workflow
  • -n or --notify: Send email alerts announcing that the item(s) have been imported
  • -t or --test: Test run; do not actually import items
  • -p or --template: Apply the collection template
  • -R or --resume: Resume a failed import (used with --add only)
  • -h or --help: Command help
  • -z or --zip: Name of the zipfile

‡ These are mutually exclusive.

The item importer can batch import any number of items into a particular collection using a single CLI command and the arguments above.
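When many imports are scripted, it can help to assemble the argument list programmatically. This Python sketch, with a hypothetical `import_command` helper, uses only the long options documented in this section. It enforces that exactly one of --add, --replace or --delete is chosen and that --resume is used only with --add, and it requires --collection and --source for --add, as in the examples. "/dspace" stands in for [dspace]. The function builds the list but does not run it.

```python
from pathlib import PurePosixPath

ACTIONS = {"add": "--add", "replace": "--replace", "delete": "--delete"}


def import_command(dspace_dir, action, eperson, mapfile, *, collection=None,
                   source=None, zipfile=None, workflow=False, test=False,
                   resume=False):
    """Assemble an argv list for `[dspace]/bin/dspace import` (not executed here)."""
    if action not in ACTIONS:  # exactly one of --add/--replace/--delete per run
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    if action == "add" and (collection is None or source is None):
        raise ValueError("--add needs --collection and --source")
    if resume and action != "add":
        raise ValueError("--resume is only used with --add")
    argv = [str(PurePosixPath(dspace_dir, "bin", "dspace")), "import",
            ACTIONS[action], f"--eperson={eperson}"]
    if collection is not None:
        argv.append(f"--collection={collection}")
    if source is not None:
        argv.append(f"--source={source}")
    if zipfile is not None:
        argv.append(f"--zip={zipfile}")
    argv.append(f"--mapfile={mapfile}")
    # Optional boolean flags, appended only when switched on.
    argv += [flag for flag, on in (("--workflow", workflow), ("--test", test),
                                   ("--resume", resume)) if on]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True); using a list rather than a single string avoids shell-quoting problems with paths and email addresses.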

Adding Items to a Collection from a directory

To add items to a collection, gather the following information:

...

Testing. You can add --test (or -t) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.

Adding Items to a Collection from a zipfile

To add items to a collection, gather the following information:

  • eperson
  • Collection ID (either Handle, e.g. 123456789/14, or Database ID, e.g. 2)
  • Source directory where your zipfile containing the items resides
  • Zipfile
  • Mapfile. Since you don't have one yet, decide where it will be written (e.g. /Import/Col_14/mapfile)

At the command line:
Code Block
[dspace]/bin/dspace import --add --eperson=joe@user.com --collection=CollectionID --source=items_dir --zip=filename.zip --mapfile=mapfile

or by using the short form:

Code Block
[dspace]/bin/dspace import -a -e joe@user.com -c CollectionID -s items_dir -z filename.zip -m mapfile

The above command unpacks the zipfile, cycles through the archive directory's items, imports them, and then generates a map file that records the mapping of item directories to item handles. SAVE THIS MAP FILE. You will need it to replace or delete (unimport) the imported items.
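Because the map file is needed later, it is worth being able to read it back. This Python sketch assumes each line holds an item directory name and its handle separated by whitespace, e.g. "item_000 123456789/15" (the handle is made up); check a map file from your own import before relying on that format. `parse_mapfile` and `not_yet_imported` are hypothetical helpers; the second lists archive directories with no map entry, which may help when deciding whether to use --resume.

```python
def parse_mapfile(text):
    """Parse map-file text into {item directory: handle}.

    Assumes one "item_directory handle" pair per line, separated by whitespace.
    """
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'item_dir handle', got {line!r}")
        item_dir, handle = parts
        mapping[item_dir] = handle
    return mapping


def not_yet_imported(item_dirs, mapping):
    """Item directories from the archive that have no entry in the map file."""
    return sorted(d for d in item_dirs if d not in mapping)
```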

Testing. You can add --test (or -t) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.

Replacing Items in Collection

...

The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.

Command used:

[dspace]/bin/dspace export

Java class:

org.dspace.app.itemexport.ItemExport

Arguments, in short and (long) forms:

  • -t or --type: Type of export. COLLECTION exports the whole collection; ITEM exports only the specified item. (Type the keyword in capitals; see the examples below.)
  • -i or --id: The ID or Handle of the collection or item to export.
  • -d or --dest: The destination directory where the exported items will be placed (include the path if necessary).
  • -n or --number: Sequence number to begin numbering the exported items with. This number becomes the name of the first directory created for your export. The export layout is the same as the layout used for an import.
  • -m or --migrate: Export the item/collection for migration. This removes the handle and metadata that will be re-created in the new DSpace instance.
  • -h or --help: Brief help.

...

Code Block
[dspace]/bin/dspace export -t COLLECTION -i [CollID or Handle] -d /path/to/destination -n some_number

...

Code Block
[dspace]/bin/dspace export -t ITEM -i [itemID or Handle] -d /path/to/destination -n some_number

...