Item Importer and Exporter
DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace Simple Archive Format. The tools are not terribly robust, but they are useful and easily modified. Beyond the functionality they offer, they serve as a prime example for users who want to implement their own item importer.
DSpace Simple Archive Format
The basic concept behind the DSpace Simple Archive Format is to create an archive, which is a directory full of items, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata, and the files that make up the item.
Code Block |
---|
archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml   -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
        contents                -- text file containing one line per filename
        file_1.doc              -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...
|
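The layout above can be produced with a short script. The following is a minimal sketch only, not part of DSpace: the function name `create_item_dir` and its parameters are hypothetical, and the metadata file is written as an empty placeholder to be filled in separately.

```python
from pathlib import Path


def create_item_dir(archive_dir, index, files):
    """Create archive_dir/item_NNN with bitstream files and a contents file.

    `files` maps a bitstream file name to its bytes (hypothetical input shape).
    dublin_core.xml is written as an empty placeholder.
    """
    item_dir = Path(archive_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=False)

    # Placeholder metadata file; real items need <dcvalue> entries.
    (item_dir / "dublin_core.xml").write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n<dublin_core/>\n',
        encoding="utf-8",
    )

    # The files that make up the item.
    for name, data in files.items():
        (item_dir / name).write_bytes(data)

    # contents: one bitstream file name per line.
    (item_dir / "contents").write_text(
        "".join(f"{name}\n" for name in files), encoding="utf-8"
    )
    return item_dir
```

Calling `create_item_dir("archive_directory", 0, {"file_1.doc": b"..."})` would create `archive_directory/item_000/` with the three kinds of file shown in the listing.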
...
The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:
- <element> - the Dublin Core element
- <qualifier> - the element's qualifier
- <language> - (optional) ISO language code for element
Code Block |
---|
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
    <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
|
(Note the optional language tag attribute which notifies the system that the optional title is in French.)
Every metadata field used must first be registered via the metadata registry of the DSpace instance.
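A metadata file in this format can be generated rather than written by hand. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `build_dublin_core` and the tuple layout of its input are assumptions made for illustration, and it does not check that the fields are registered in the metadata registry.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(values):
    """Return a <dublin_core> document as a string.

    `values` is an iterable of (element, qualifier, language, text) tuples
    (hypothetical input shape). A missing qualifier is written as "none";
    the language attribute is only added when given.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, language, text in values:
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if language:
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_dublin_core([
        ("title", None, None, "A Tale of Two Cities"),
        ("date", "issued", None, "1990"),
        ("title", "alternative", "fr", "J'aime les Printemps"),
    ]))
```

Building the tree with a library instead of string concatenation means special characters in titles are escaped automatically.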
The contents file simply enumerates, one file per line, the bitstream file names. See the following example:
Code Block |
---|
file_1.doc
file_2.pdf
license
|
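Before running an import it can be useful to confirm that every file named in contents is actually present in the item directory. The following is a minimal sketch under the assumption that each line holds only a file name, with no per-line options; `missing_bitstreams` is a hypothetical helper, not a DSpace tool.

```python
from pathlib import Path


def missing_bitstreams(item_dir):
    """Return the names listed in item_dir/contents that are not files in item_dir.

    Assumes one plain file name per line; blank lines are ignored.
    """
    item_dir = Path(item_dir)
    lines = (item_dir / "contents").read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines if line.strip()]
    return [name for name in names if not (item_dir / name).is_file()]
```

An empty list means that every listed bitstream was found.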
...
'BUNDLENAME' is the name of the bundle to which the bitstream should be added. If no bundle is specified, bitstreams go into the default bundle, ORIGINAL.
'PERMISSIONS' is text with the following format: -[r|w] 'group name'
'DESCRIPTION' is the text of the file's description.
Primary is used to specify the primary bitstream.
...
Configuring metadata_[prefix].xml for Different Schema
It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.
...
- Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is replaced with the schema's prefix.
- Inside the xml file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema=[prefix].
- Here is an example for ETD metadata, which would be in the file metadata_etd.xml:
Code Block |
---|
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="etd">
    <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
    <dcvalue element="degree" qualifier="level">Masters</dcvalue>
    <dcvalue element="degree" qualifier="grantor">Texas A &amp; M</dcvalue>
</dublin_core>
|
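The same approach works when generating a metadata file for another schema. The sketch below is illustrative only: `build_schema_metadata` and `metadata_filename` are hypothetical names. It also shows one detail worth noting: a literal ampersand in a value (as in the grantor name) has to be written as `&amp;` in XML, which `xml.etree.ElementTree` does automatically on serialization.

```python
import xml.etree.ElementTree as ET


def metadata_filename(prefix):
    """File name for a non-dc schema, e.g. 'etd' -> 'metadata_etd.xml'."""
    return f"metadata_{prefix}.xml"


def build_schema_metadata(prefix, values):
    """Return the file content for metadata_[prefix].xml as a string.

    `values` is an iterable of (element, qualifier, text) tuples
    (hypothetical input shape). The schema prefix goes on <dublin_core>.
    """
    root = ET.Element("dublin_core", {"schema": prefix})
    for element, qualifier, text in values:
        attrs = {"element": element, "qualifier": qualifier}
        ET.SubElement(root, "dcvalue", attrs).text = text
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(metadata_filename("etd"))
    print(build_schema_metadata("etd", [
        ("degree", "department", "Computer Science"),
        ("degree", "level", "Masters"),
        ("degree", "grantor", "Texas A & M"),
    ]))
```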
...
Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances.
...
Command used: |
| |
Java class: |
| |
Arguments short and (long) forms: | Description | |
| Add items to DSpace ‡ | |
| Replace items listed in mapfile ‡ | |
| Delete items listed in mapfile ‡ | |
| Source of the items (directory) | |
| Destination collection(s), identified by Handle or database ID | |
| Where the mapfile for items can be found (name and directory) | |
| Email of eperson doing the importing | |
| Send submission through collection's workflow | |
| Sends the email alert that the item(s) has (have) been imported | |
| Test run, do not actually import items | |
| Apply the collection template | |
| Resume a failed import (Used on Add only) | |
| Command help | |
| Name of zipfile |
...
Code Block |
---|
[dspace]/bin/dspace import -e joe@user.com -d -m mapfile |
In long form:
Code Block |
---|
[dspace]/bin/dspace import --eperson=joe@user.com --delete --mapfile mapfile |
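When imports are scripted, the long-form command above can be assembled as an argument list. This is a sketch only: `build_import_delete_command` is a hypothetical helper, the launcher path must be replaced with the real location of `[dspace]/bin/dspace`, and only the options shown in the example above are used.

```python
def build_import_delete_command(dspace_launcher, eperson_email, mapfile):
    """Build the argument list for deleting the items listed in a mapfile.

    `dspace_launcher` is the path to the dspace launcher script
    (shown as [dspace]/bin/dspace in this document).
    """
    return [
        dspace_launcher,
        "import",
        f"--eperson={eperson_email}",
        "--delete",
        "--mapfile",
        mapfile,
    ]


if __name__ == "__main__":
    cmd = build_import_delete_command(
        "[dspace]/bin/dspace", "joe@user.com", "mapfile"
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.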
Other Options
...
The item exporter can export a single item or a collection of items, and creates a DSpace Simple Archive according to the aforementioned format for each item to be exported. The items are exported in the sequential order in which they are retrieved from the database. As a consequence, the sequence numbers of the item subdirectories (item_000, item_001) are not related to DSpace Handles or item IDs.
Command used: |
|
Java class: | org.dspace.app.itemexport.ItemExport |
Arguments short and (long) forms: | Description |
| Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You will actually key in the keywords in all caps. See examples below.) |
| The ID or Handle of the Collection or Item to export. |
| The destination directory where the exported items should be placed. Include the path if necessary. |
| Sequence number to begin exporting the items with. Whatever number you give will be the name of the first directory created for your export. The layout of the export is the same as the layout you would use for an import. |
| Export the item/collection for migration. This will remove the handle and metadata that will be re-created in the new instance of DSpace. |
| Brief Help. |
...
Using the -m argument will export the item/collection and also perform the migration step. It performs the same process described in the next section, Transferring Items Between DSpace Instances (or Copying Content Between Repositories). We recommend reading that section in conjunction with using this flag.