

...

Introduction

DSpace provides a batch metadata editing tool. The tool can export metadata to a comma-delimited file in CSV format and import edited CSV files back into the repository. It lets the user perform the following:

...

For information about configuration options for the Batch Metadata Editing tool, see Batch Metadata Editing Configuration.

Export Function

The following table summarizes the basics.

Command used:

[dspace]/bin/dspace metadata-export

Java class:

org.dspace.app.bulkedit.MetadataExport

Arguments (short and long forms):

Description

-f or --file

Required. The filename of the resulting CSV.

-i or --id

The Item, Collection, or Community handle or Database ID to export. If not specified, all items will be exported.

-a or --all

Include all the metadata fields that are not normally changed (e.g. provenance) or those fields you configured in the [dspace]/config/modules/bulkedit.cfg to be ignored on export.

-h or --help

Display the help page.

Exporting Process

To run the batch editing exporter, at the command line:

...

In the above example, we have requested that the entire collection with the handle '1989.1/24' be exported to the file 'col_14.csv' in the '/batch_export' directory.
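As a rough illustration of how the documented arguments fit together, the sketch below assembles an export command line from the options in the table above. The helper name build_export_command and the install directory '/dspace' are hypothetical; only the -f, -i, and -a flags come from this page.

```python
from typing import List, Optional


def build_export_command(dspace_dir: str, csv_path: str,
                         identifier: Optional[str] = None,
                         include_all: bool = False) -> List[str]:
    """Assemble the argument list for `dspace metadata-export`.

    Only flags documented on this page are used:
    -f (output file, required), -i (handle or ID), -a (include all fields).
    """
    command = [f"{dspace_dir}/bin/dspace", "metadata-export", "-f", csv_path]
    if identifier is not None:
        # Without -i, the exporter exports all items.
        command += ["-i", identifier]
    if include_all:
        command.append("-a")
    return command


# "/dspace" is a hypothetical install directory standing in for [dspace].
example = build_export_command("/dspace", "/batch_export/col_14.csv", "1989.1/24")
print(" ".join(example))
```

The list form can be passed to subprocess.run if you want to launch the export from a script; printing it first makes it easy to check the command before running it.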

Import Function

The following table summarizes the basics.

...

Silent Mode should be used carefully. It is possible (and even likely) that you will overlay the wrong data and cause irreparable damage to the database.

Importing Process

To run the batch importer, at the command line:

...

Info: Importing large CSV files

It is not recommended to import CSV files of more than 1,000 lines. With larger files, it is hard to verify accurately the changes that the import tool states it will make, and large files may cause 'Out Of Memory' errors partway through the process.
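One way to stay under this limit is to split a large export into smaller files before importing them one at a time. The following is a minimal sketch, not part of DSpace: the function name split_csv is hypothetical, and it counts data rows (records) rather than physical lines, so a value containing an embedded line break still counts as one row.

```python
import csv
import io
from typing import Iterator, List


def _dump(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header and its rows back to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_csv(text: str, max_rows: int = 1000) -> Iterator[str]:
    """Yield CSV documents with at most max_rows data rows each.

    The header row is repeated at the top of every chunk so that
    each chunk can be imported on its own.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to split
    chunk: List[List[str]] = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield _dump(header, chunk)
            chunk = []
    if chunk:
        yield _dump(header, chunk)
```

Each yielded string can be written to its own file and imported separately, which keeps every change report short enough to review.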

The CSV Files

The CSV files that this tool imports and exports follow the RFC 4180 CSV format. This means that line breaks and embedded commas can be included by wrapping a value in double quotes, and a double quote inside a value is written as two double quotes. The tool handles this for you, and any good CSV editor such as Excel or OpenOffice will follow the same convention.
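To see the quoting rules in practice, the short sketch below uses Python's standard csv module, whose default dialect applies the same conventions. The function names encode_row and decode_row are illustrative only.

```python
import csv
import io
from typing import List


def encode_row(values: List[str]) -> str:
    """Write one row as CSV text, quoting values only where needed."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(values)
    return buffer.getvalue()


def decode_row(text: str) -> List[str]:
    """Read the first row back from CSV text."""
    return next(csv.reader(io.StringIO(text, newline="")))


row = ["1", 'A title with "quotes", a comma\nand a line break']
encoded = encode_row(row)
print(encoded)
# The second value is wrapped in double quotes and its inner quotes are doubled.
assert decode_row(encoded) == row
```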

...

When importing a CSV file, the importer overlays the data onto what is already in the repository to determine the differences. It acts only on the contents of the CSV file, not on the complete item metadata. This means that an exported CSV file can be manipulated quite substantially before being re-imported. Rows (items) or columns (metadata elements) can be removed, and the importer will ignore them. For example, if you only want to edit item abstracts, you can remove all of the other columns and leave just the abstract column. The ID column is mandatory, however, and must be left intact.
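Trimming an export down to the columns you intend to edit can be scripted. The sketch below keeps the mandatory 'id' column plus a chosen set of columns. The function name keep_columns is hypothetical, and the header 'dc.description.abstract' in the usage note is an assumption; check the header row of your own export for the exact column names.

```python
import csv
import io
from typing import Iterable


def keep_columns(text: str, wanted: Iterable[str]) -> str:
    """Return CSV text holding only the 'id' column and the wanted columns."""
    wanted = set(wanted)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    names = reader.fieldnames or []
    if "id" not in names:
        raise ValueError("the 'id' column is mandatory and was not found")
    # Keep 'id' first, then the wanted columns in their original order.
    fields = ["id"] + [n for n in names if n in wanted and n != "id"]
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For instance, keep_columns(exported_text, {"dc.description.abstract"}) would leave a two-column file containing only item IDs and abstracts, ready for editing.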

Editing Collection Membership

Items can be moved between collections by editing the collection handles in the 'collection' column. Multiple collections can be included. The first collection is the 'owning collection', the primary collection that the item appears in. Subsequent collections (separated by the field separator) are treated as mapped collections, which is equivalent to using the map item functionality in the DSpace user interface. To move items between collections, or to edit which other collections they are mapped to, change the data in the collection column.
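The layout of a 'collection' cell can be sketched as follows. The separator '||' used here is an assumption about the configured field separator, and the handle '1989.1/30' is made up for illustration; substitute whatever your installation uses.

```python
from typing import List, Sequence, Tuple


def collection_cell(owning: str, mapped: Sequence[str] = (),
                    separator: str = "||") -> str:
    """Build a 'collection' cell: owning collection first, then mapped ones."""
    return separator.join([owning, *mapped])


def split_collection_cell(cell: str, separator: str = "||") -> Tuple[str, List[str]]:
    """Split a 'collection' cell into (owning collection, mapped collections)."""
    owning, *mapped = cell.split(separator)
    return owning, mapped


# Move an item so that 1989.1/24 owns it and it is also mapped to 1989.1/30.
print(collection_cell("1989.1/24", ["1989.1/30"]))
```

Because the first handle determines the owning collection, reordering the handles in a cell moves the item, while adding or removing later handles only changes its mappings.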

Adding Metadata-Only Items

New metadata-only items can be added to DSpace using the batch metadata importer. To do this, enter a plus sign '+' in the first 'id' column. The importer will then treat this as a new item. If you are using the command line importer, you will need to use the -e flag to specify the user email address or id of the user that is registered as submitting the items.
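A file of new items can also be generated rather than typed by hand. The sketch below is illustrative only: the function name new_item_rows is hypothetical, the metadata field names are examples, and it assumes that each new row also names its owning collection in the 'collection' column.

```python
import csv
import io
from typing import Dict, List


def new_item_rows(collection: str, items: List[Dict[str, str]]) -> str:
    """Build CSV text in which every row has '+' in the 'id' column."""
    # Collect every metadata field used by any item, in a stable order.
    metadata_fields = sorted({name for item in items for name in item})
    fields = ["id", "collection"] + metadata_fields
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for item in items:
        # '+' tells the importer to treat the row as a new item.
        writer.writerow({"id": "+", "collection": collection, **item})
    return out.getvalue()


print(new_item_rows("1989.1/24", [
    {"dc.title": "First item"},
    {"dc.title": "Second item", "dc.contributor.author": "Doe, Jane"},
]))
```

Fields that an item does not supply are left empty, and values containing commas are quoted automatically by the csv module.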

Deleting Metadata

It is possible to delete the values of a given metadata field from many items at once by editing an exported file. For example, suppose you have used keywords (dc.subject) that need to be removed en masse. You would leave the column heading (dc.subject) intact but remove the data in the rows beneath it.
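Emptying a column across every row is easy to script. The following is a minimal sketch with a hypothetical function name, clear_column. It assumes a well-formed export in which the column heading matches exactly, so a heading that carries a language suffix would need to be named in full.

```python
import csv
import io


def clear_column(text: str, column: str) -> str:
    """Return CSV text with the column heading kept but every value emptied."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    names = reader.fieldnames or []
    if column not in names:
        raise KeyError(f"column {column!r} not found in the CSV header")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for row in reader:
        row[column] = ""  # keep the cell, drop its contents
        writer.writerow(row)
    return out.getvalue()
```

Calling clear_column(exported_text, "dc.subject") leaves the dc.subject heading in place with empty cells beneath it, which is the shape the import expects for a bulk delete.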

Performing 'actions' on items

It is possible to perform certain 'actions' on items. This is achieved by adding an 'action' column to the CSV file (after the 'id' and 'collection' columns). There are three possible actions:

...

If an action makes no change (for example, asking to withdraw an item that is already withdrawn) then, just like metadata that has not changed, this will be ignored.

Migrating or Exchanging Data

You may have data in one Dublin Core (DC) element that really belongs in another. For example, your staff may have entered Library of Congress Subject Headings in the Subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps and your data will be migrated upon import:

...