...

  • Log in as an Administrative user
  • Browse to the Community or Collection you wish to export to CSV
    • NOTE: in the JSPUI, it is also possible to export search results to CSV. Just perform a search, and click on the "Export Metadata" button above the search results.
  • Click the "Export Metadata" link to export to a downloadable CSV
    • In XMLUI, "Export Metadata" can be found in the "Context" menu on a Community/Collection homepage
    • In JSPUI, "Export Metadata" can be found in the "Admin Tools" menu on a Community/Collection homepage

...

Please see the documentation below for more information on the CSV format and the actions that can be performed by editing the CSV.

Import Function

Note: Importing large CSV files

It is not recommended to import CSV files of more than 1,000 lines (i.e. 1,000 items). When importing files larger than this, it may be difficult for an Administrator to accurately verify the changes that the import tool states it will make. In addition, depending on the memory available to the DSpace site, large files may cause 'Out Of Memory' errors part way through the import process.
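One way to stay under the recommended limit is to split a large export into smaller files before importing. The following is a minimal sketch, not part of DSpace: the function name `split_csv` and the default of 1,000 rows per chunk are assumptions chosen to match the guidance above. It counts CSV rows rather than raw text lines, so values with embedded line breaks are not cut in half.

```python
import csv
import io


def split_csv(text, max_rows=1000):
    """Split CSV text into chunks of at most max_rows items.

    Hypothetical helper: every chunk repeats the header row so that
    each chunk can be imported on its own.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), max_rows):
        out = io.StringIO(newline="")
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(data[start:start + max_rows])
        chunks.append(out.getvalue())
    return chunks
```

Each returned chunk can then be written to its own file (using UTF-8 encoding) and imported and verified separately.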

Web Interface Import

Batch metadata imports (from CSV) can be performed from the Administrative menu:

...

Silent Mode should be used carefully. It is possible (and probable) that you can overlay the wrong data and cause irreparable damage to the database.

...

To run the batch importer, at the command line:

...

The above example uses all of the arguments. It adds the metadata and applies the workflow, notification, and templates to the items that are being added.

...

CSV Format

The CSV (comma separated values) files that this tool can import and export abide by the RFC4180 CSV format. This means that new lines and embedded commas can be included by wrapping elements in double quotes. Double quotes can be included by using two double quotes. The code does all this for you, and any good CSV editor such as Excel or OpenOffice will comply with this convention.

All CSV files are also in UTF-8 encoding in order to support all languages.
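If you generate or post-process a CSV with a script rather than a spreadsheet, a CSV library applies these quoting rules for you. The sketch below uses Python's standard `csv` module for illustration only; the helper name `to_csv_line` and the sample values are assumptions, not part of DSpace.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row using RFC4180-style quoting (hypothetical helper)."""
    out = io.StringIO(newline="")
    csv.writer(out).writerow(fields)
    return out.getvalue()


# A value containing double quotes, a comma, and a line break.
row = ["1", 'A "quoted" title, with a comma', "First line\nSecond line"]
line = to_csv_line(row)

# Reading the line back yields the original values unchanged.
parsed = next(csv.reader(io.StringIO(line, newline="")))

# When writing to disk, open the file as UTF-8, for example:
#   open("edited.csv", "w", encoding="utf-8", newline="")
```

Fields are only quoted when they need to be, and embedded double quotes are doubled, which matches the convention described above.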

File Structure

The first row of the CSV must define the metadata values that the rest of the CSV represents. The first column must always be "id", which refers to the item's internal database ID. All other columns are optional. The other columns name the Dublin Core metadata fields in which the data is to reside.

...

If you want to store multiple values for a given metadata element, they can be separated with the double-pipe '||' (or another character that you defined in your modules/bulkedit.cfg file). For example:

Code Block
Horses||Dogs||Cats

Elements are stored in the database in the order that they appear in the CSV file. You can use this to order elements where order may matter, such as authors, or controlled vocabulary such as Library of Congress Subject Headings.
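When editing multi-valued cells with a script, it can help to split each cell into a list, change the list, and join it again. This is a minimal sketch assuming the default '||' separator has not been changed in modules/bulkedit.cfg; the names `SEPARATOR`, `split_values`, and `join_values` are illustrative only.

```python
# Assumed default separator; change it if modules/bulkedit.cfg defines another.
SEPARATOR = "||"


def split_values(cell):
    """Return the individual values of a cell, in their stored order."""
    return [value for value in cell.split(SEPARATOR) if value]


def join_values(values):
    """Combine values back into a single cell, preserving list order."""
    return SEPARATOR.join(values)


# Example: reorder the values so that "Cats" comes first.
values = split_values("Horses||Dogs||Cats")
reordered = join_values(["Cats"] + [v for v in values if v != "Cats"])
```

Because values are stored in the order they appear, the order of the list passed to `join_values` is the order the values will have after import.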

When importing a CSV file, the importer will overlay the data onto what is already in the repository to determine the differences. It only acts on the contents of the CSV file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows (items) or columns (metadata elements) can be removed and will be ignored. For example, if you only want to edit item abstracts, you can remove all of the other columns and just leave the abstract column. (You do need to leave the ID column intact. This is mandatory.)

Editing the CSV

Note: If you are editing with Microsoft Excel, be sure to open the CSV in Unicode/UTF-8 encoding

By default, Microsoft Excel may not correctly open the CSV in Unicode/UTF-8 encoding. This means that special characters may be displayed improperly and can also be "corrupted" during re-import of the CSV.

You need to tell Excel that this CSV is Unicode by importing it as follows. (Please note these instructions are valid for MS Office 2007 and 2013. Other Office versions may vary.)

  • First, open Excel (with an empty sheet/workbook open)
  • Select "Data" tab
  • Click "From Text" button (in the "External Data" section)
  • Select your CSV file
  • Wizard Step 1
    • Choose "Delimited" option
    • Start import at row: 1
    • In the "File origin" selectbox, select "65001 : Unicode (UTF-8)"
      • NOTE: these encoding options are sorted alphabetically, so "Unicode (UTF-8)" appears near the bottom of the list.
    • Click Next
  • Wizard Step 2
    • Select "Comma" as the only delimiter
    • Click Next
  • Wizard Step 3
    • Select "Text" as the "Column data format"
      • At a minimum, you MUST ensure all date columns (e.g. dc.date.issued) are treated as "Text" so that Excel doesn't autoconvert DSpace's YYYY-MM-DD format into MM/DD/YYYY
      • To avoid such autoconversion, it is safest to ensure each column is treated as "Text".  Unfortunately, this means selecting each column one-by-one and choosing "Text" as the "Column data format".
    • Click Finish
  • Choose whether to open CSV in the existing sheet or a new one


Info: Tips to Simplify the Editing Process

When editing a CSV, here are a few basic tips to keep in mind:

  1. The "id" column MUST remain intact. This column also must always have a value in it.
  2. To simplify the CSV, you can remove any columns you do NOT wish to edit (except for the "id" column, see #1). Don't worry, removing an entire column won't delete metadata (see #3).
  3. When importing a CSV file, the importer will overlay the metadata onto what is already in the repository to determine the differences. It only acts on the contents of the CSV file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows (items) or Columns (metadata elements) can be removed and will be ignored. 
    1. For example, if you only want to edit "dc.subject", you can remove ALL columns EXCEPT for "id" and "dc.subject" so that you can just manipulate the "dc.subject" field. On import, DSpace will see that you've only included the "dc.subject" field in your CSV and therefore will only update the "dc.subject" metadata field for any items listed in that CSV.
  4. Because removing an entire column does NOT delete metadata value(s), if you actually wish to delete a metadata value you should leave the column intact, and simply clear out the appropriate row's value (in that column).
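The column-trimming tip can also be applied with a short script instead of a spreadsheet. The following is a sketch under stated assumptions: `keep_columns` is a hypothetical helper, and the column names in the usage example are illustrative. It always keeps the "id" column and drops every column that is not explicitly requested.

```python
import csv
import io


def keep_columns(text, wanted):
    """Return CSV text containing only "id" plus the wanted columns.

    Hypothetical helper: cell values are copied unchanged, so an empty
    cell stays empty (which, per the tips above, would delete that value).
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    if "id" not in (reader.fieldnames or []):
        raise ValueError('the "id" column is mandatory')
    columns = ["id"] + [c for c in reader.fieldnames if c in wanted and c != "id"]
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


exported = "id,dc.title,dc.subject\r\n1,A Title,Horses||Dogs\r\n"
trimmed = keep_columns(exported, {"dc.subject"})
```

The trimmed file contains only the "id" and "dc.subject" columns, so on import only "dc.subject" would be compared against the repository.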

 

Editing Collection Membership

...

If an action makes no change (for example, asking to withdraw an item that is already withdrawn) then, just like metadata that has not changed, this will be ignored.

Migrating Data or Exchanging Data

...

It is possible that you have data in one Dublin Core (DC) element that you really want in another. An example would be that your staff have input Library of Congress Subject Headings in the Subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps and your data is migrated upon import:

  1. Insert a new column. The first row should be the new metadata element. (We will refer to it as the TARGET)
  2. Select the column/rows of the data you wish to change. (We will refer to it as the SOURCE)
  3. Cut and paste this data into the new column (TARGET) you created in Step 1.
  4. Leave the column (SOURCE) you just cut and pasted from empty. Do not delete it.
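The four steps above can be sketched as a script for cases where every row's value should move. This is an illustration only, not a DSpace tool: `move_column` is a hypothetical helper, and it assumes the TARGET column does not yet exist in the CSV (matching Step 1, which inserts a new column).

```python
import csv
import io


def move_column(text, source, target):
    """Move every value from the SOURCE column into a new TARGET column.

    The SOURCE column is kept but emptied, as Step 4 requires.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    fieldnames = list(reader.fieldnames or [])
    if source not in fieldnames:
        raise ValueError("SOURCE column not found: " + source)
    if target in fieldnames:
        raise ValueError("TARGET column already exists: " + target)
    fieldnames.append(target)  # Step 1: insert the new column
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[target] = row[source]  # Steps 2-3: cut and paste the data
        row[source] = ""           # Step 4: leave SOURCE empty, do not delete it
        writer.writerow(row)
    return out.getvalue()


exported = "id,dc.subject\r\n1,Horses||Dogs\r\n"
migrated = move_column(exported, "dc.subject", "dc.subject.lcsh")
```

In the result, "dc.subject" is present but empty and "dc.subject.lcsh" holds the original values, which is the layout the steps describe.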

Common Issues

Metadata values in CSV export seem to have duplicate columns
DSpace responds with "No changes were detected" when CSV is uploaded

Unfortunately, this response can have several causes:

  • It's possible the CSV was not saved properly after editing. Check that the edits are in the CSV, and that there were no backend errors in the DSpace logs (which would be an indication of an invalid or corrupted CSV file)
  • Depending on the version of DSpace, you may be encountering this known bug with processing linebreaks in CSV files: DS-3245
  • If you are setting a new embargo date in the CSV, ensure that the embargo lift date is a future date.  It's been reported that past dates may cause DSpace to ignore item changes.