Note: The home for this project is https://github.com/DSpace-Labs/SAFBuilder


Note: SAF Packager is becoming outdated, other projects exist

Since the core developer left, the SAF Packager / SAFBuilder has not been maintained. However, a number of newer, similar programs exist from other community members:

...

...


The input for a command-line batch ingest of materials to DSpace is well documented, and is called "Simple Archive Format". However, there needs to be a tool that easily facilitates creating a Simple Archive Format package. The use case satisfied by the Simple Archive Format Packager is that someone has a spreadsheet filled with metadata, as well as content files, that are eventually destined for repository ingest.

Thus the input to the Simple Archive Format Packager is a CSV spreadsheet (.csv) that has the following columns:

  • filename of the content file(s)
  • namespace.element.qualifier metadata for the item. Examples would be: dc.description or dc.contributor.author
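As a rough illustration of the column layout described above, the following Python sketch reads a small CSV and separates the filename column from the metadata columns. It is not SAFBuilder's own code (SAFBuilder is written in Java); the helper names `parse_column` and `read_rows` and the sample rows are hypothetical and made up for this example.

```python
import csv
import io


def parse_column(name):
    """Split a header such as 'dc.contributor.author' into (schema, element, qualifier)."""
    parts = name.strip().split(".")
    if len(parts) == 2:
        return parts[0], parts[1], None
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"not a namespace.element[.qualifier] column: {name!r}")


# Hypothetical spreadsheet content, invented for illustration only.
SAMPLE = """filename,dc.title,dc.contributor.author,dc.description
report.pdf,Annual Report,"Smith, Jane",A yearly summary
"""


def read_rows(text):
    """Yield (filename, metadata) pairs; metadata maps parsed column names to cell text."""
    for row in csv.DictReader(io.StringIO(text)):
        filename = row.pop("filename")
        metadata = {parse_column(col): value for col, value in row.items()}
        yield filename, metadata


rows = list(read_rows(SAMPLE))
```

Each row then yields one content filename plus a dictionary keyed by (schema, element, qualifier), which is the information needed per item in a Simple Archive Format package.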

...


To get started with the code in the sandbox repo

  1. Check out the directory from the svn repository.
  2. In an IDE (tested in NetBeans), create a Java Application with Existing Sources.
  3. Add the source directory 'src'.
  4. Download and add the third-party libraries (.jars) as mentioned in the README.

You will then need to edit BatchProcess.java so that inputDir and metaFile match the path to sample_data, or to whatever collection data you are processing.

In the source tree there is a sample_data directory to help kick start testing and development of this tool.

The expected output of this tool is a package that satisfies the "Simple Archive Format" specification.

...

Further, dates need to be in ISO-8601 format in order to be properly recognized. For any column that has multiple values, separate each entry with a double pipe "||". For example, for multiple files set "filename" to "file1.pdf||file2.pdf||file3.pdf". Similarly, multiple "dc.subject" values can be separated by "||".
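The double-pipe convention and the date requirement can be sketched in Python as follows. This is an illustration only, not SAFBuilder's implementation: the function names are hypothetical, trimming whitespace around each value is an assumption, and the regular expression only checks the general shape of common ISO-8601 forms (year, year-month, full date, optional time) without validating ranges.

```python
import re

SEPARATOR = "||"

# Shape check for common ISO-8601 forms: 2012, 2012-03, 2012-03-15, 2012-03-15T10:30:00Z.
# It does not verify that the month, day, or time values are in range.
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?Z?)?)?)?$")


def split_values(cell):
    """Split one spreadsheet cell into its individual values, dropping empty entries."""
    return [part.strip() for part in cell.split(SEPARATOR) if part.strip()]


def looks_like_iso_date(value):
    """Return True when the value has the shape of an ISO-8601 date."""
    return ISO_DATE.match(value) is not None


files = split_values("file1.pdf||file2.pdf||file3.pdf")
```

A check like `looks_like_iso_date` could be run over date columns before packaging, so that values such as "03/15/2012" are caught before import.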

While preparing the batch load, keep the metadata spreadsheet and the content files together in one directory.


Obtaining, Compiling, and Running SAFBuilder

The SAFBuilder project resides on GitHub. Please refer to the project instructions for how to install and run it. It requires Java JDK 7+ and runs from the terminal / command prompt.


The ./safbuilder.sh command with no arguments will show the help screen.

Panel
Recompiling SAFBuilder, just a moment...
usage: SAFBuilder
 -c,--csv <arg>   Filename with path of the CSV spreadsheet. This must be in the same directory as the content files
 -h,--help        Display the Help
 -m,--manifest    Initialize a spreadsheet, a manifest listing all of the files in the directory, you must specify a CSV for -c
 -z,--zip         (optional) ZIP the output
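To give an idea of what a manifest spreadsheet listing the files in a directory might look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the output of the -m option: the function name `write_manifest`, the default file name `manifest.csv`, and the single "filename" column are all hypothetical, and the columns SAFBuilder actually writes may differ.

```python
import csv
import tempfile
from pathlib import Path


def write_manifest(directory, csv_name="manifest.csv"):
    """Write a CSV with one 'filename' row per file in the directory (hypothetical layout)."""
    directory = Path(directory)
    # List regular files only, skipping the manifest itself, in a stable order.
    names = sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.name != csv_name
    )
    with (directory / csv_name).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename"])
        for name in names:
            writer.writerow([name])
    return names


# Small demonstration in a throwaway directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.pdf").write_bytes(b"")
    (Path(tmp) / "a.pdf").write_bytes(b"")
    listed = write_manifest(tmp)
    manifest_text = (Path(tmp) / "manifest.csv").read_text(encoding="utf-8")
```

Metadata columns such as dc.title would then be added to this spreadsheet by hand before packaging.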

There is sample data included with the tool to give an idea of how to use this.

To run the tool over the sample data:

Code Block
./safbuilder.sh -c /home/dspace/SAFBuilder/src/sample_data/AAA_batch-metadata.csv

This creates a SimpleArchiveFormat directory inside the directory that holds the CSV, with the subdirectories, content files, and metadata files ready to import into DSpace.
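For readers unfamiliar with the output layout, the sketch below writes one item directory in the shape described by the DSpace Simple Archive Format documentation: a `contents` file listing one bitstream filename per line and a `dublin_core.xml` file with one `dcvalue` element per metadata value. It is a simplified illustration, not SAFBuilder's code; the function name `write_item`, the `item_001` naming, and the sample values are assumptions, and copying the content files into the item directory is omitted.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(archive_dir, index, filenames, metadata):
    """Create one item directory; metadata is a list of (element, qualifier or None, value)."""
    item_dir = Path(archive_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True)

    # 'contents' names the bitstreams of the item, one filename per line.
    (item_dir / "contents").write_text("\n".join(filenames) + "\n", encoding="utf-8")

    # 'dublin_core.xml' holds the dc metadata; a missing qualifier is written as "none".
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )
    return item_dir


# Demonstration in a throwaway directory, using invented sample values.
with tempfile.TemporaryDirectory() as tmp:
    item = write_item(
        Path(tmp) / "SimpleArchiveFormat",
        1,
        ["file1.pdf", "file2.pdf"],
        [("title", None, "Annual Report"), ("contributor", "author", "Smith, Jane")],
    )
    item_name = item.name
    contents_text = (item / "contents").read_text(encoding="utf-8")
    parsed = ET.parse(str(item / "dublin_core.xml")).getroot()
    values = [(n.get("element"), n.get("qualifier"), n.text) for n in parsed]
```

One such directory per spreadsheet row, all inside SimpleArchiveFormat, is the structure that the DSpace import command expects as its source.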

If you created a ZIP file of the output, it can be imported into DSpace using the Batch Import UI. An example of a DSpace command-line import is:

Code Block
sudo /dspace/bin/dspace import -a \
    -e peter@longsight.com \
    -c 1811/49710 \
    -s /home/dspace/SAFBuilder/src/sample_data/SimpleArchiveFormat/ \
    -m /home/dspace/SAFBuilder/src/sample_data/batch1.map

Further Work

This packager works as a stand-alone tool and requires some knowledge of Java to run. It satisfies the initial need to package many items for batch loading into DSpace using the DSpace launcher's item import. The remaining goal of this project is to streamline the process of batch loading materials into DSpace.

Possibilities include:

  • Refactoring so that it can become a Packager Plugin. Packager plugins allow you to implement a way for DSpace to accept an input package (containing content files, manifest, and metadata) that then creates DSpace items.
  • Creating a client GUI for the desktop.
  • Providing a dedicated web service.

...