The repository home for this project is: https://github.com/peterdietz/SAFBuilder
The input for a command-line batch ingest of materials to DSpace is well documented, and is called "Simple Archive Format". However, there needs to be a tool that easily facilitates creating a Simple Archive Format package. The use case satisfied by the Simple Archive Format Packager is that someone has a spreadsheet filled with metadata, as well as content files, that are eventually destined for repository ingest.
Thus the input to the Simple Archive Format Packager is a spreadsheet (.csv) that has the following columns:
- filename for the bitstream/file
...
- metadata with namespace.element.(qualifier). Examples would be: dc.description or dc.contributor.author
...
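As a rough illustration of the column naming convention above, a metadata header such as dc.contributor.author can be split into its namespace, element, and optional qualifier. This is only a sketch: the class name MetadataColumn and its fields are hypothetical and are not taken from the SAFBuilder source.

```java
/**
 * Hypothetical helper (not part of SAFBuilder): splits a CSV column header
 * of the form namespace.element or namespace.element.qualifier.
 */
final class MetadataColumn {
    final String namespace;
    final String element;
    final String qualifier; // null when the header has no qualifier

    private MetadataColumn(String namespace, String element, String qualifier) {
        this.namespace = namespace;
        this.element = element;
        this.qualifier = qualifier;
    }

    /** Parses a header such as "dc.description" or "dc.contributor.author". */
    static MetadataColumn parse(String header) {
        // The -1 limit keeps trailing empty parts so "dc." is rejected below.
        String[] parts = header.trim().split("\\.", -1);
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Not a metadata column: " + header);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty part in column: " + header);
            }
        }
        return new MetadataColumn(parts[0], parts[1], parts.length == 3 ? parts[2] : null);
    }
}
```

With this sketch, MetadataColumn.parse("dc.description") yields a null qualifier, while a header like "filename" is rejected and would have to be handled separately as the bitstream column.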
h2. To get started with the code in the sandbox repo
# Check out the directory from the svn repository.
# In an IDE (tested in NetBeans), create a Java Application with Existing Sources.
# Add the source directory 'src'.
# Download and add the third-party libraries (.jars) as mentioned in the README.
You will then need to edit BatchProcess.java so that inputDir and metaFile match the path to the sample_data, or to whatever collection data you want to process.
In the source tree there is a sample_data directory to help kick start testing and development of this tool.
The expected output of this tool is a directory structure that satisfies the specification laid out by "Simple Archive Format".
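To make that target layout concrete, here is a minimal sketch of what one item in a Simple Archive Format directory looks like: a per-item folder holding a dublin_core.xml file and a contents file that lists the bitstream file names. The class and method names (SafItemSketch, writeItem) and the item_NNN folder naming are assumptions for illustration, not SAFBuilder's actual code, and copying the content files themselves is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch (not SAFBuilder code): writes one SAF item directory. */
final class SafItemSketch {

    /** Builds a dublin_core.xml document containing only a title. */
    static String dublinCoreXml(String title) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<dublin_core>\n"
                + "  <dcvalue element=\"title\" qualifier=\"none\">" + escape(title) + "</dcvalue>\n"
                + "</dublin_core>\n";
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Creates archiveDir/item_NNN with dublin_core.xml and a contents file. */
    static Path writeItem(Path archiveDir, int index, String title, List<String> fileNames) {
        try {
            Path itemDir = archiveDir.resolve(String.format("item_%03d", index));
            Files.createDirectories(itemDir);
            Files.write(itemDir.resolve("dublin_core.xml"),
                    dublinCoreXml(title).getBytes(StandardCharsets.UTF_8));
            // The contents file names one bitstream per line.
            Files.write(itemDir.resolve("contents"), fileNames, StandardCharsets.UTF_8);
            return itemDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real package would also copy each listed content file into the item folder and emit one dcvalue element per metadata column in the spreadsheet.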
...
Java Compiling and Running Instructions
The commands below will check out the code from Git, download the external Java libraries used to build the tool, compile the source code, and execute it.
Code Block |
---|
git clone git://github.com/peterdietz/SAFBuilder.git
cd SAFBuilder
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/net/sourceforge/javacsv/javacsv/2.0/javacsv-2.0.jar
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/xmlwriter/xmlwriter/2.2/xmlwriter-2.2.jar
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/commons-io/commons-io/1.4/commons-io-1.4.jar
mkdir classes
javac -classpath javacsv-2.0.jar:commons-io-1.4.jar:xmlwriter-2.2.jar src/edu/osu/kb/batch/*.java -d classes
java -cp classes edu.osu.kb.batch.BatchProcess
|
The final command, run without arguments, prints the arguments used to invoke the program.
Panel |
---|
USAGE: BatchProcess /path/to/directory metadatafilename.csv |
There is sample data included with the tool to give an idea of how to use this.
To run the tool over the sample data:
Code Block |
---|
java -cp classes:javacsv-2.0.jar:commons-io-1.4.jar:xmlwriter-2.2.jar edu.osu.kb.batch.BatchProcess /home/peter/NetBeansProjects/SAFBuilder/src/edu/osu/kb/sample_data AAA_batch-metadata.csv
|
This creates the SimpleArchiveFormat directory inside the directory specified, along with the subdirectories, content files, and metadata files, all ready to import into DSpace.
Further Work
This packager works as a stand-alone tool and requires knowledge of Java to run. It satisfies the initial need: packaging many items so that they can be batch loaded into DSpace using DSpace's item-import launcher command. The remaining goal of this project is to streamline the process of batch loading materials into DSpace.
Possibilities include:
- refactoring so that it can become a Packager Plugin. Packager plugins allow you to implement a way for DSpace to accept an input package (containing content files, manifest, and metadata) that then creates DSpace items.
- creating a client GUI for the desktop.