Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already being in storage accessible to DSpace. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.
To register an item its bitstreams must reside on storage accessible to DSpace and therefore referenced by an asset store number in dspace.cfg. The configuration file dspace.cfg establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in the DSpace host's file system or a set of SRB account parameters. This asset store number is described in The dspace.cfg Configuration Properties File section and in the dspace.cfg file itself. The asset store number(s) used for registered items should generally not be the value of the assetstore.incoming property since it is unlikely that you will want to mix the bitstreams of normally ingested and imported items and registered items.
DSpace uses the same import tool that is used for batch import except that several variations are employed to support registration. The discussion that follows assumes familiarity with the import tool.
The DSpace Simple Archive Format for registration does not include the actual content files (bitstreams) being registered. The format is however a directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata (dublin_core.xml) and a file listing the item's content files (contents), but not the actual content files themselves.
The dublin_core.xml file for item registration is exactly the same as for regular item import.
The contents file, like that for regular item import, lists the item's content files, one content file per line, but each line has the one of the following formats:
-r -s n -f filepath -r -s n -f filepath\tbundle:bundlename -r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name' -r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text
-rindicates this is a file to be registered
-s nindicates the asset store number (n)
-f filepathindicates the path and name of the content file to be registered (filepath)
\tis a tab character
bundle:bundlenameis an optional bundle name
permissions: -[r|w] 'group name'is an optional read or write permission that can be attached to the bitstream
description: some textis an optional description field to add to the file
The command line for registration is just like the one for regular import:
[dspace]/bin/dspace import -a -e firstname.lastname@example.org -c collectionID -s items_dir -m mapfile
(or by using the long form)
[dspace]/bin/dspace import --add --email@example.com --collection=collectionID --source=items_dir --map=mapfile
--test flags will function as described in Importing Items.
--delete flag will function as described in Importing Items but the registered content files will not be removed from storage. See Deleting Registered Items.
--replace flag will function as described in Importing Items but care should be taken to consider different cases and implications. With old items and new items being registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using
--replace will not be removed from the storage. See Deleting Registered Items. where is resides. A new item added to DSpace using
--replace will be ingested normally or will be registered depending on whether or not it is marked in the contents files with the
Once an item has been registered, superficially it is indistinguishable from items ingested interactively or by batch import. But internally there are some differences:
First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. Instead, the file path and name are that specified in the contents file.
Second, the store_number column of the bitstream database row contains the asset store number specified in the contents file.
Third, the internal_id column of the bitstream database row contains a leading flag (
-R) followed by the registered file path and name. For example,
-Rfilepath where filepath is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be traditional storage in the DSpace server's file system or an SRB account.
Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage. If the registered file is in remote storage (say, SRB) a checksum is calculated on just the file name! This is an efficiency choice since registering a large number of large files that are in SRB would consume substantial network resources and time. A future option could be to have an SRB proxy process calculate MD5s and store them in SRB's metadata catalog (MCAT) for rapid retrieval. SRB offers such an option but it's not yet in production release.
Registered items and their bitstreams can be retrieved transparently just like normally ingested items.
Registered items may be exported as described in Exporting Items. If so, the export directory will contain actual copies of the files being exported but the lines in the contents file will flag the files as registered. This means that if DSpace items are "round tripped" (see Transferring Items Between DSpace Instances) using the exporter and importer, the registered files in the export directory will again registered in DSpace instead of being uploaded and ingested normally.
If a registered item is deleted from DSpace, (either interactively or by using the
--replace flags described in Importing and Exporting Items via Simple Archive Format) the item will disappear from DSpace but its registered content files will remain in place just as they were prior to registration. Bitstreams not registered but added by DSpace as part of registration, such as
license.txt files, will be deleted.