...

Makeup and Definition of AIPs

AIPs are Archival Information Packages

...

  • An AIP is a package describing one archival object in DSpace.
    • The archival object may be a single Item, Collection, Community, or Site (Site AIPs contain site-wide information). Bitstreams are included in an Item's AIP.
    • Each AIP is logically self-contained and can be restored without the rest of the archive. (So you could restore a single Item, Collection or Community.)
    • Collection or Community AIPs do not include all child objects (e.g. Items in those Collections or Communities), as each AIP only describes one object. However, these container AIPs do contain references (links) to all child objects. These references can be used by DSpace to automatically restore all referenced AIPs when restoring a Collection or Community.
    • AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This means that in-progress, uncompleted submissions are not described in AIPs and cannot be restored after a disaster. Permanently removed objects will also no longer be exported as AIPs after their removal. However, withdrawn objects will continue to be exported as AIPs, since they are still considered under the "in archive" status.
    • AIPs with identical contents will always have identical checksums. This provides a basic means of validating whether the contents within an AIP have changed. For example, if a Collection's AIP has the same checksum at two different points in time, it means that Collection has not changed during that time period.
    • The AIP profile favors completeness and accuracy over presenting the semantics of an object in a standard format. It conforms to the quirks of DSpace's internal object model rather than attempting to produce a universally understandable representation of the object. When possible, an AIP tries to use common standards to express objects.
    • An AIP can serve as a DIP (Dissemination Information Package) or SIP (Submission Information Package), especially when transferring custody of objects to another DSpace implementation.
    • In contrast to SIP or DIP, the AIP should include all available DSpace structural and administrative metadata, and basic provenance information. AIPs also describe some basic system level information (e.g. Groups and People).
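The checksum property described above can be used to tell whether an object changed between two exports. The following is a minimal sketch, assuming two AIP files exported at different times are available locally; the helper name `same_aip_checksum` and the file names are hypothetical, and `cksum` is used only because it is available on any POSIX system (a stronger digest tool could be substituted).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if two AIP files have the same checksum.
# Reading from stdin keeps the file name out of the cksum output,
# so only the CRC and byte count are compared.
same_aip_checksum() {
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}

# Example usage (file names are placeholders):
# if same_aip_checksum collection-january.zip collection-february.zip; then
#     echo "Collection AIP unchanged"
# else
#     echo "Collection AIP changed"
# fi
```

A matching checksum only says that this one AIP is unchanged. Because a Collection AIP does not contain its Items, child Items would need to be compared separately.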

...

For more specific details of AIP format / structure, along with examples, please see DSpace AIP Format.

Running the Code

Exporting AIPs

...

To export in single AIP mode (default), use this 'packager' command template:

Code Block
 [dspace]/bin/dspace packager -d -t AIP -e <eperson> -i <handle> <file-path>
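As an illustration, the template above could be filled in as follows. This is a sketch only: the install directory `/dspace`, the Handle `123456789/42` and the output file name are placeholder values, and the helper `aip_export_cmd` is hypothetical. It prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the single-AIP export command for one object.
# Usage: aip_export_cmd <dspace-dir> <eperson> <handle> <file-path>
aip_export_cmd() {
    printf '%s/bin/dspace packager -d -t AIP -e %s -i %s %s\n' "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute your own install path, EPerson, Handle and file.
aip_export_cmd /dspace admin@myu.edu 123456789/42 item-42-aip.zip
```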

...

It's worth understanding the primary differences between a Submission (specified by -s parameter) and a Restore (specified by -r parameter).

  • Submission Mode (-s mode) - creates a new object (AIP is treated like a SIP)
    • By default, a new Handle is always assigned
      • However, you can force it to use the handle specified in the AIP by specifying -o ignoreHandle=false as one of your parameters
    • By default, a new Parent object must be specified (using the -p parameter). This is the location where the new object will be created.
      • However, you can force it to use the parent object specified in the AIP by specifying -o ignoreParent=false as one of your parameters
    • By default, will respect a Collection's Workflow process when you submit an Item to a Collection
      • However, you can specifically skip any workflow approval processes by specifying -w parameter.
    • Always adds a new Deposit License to Items
    • Always adds new DSpace System metadata to Items (includes new 'dc.date.accessioned', 'dc.date.available', 'dc.date.issued' and 'dc.description.provenance' entries)
    • WARNING: Submission mode may not be able to maintain Item Mappings between Collections.  Because these mappings are recorded via the Collection Handles, mappings may be restored improperly if the Collection handle has changed when moving content from one DSpace instance to another.
  • Restore / Replace Mode (-r mode) - restores a previously existing object (as if from a backup)
    • By default, the Handle specified in the AIP is restored
      • However, for restores, you can force a new handle to be generated by specifying -o ignoreHandle=true as one of your parameters. (NOTE: Doesn't work for replace mode as the new object always retains the handle of the replaced object)
      • (info) Although a Restore/Replace does restore Handles, it will not necessarily restore the same internal IDs in your Database.
    • By default, the object is restored under the Parent specified in the AIP
      • However, for restores, you can force it to restore under a different parent object by using the -p parameter. (NOTE: Doesn't work for replace mode, as the new object always retains the parent of the replaced object)
    • Always skips any Collection workflow approval processes when restoring/replacing an Item in a Collection
    • Never adds a new Deposit License to Items (rather it restores the previous deposit license, as long as it is stored in the AIP)
    • Never adds new DSpace System metadata to Items (rather it just restores the metadata as specified in the AIP)
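The difference between the two modes can be sketched as command lines. The helper below is hypothetical and only prints the command it would run. It assumes the flags described above (-s, -r, -p, -o ignoreParent=false); the install path, EPerson, Handle and file name in the usage lines are placeholders.

```shell
#!/bin/sh
# Print (do not run) a packager ingest command for either mode.
# Usage: aip_ingest_cmd <submit|restore> <dspace-dir> <eperson> <file-path> [parent-handle]
aip_ingest_cmd() {
    mode=$1 dspace_dir=$2 eperson=$3 file=$4 parent=${5:-}
    case $mode in
        submit)  flags="-s" ;;
        restore) flags="-r" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    if [ -n "$parent" ]; then
        # Explicit parent: needed for a default submission, optional for a restore.
        flags="$flags -p $parent"
    elif [ "$mode" = submit ]; then
        # No parent given: ask submission mode to use the parent recorded in the AIP.
        flags="$flags -o ignoreParent=false"
    fi
    printf '%s/bin/dspace packager %s -t AIP -e %s %s\n' \
        "$dspace_dir" "$flags" "$eperson" "$file"
}

# Submit as a new object under a (placeholder) parent Handle:
aip_ingest_cmd submit /dspace admin@myu.edu aip4567.zip 123456789/10
# Restore to the Handle and parent recorded in the AIP:
aip_ingest_cmd restore /dspace admin@myu.edu aip4567.zip
```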

...

Warning
May want to skip Collection Approval Workflows

Please note: If you are submitting a larger amount of content (e.g. multiple Communities/Collections) to your DSpace, you may want to tell the 'packager' command to skip over any existing Collection approval workflows by using the -w flag. By default, all Collection approval workflows will be respected. This means if the content you are submitting includes a Collection with an enabled workflow, you may see the following occur:

  1. First, the Collection will be created & its workflow enabled
  2. Second, each Item belonging to that Collection will be created & placed into the workflow approval process

    Therefore, if this content has already received some level of approval, you may want to submit it using the -w flag, which will skip any workflow approval processes. For more information, see Submitting AIP(s) while skipping any Collection Approval Workflows.
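A bulk submission that skips workflows might look like the sketch below. The helper `aip_bulk_submit_cmd` is hypothetical and only prints the command. It assumes -a requests a recursive ingest of the whole hierarchy; the parent Handle and file name are placeholders.

```shell
#!/bin/sh
# Print (do not run) a recursive submission command, optionally skipping workflows.
# Usage: aip_bulk_submit_cmd <dspace-dir> <eperson> <parent-handle> <file-path> [yes|no]
aip_bulk_submit_cmd() {
    workflow_flag=""
    if [ "${5:-no}" = yes ]; then
        # -w skips any Collection approval workflow for the submitted Items.
        workflow_flag=" -w"
    fi
    printf '%s/bin/dspace packager -s -a%s -t AIP -e %s -p %s %s\n' \
        "$1" "$workflow_flag" "$2" "$3" "$4"
}

# Content that was already approved elsewhere: skip workflows.
aip_bulk_submit_cmd /dspace admin@myu.edu 123456789/1 community-aip.zip yes
# Default behaviour: Items enter each Collection's workflow.
aip_bulk_submit_cmd /dspace admin@myu.edu 123456789/1 community-aip.zip
```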
Warning
Item Mappings may not be maintained when submitting an AIP hierarchy

When an Item is mapped to one or more Collections, this mapping is recorded in the AIP using the mapped Collection's handle. Unfortunately, since the submission mode (-s) assigns new handles to all objects in the hierarchy, this may mean that the mapped Collection's handle will have changed (or even that a different Collection will be available at the original mapped Collection's handle). DSpace does not have a way to uniquely identify Collections other than by handle, which means that item mappings are only able to be retained when the Collection handle is also retained.

If you encounter this issue, there are a few possible workarounds:

  1. Use the restore/replace mode (-r) instead, as it will retain existing Collection Handles. Unfortunately though, this may not work if the content is being moved from a Test DSpace to a Production DSpace, as these existing handles may not be valid.
  2. OR, use the submission mode with the "-o ignoreHandle=false" option. This will also retain existing Collection Handles. Unfortunately though, this may not work if the content is being moved from a Test DSpace to a Production DSpace, as these existing handles may not be valid.
  3. OR, remove all existing Item Mappings and re-export AIPs (without Item Mappings). Then, import the hierarchy into the new DSpace instance (again without Item Mappings). Finally, recreate the necessary Item Mappings using a different tool, e.g. the Batch Metadata Editing tool supports bulk editing of Collection memberships/mappings.
Warning
Missing Groups or EPeople cannot be created when submitting an individual Community or Collection AIP

Please note, if you are using AIPs to move an entire Community or Collection from one DSpace to another, there is a known issue (see DS-1105) that the new DSpace instance will be unable to (re-)create any DSpace Groups or EPeople which are referenced by a Community or Collection AIP. The reason is that the Community or Collection AIP itself doesn't contain enough information to create those Groups or EPeople (rather that info is stored in the SITE AIP, for usage during Full Site Restores).

However, there are two possible ways to get around this known issue:

  • EITHER, you can manually recreate all referenced Groups/EPeople in the new DSpace that you are submitting the Community or Collection AIP into.
    • Note that if you are using Groups named with DSpace Database IDs (e.g. COMMUNITY_1_ADMIN, COLLECTION_2_SUBMIT), you may first need to rename those groups to no longer include Database IDs (e.g. MY_SUBMITTERS). The reason is that Database IDs will likely change when you move a Community or Collection to a new DSpace installation.
  • OR, you can temporarily disable the import of Group/EPeople information when submitting the Community or Collection AIP to the new DSpace. This would mean that after you submit the AIP to the new DSpace, you'd have to manually go in and add in any special permissions (as needed). To disable the import of Group/EPeople information, add these settings to your dspace.cfg file, and re-run the submission of the AIP with these settings in place:

    Code Block
    mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
    mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
    • Don't forget to remove these settings after you import your Community or Collection AIP. Leaving them in place will mean that every time you import an AIP, all of its Group/EPeople/Permissions would be ignored.

...

Note
Highly Recommended to Update Database Sequences after a Large Restore

In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on your DSpace database after a larger scale restore. This database script should be run while DSpace is stopped (you may either stop Tomcat or just the DSpace webapps). PostgreSQL/Oracle must be running. The script can be found in the following locations for PostgreSQL and Oracle, respectively:
 [dspace]/etc/postgres/update-sequences.sql
 [dspace]/etc/oracle/update-sequences.sql
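For PostgreSQL, the script can typically be run with the standard `psql` client. The sketch below is an illustration only: the helper `update_sequences_cmd` is hypothetical, and the database user and database name (both `dspace` here) are assumptions that depend on your installation. It prints the command so it can be checked before running.

```shell
#!/bin/sh
# Print (do not run) the psql command that applies update-sequences.sql.
# Usage: update_sequences_cmd <dspace-dir> <db-user> <db-name>
update_sequences_cmd() {
    # psql -U selects the database user and -f reads SQL from a file;
    # the database name is given as the final argument.
    printf 'psql -U %s -f %s/etc/postgres/update-sequences.sql %s\n' "$2" "$1" "$3"
}

# Placeholder values; the database user and name are assumptions.
update_sequences_cmd /dspace dspace dspace
```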

...

Option

Ingest or Export

Default Value

Description

createMetadataFields=[value]

ingest-only

true

Tells the AIP ingester to automatically create any metadata fields which are found to be missing from the DSpace Metadata Registry. When 'true', this means as each AIP is ingested, new fields may be added to the DSpace Metadata Registry if they don't already exist. When 'false', an AIP ingest will fail if it encounters a metadata field that doesn't exist in the DSpace Metadata Registry. (NOTE: This will not create missing DSpace Metadata Schemas. If a schema is found to be missing, the ingest will always fail.)

filterBundles=[value]

export-only

defaults to exporting all Bundles

This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option can be run in two ways:

  • Exclude Bundles: By default, you can provide a comma-separated list of bundles to be excluded from AIPs (e.g. "TEXT, THUMBNAIL")
  • Include Bundles: If you prepend the list with the "+" symbol, then the list specifies the bundles to be included in AIPs (e.g. "+ORIGINAL,LICENSE" would only include those two bundles). This second option is identical to using "includeBundles" option described below.

    (NOTE: If you choose to no longer export LICENSE or CC_LICENSE bundles, you will also need to disable the License Dissemination Crosswalks in the aip.disseminate.rightsMD configuration for the changes to take effect)

ignoreHandle=[value]

ingest-only

Restore/Replace Mode defaults to 'false',
Submit Mode defaults to 'true'

If 'true', the AIP ingester will ignore any Handle specified in the AIP itself, and instead create a new Handle during the ingest process (this is the default when running in Submit mode, using the -s flag). If 'false', the AIP ingester attempts to restore the Handles specified in the AIP (this is the default when running in Restore/replace mode, using the -r flag).

ignoreParent=[value]

ingest-only

Restore/Replace Mode defaults to 'false',
Submit Mode defaults to 'true'

If 'true', the AIP ingester will ignore any Parent object specified in the AIP itself, and instead ingest under a new Parent object (this is the default when running in Submit mode, using the -s flag). The new Parent object must be specified via the -p flag (run dspace packager -h for more help). If 'false', the AIP ingester attempts to restore the object directly under its old Parent (this is the default when running in Restore/replace mode, using the -r flag).

includeBundles=[value]

export-only

defaults to "all"

This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option expects a comma separated list of bundle names (e.g. "ORIGINAL,LICENSE,CC_LICENSE,METADATA"), or "all" if all bundles should be included.

(See "filterBundles" option above if you wish to exclude particular Bundles. However, this "includeBundles" option cannot be used at the same time as "filterBundles".)

(NOTE: If you choose to no longer export LICENSE or CC_LICENSE bundles, you will also need to disable the License Dissemination Crosswalks in the aip.disseminate.rightsMD configuration for the changes to take effect)

manifestOnly=[value]

both import and export

false

If 'true', the AIP Disseminator will only import/export a METS Manifest XML file (i.e. result will be an unzipped 'mets.xml' file), instead of a full AIP. This METS Manifest contains URI references to all content files, but does not contain any content files. This option is experimental, and is meant for debugging purposes only. It should never be set to 'true' if you want to be able to restore content files. Again, please note that when you use this option, the final result will be an XML file, NOT the normal ZIP-based AIP format.

passwords=[value]

export-only

false

If 'true' (and the 'DSPACE-ROLES' crosswalk is enabled, see #AIP Metadata Dissemination Configurations), then the AIP Disseminator will export user password hashes (i.e. encrypted passwords) into the Site AIP's METS Manifest. This would allow you to restore users' passwords from the Site AIP. If 'false', then user password hashes are not stored in the Site AIP, and passwords cannot be restored at a later time.

skipIfParentMissing=[value]

import-only

false

If 'true', ingestion will skip over any "Could not find a parent DSpaceObject" errors that are encountered during the ingestion process (Note: those errors will still be logged as "warning" messages in your DSpace log file). If you are performing a full site restore (or a restore of a larger Community/Collection hierarchy), you may encounter these errors if you have a larger number of Item mappings between Collections (i.e. Items which are mapped into several collections at once). When you are performing a recursive ingest, skipping these errors should not cause any problems. Once the missing parent object is ingested it will automatically restore the Item mapping that caused the error. For more information on this "Could not find a parent DSpaceObject" error see Common Issues or Error Messages.

unauthorized=[value]

export-only

unspecified

If 'skip', the AIP Disseminator will skip over any unauthorized Bundle or Bitstream encountered (i.e. it will not be added to the AIP). If 'zero', the AIP Disseminator will add a Zero-length "placeholder" file to the AIP when it encounters an unauthorized Bitstream. If unspecified (the default value), the AIP Disseminator will throw an error if an unauthorized Bundle or Bitstream is encountered.

updatedAfter=[value]

export-only

unspecified

This option works as a basic form of "incremental backup". This option requires that an ISO-8601 date is specified. When specified, the AIP Disseminator will only export Item AIPs which have a last-modified date after the specified ISO-8601 date. This option has no effect on the export of Site, Community or Collection AIPs as DSpace does not record a last-modified date for Sites, Communities or Collections. For example, when this option is specified during a full-site export, the AIP Disseminator will export the Site AIP, all Community AIPs, all Collection AIPs, and only Item AIPs modified after that date and time.

validate=[value]

both import and export

Export defaults to 'true',
Ingest defaults to 'false'

If 'true', every METS file in an AIP will be validated before ingesting or exporting. By default, DSpace will validate everything on export, but will skip validation during import. Validation on export will ensure that all exported AIPs properly conform to the METS profile (and will throw errors if any do not). Validation on import will ensure every METS file in every AIP is first validated before importing into DSpace (this will cause the ingestion processing to take longer, but tips on speeding it up can be found in the "AIP Configurations To Improve Ingestion Speed while Validating" section below). DSpace recommends minimally validating AIPs on export. Ideally, you should validate both on export and import, but import validation is disabled by default in order to increase the speed of AIP restores.

...

Code Block
 [dspace]/bin/dspace packager -r -a -t AIP -o [option1]=[value] -o [option2]=[value] -e admin@myu.edu aip4567.zip
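To make the option syntax concrete, the sketch below expands a list of name=value pairs into repeated -o arguments and prints two example commands. The helper `aip_options` is hypothetical; the install path, the Handle `123456789/0`, the file names and the timestamp are placeholders, and the exact ISO-8601 form accepted by updatedAfter should be checked against your DSpace version.

```shell
#!/bin/sh
# Turn "name=value" arguments into a string of repeated "-o name=value" options.
# Usage: aip_options <name=value>...
aip_options() {
    out=""
    for opt in "$@"; do
        out="$out -o $opt"
    done
    # Strip the single leading space left by the loop.
    printf '%s' "${out# }"
}

# Recursive restore that tolerates missing parents and validates each METS file:
echo "/dspace/bin/dspace packager -r -a -t AIP $(aip_options skipIfParentMissing=true validate=true) -e admin@myu.edu aip4567.zip"

# Recursive export limited to Items modified after a (placeholder) date:
echo "/dspace/bin/dspace packager -d -a -t AIP $(aip_options updatedAfter=2011-01-01T00:00:00Z) -e admin@myu.edu -i 123456789/0 site-aip.zip"
```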

...