...
However, the new AIP Backup & Restore option aims to resolve many of the complexities of a traditional backup and restore. The table below details some of the differences between these two valid Backup and Restore options.
Capability | Traditional Backup & Restore (Database and Files) | AIP Backup & Restore |
---|---|---|
Supported Backup/Restore Types |
Can Backup & Restore all DSpace Content easily | Yes (Requires two backups/restores – one for Database and one for Files) | Yes (Though, will not backup/restore items which are not officially "in archive") |
Can Backup & Restore a Single Community/Collection/Item easily | No (It is possible, but requires a strong understanding of DSpace database structure & folder organization in order to only backup & restore metadata/files belonging to that single object) | Yes |
Backups can be used to move one or more Community/Collection/Items to another DSpace system easily. | No (Again, it is possible, but requires a strong understanding of DSpace database structure & folder organization in order to only move metadata/files belonging to that object) | Yes |
Can Backup & Restore Item Versions | Yes (Requires two backups/restores – one for Database and one for Files) | No (Currently Item Level Versioning is not fully compatible with AIP Backup & Restore. AIP Backup & Restore can only backup/restore the latest version of an Item) |
Supported Object Types During Backup & Restore |
Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.) | Yes | Yes |
Supports backup/restore of all People/Groups/Permissions | Yes | Yes |
Supports backup/restore of all Collection-specific Item Templates | Yes | Yes |
Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE) | Yes | No (This is a known issue. All previously harvested Items will be restored, but the OAI-PMH/OAI-ORE harvesting settings will be lost during the restore process.) |
Supports backup/restore of all Withdrawn (but not deleted) Items | Yes | Yes |
Supports backup/restore of Item Mappings between Collections | Yes | Yes (During restore, the AIP Ingester may throw a false "Could not find a parent DSpaceObject" error if it tries to restore an Item Mapping to a Collection that it hasn't yet restored. But this error can be safely bypassed using the 'skipIfParentMissing' flag; see Additional Packager Options for more details.) |
Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow) | Yes | No (AIPs are only generated for objects which are completed and considered "in archive") |
Supports backup/restore of Items using custom Metadata Schemas & Fields | Yes | Yes (Custom Metadata Fields will be automatically recreated. Custom Metadata Schemas must be manually created first, in order for DSpace to be able to recreate custom fields belonging to that schema. See the troubleshooting table below for more details.) |
Supports backup/restore of all local DSpace Configurations and Customizations | Yes (if you backup your entire DSpace directory as part of backing up your files) | Not by default (unless you also backup parts of your DSpace directory – note, you wouldn't need to backup the '[dspace]/assetstore' folder again, as those files are already included in AIPs) |
Based on your local institution's needs, you will want to choose the backup & restore process which is most appropriate for you. You may also find it beneficial to use both types of backups on different time schedules, in order to minimize the likelihood of losing your DSpace installation settings or its contents. For example, you may choose to perform a Traditional Backup once per week (to back up your local system configurations and customizations) and an AIP Backup on a daily basis. Alternatively, you may choose to perform daily Traditional Backups and only use the AIP Backup as a "permanent archives" option (perhaps performed on a weekly or monthly basis).
...
There are two types of AIP Dissemination you can perform:
- Single AIP (default, using -d option) - Exports just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection, you'd just end up with a single Collection AIP (which would not include AIPs for all its child Items)
- Hierarchy of AIPs (using the "-d --all" or "-d -a" option) - Exports the requested AIP describing an object, plus the AIP for all child objects. Some examples follow:
  - For a Site - this would export all Communities, Collections & Items within the site into AIP files (in a provided directory)
  - For a Community - this would export that Community and all SubCommunities, Collections and Items into AIP files (in a provided directory)
  - For a Collection - this would export that Collection and all contained Items into AIP files (in a provided directory)
  - For an Item - this just exports the Item into an AIP as normal (as it already contains its Bitstreams/Bundles by default)
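The two dissemination types differ only by the -a flag. The following is a minimal dry-run sketch that prints both commands rather than running them; the install path /dspace, the email address, the handle and the output path are placeholder assumptions, not values from this guide.

```shell
#!/bin/sh
# Dry-run sketch: prints packager commands instead of executing them.
# DSPACE_DIR, EPERSON, the handle and the output path are placeholders.
DSPACE_DIR="/dspace"
EPERSON="admin@myu.edu"

# Single AIP: exports only the object identified by the handle ($1) to file ($2).
single_aip_cmd() {
    echo "$DSPACE_DIR/bin/dspace packager -d -t AIP -e $EPERSON -i $1 $2"
}

# Hierarchy of AIPs: adds -a so all child objects are exported as well.
hierarchy_aip_cmd() {
    echo "$DSPACE_DIR/bin/dspace packager -d -a -t AIP -e $EPERSON -i $1 $2"
}

single_aip_cmd 123456789/2 /backups/collection-aip.zip
hierarchy_aip_cmd 123456789/2 /backups/collection-aip.zip
```

To actually run an export, copy the printed command into a shell on the DSpace server once the placeholders have been replaced.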
...
Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for all Communities, Collections and Items into the same directory as the Site AIP.
The XML file in sitewide-aip.zip contains information about repository users, groups and top level communities.
Ingesting / Restoring AIPs
...
Ingestion of AIPs is a bit more complex than Dissemination, as there are several different "modes" available:
- Submit/Ingest Mode (-s option, default) – submit AIP(s) to DSpace in order to create new object(s) (i.e. AIP is treated like a SIP – Submission Information Package)
- Restore Mode (-r option) – restore pre-existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "submit", where the object is created with a known Handle and known relationships.
- Replace Mode (-r -f option) – replace existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "restore" where the contents of existing object(s) are replaced by the contents in the AIP(s). By default, if a normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which object already exists.
...
It's worth understanding the primary differences between a Submission (specified by -s parameter) and a Restore (specified by -r parameter).
- Submission Mode (-s mode) - creates a new object (AIP is treated like a SIP)
  - By default, a new Handle is always assigned
    - However, you can force it to use the handle specified in the AIP by specifying -o ignoreHandle=false as one of your parameters
  - By default, a new Parent object must be specified (using the -p parameter). This is the location where the new object will be created.
    - However, you can force it to use the parent object specified in the AIP by specifying -o ignoreParent=false as one of your parameters
  - By default, will respect a Collection's Workflow process when you submit an Item to a Collection
    - However, you can specifically skip any workflow approval processes by specifying the -w parameter.
  - Always adds a new Deposit License to Items
  - Always adds new DSpace System metadata to Items (includes new "dc.date.accessioned", "dc.date.available", "dc.date.issued" and "dc.description.provenance" entries)
  - WARNING: Submission mode may not be able to maintain Item Mappings between Collections. Because these mappings are recorded via the Collection Handles, mappings may be restored improperly if the Collection handle has changed when moving content from one DSpace instance to another.
- Restore / Replace Mode (-r mode) - restores a previously existing object (as if from a backup)
  - By default, the Handle specified in the AIP is restored
    - However, for restores, you can force a new handle to be generated by specifying -o ignoreHandle=true as one of your parameters. (NOTE: Doesn't work for replace mode as the new object always retains the handle of the replaced object)
    - Although a Restore/Replace does restore Handles, it will not necessarily restore the same internal IDs in your Database.
  - By default, the object is restored under the Parent specified in the AIP
    - However, for restores, you can force it to restore under a different parent object by using the -p parameter. (NOTE: Doesn't work for replace mode, as the new object always retains the parent of the replaced object)
  - Always skips any Collection workflow approval processes when restoring/replacing an Item in a Collection
  - Never adds a new Deposit License to Items (rather it restores the previous deposit license, as long as it is stored in the AIP)
  - Never adds new DSpace System metadata to Items (rather it just restores the metadata as specified in the AIP)
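As a rough illustration of the Submission Mode defaults and overrides listed above, the sketch below assembles a submit command with and without extra flags and prints it (dry run only). The helper name submit_cmd, the install path /dspace, the email address, the parent handle and the file path are all assumptions made for this example.

```shell
#!/bin/sh
# Dry-run sketch: prints submit-mode packager commands instead of executing them.
# DSPACE_DIR, EPERSON, the handle and file path are placeholders.
DSPACE_DIR="/dspace"
EPERSON="admin@myu.edu"

# submit_cmd PARENT_HANDLE AIP_FILE [extra flags...]
submit_cmd() {
    parent="$1"; aip="$2"; shift 2
    cmd="$DSPACE_DIR/bin/dspace packager -s -t AIP -e $EPERSON -p $parent"
    for extra in "$@"; do
        cmd="$cmd $extra"
    done
    echo "$cmd $aip"
}

# Defaults: a new Handle is assigned and the Collection workflow is respected.
submit_cmd 123456789/12 /backups/item-aip.zip

# Overrides: keep the Handle stored in the AIP and skip workflow approval.
submit_cmd 123456789/12 /backups/item-aip.zip -o ignoreHandle=false -w
```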
Note: It is possible to change some of the default behaviors of both the Submission and Restore/Replace Modes. Please see the Additional Packager Options section below for a listing of command-line options that allow you to override some of the default settings described above.
...
The Submission mode (-s) always creates a new object with a newly assigned handle. In addition, by default it respects all existing Collection approval workflows (so items may require approval unless the workflow is skipped by using the -w option). For information about how the "Submission Mode" differs from the "Replace / Restore mode", see The difference between "Submit" and "Restore/Replace" modes above.
Submitting a Single AIP
...
Warning: If you are submitting a larger amount of content (e.g. multiple Communities/Collections) to your DSpace, you may want to tell the 'packager' command to skip over any existing Collection approval workflows by using the -w flag.
Warning: When an Item is mapped to one or more Collections, this mapping is recorded in the AIP using the mapped Collection's handle. Unfortunately, since the submission mode (-s) assigns new handles to all objects in the hierarchy, this may mean that the mapped Collection's handle will have changed (or even that a different Collection will be available at the original mapped Collection's handle). DSpace does not have a way to uniquely identify Collections other than by handle, which means that item mappings are only able to be retained when the Collection handle is also retained.
...
Warning: If you are using AIPs to move an entire Community or Collection from one DSpace to another, there is a known issue (see DS-1105) that the new DSpace instance will be unable to (re-)create any DSpace Groups or EPeople which are referenced by a Community or Collection AIP. The reason is that the Community or Collection AIP itself doesn't contain enough information to create those Groups or EPeople (rather that info is stored in the SITE AIP, for usage during Full Site Restores).
...
This -w flag may also be used when Submitting an AIP Hierarchy. For example, if you are migrating one or more Collections/Communities from one DSpace to another, you may choose to submit those AIPs with the -w option enabled. This will ensure that, if a Collection has a workflow approval process enabled, all its Items are available immediately rather than being all placed into the workflow approval process.
...
Restoring is slightly different than just submitting. When restoring, we make every attempt to restore the object as it used to be (including its handle, parent object, etc.). For more information about how the "Replace/Restore Mode" differs from the "Submit mode", see The difference between "Submit" and "Restore/Replace" modes above.
There are currently three restore modes:
- Default Restore Mode (-r) = Attempt to restore object (and optionally children). Rollback all changes if any object is found to already exist.
- Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is found to already exist, skip over it (and all children objects), and continue to restore all other non-existing objects.
- Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing objects in DSpace. Therefore, if an object is found to already exist in DSpace, its contents are replaced by the contents of the AIP. WARNING: This mode is potentially dangerous as it will permanently destroy any object contents that do not currently exist in the AIP. You may want to perform a secondary backup, unless you are sure you know what you are doing!
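The three restore modes map onto flag combinations as sketched below. This is a dry-run illustration only: the mode labels (default, keep-existing, force-replace), the helper names, the install path /dspace, the email address and the file path are assumptions for the example, and -a is included on the assumption that child AIPs should be restored too.

```shell
#!/bin/sh
# Dry-run sketch: maps a restore mode label to packager flags and prints the command.
# The mode labels, DSPACE_DIR, EPERSON and the AIP path are placeholders.
DSPACE_DIR="/dspace"
EPERSON="admin@myu.edu"

restore_flags() {
    case "$1" in
        default)       echo "-r" ;;     # roll back if any object already exists
        keep-existing) echo "-r -k" ;;  # skip existing objects, restore the rest
        force-replace) echo "-r -f" ;;  # overwrite existing objects (dangerous)
        *) echo "unknown restore mode: $1" >&2; return 1 ;;
    esac
}

# restore_cmd MODE AIP_FILE
restore_cmd() {
    flags=$(restore_flags "$1") || return 1
    echo "$DSPACE_DIR/bin/dspace packager $flags -a -t AIP -e $EPERSON $2"
}

restore_cmd default /backups/community-aip.zip
restore_cmd keep-existing /backups/community-aip.zip
```

Force Replace is deliberately not shown in the sample calls, since it permanently overwrites existing content.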
...
...
- Install a completely "fresh" version of DSpace by following the Installation instructions in the DSpace Manual
- At this point, you should have a completely empty, but fully-functional DSpace installation. You will need to create an initial Administrator user in order to perform this restore (as a full-restore can only be performed by a DSpace Administrator).
Once DSpace is installed, run the following command to restore all its contents from AIPs:

    [dspace]/bin/dspace packager -r -a -f -t AIP -e <eperson> -i <site-handle-prefix>/0 -o skipIfParentMissing=true /full/path/to/your/site-aip.zip

- While the "-o skipIfParentMissing=true" flag is optional, it is often necessary whenever you are performing a large hierarchical site restoration. Please see the Additional Packager Options section below.
Please note the following about the above restore command:
- Notice that you are running this command in "Force Replace" mode (-r -f). This is necessary as your empty DSpace install will already include a few default groups (Administrators and Anonymous) and your initial administrative user. You need to replace these groups in order to restore your prior DSpace contents completely.
- <eperson> should be replaced with the Email Address of the initial Administrator (who you created when you reinstalled DSpace).
- <site-handle-prefix> should be replaced with your DSpace site's assigned Handle Prefix. This is equivalent to the handle.prefix setting in your dspace.cfg
- /full/path/to/your/site-aip.zip is the full path to the AIP file which represents your DSpace SITE. This file will be named whatever you named it when you actually exported your entire site. All other AIPs are assumed to be referenced from this SITE AIP (in most cases, they should be in the same directory as that SITE AIP).
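Because a full restore runs in Force Replace mode, it can help to check the inputs before launching it. The sketch below is one possible pre-flight wrapper, not part of DSpace: it verifies that the SITE AIP is readable and then prints the restore command from this section. The helper name full_restore_cmd and the install path /dspace are assumptions.

```shell
#!/bin/sh
# Dry-run sketch: validates the SITE AIP path, then prints the full-restore command.
# DSPACE_DIR and the helper name are placeholders, not part of DSpace itself.
DSPACE_DIR="/dspace"

# full_restore_cmd EPERSON_EMAIL HANDLE_PREFIX SITE_AIP_FILE
full_restore_cmd() {
    if [ ! -r "$3" ]; then
        echo "Site AIP not readable: $3" >&2
        return 1
    fi
    echo "$DSPACE_DIR/bin/dspace packager -r -a -f -t AIP -e $1 -i $2/0 -o skipIfParentMissing=true $3"
}

# Example call (prints the command only if the file exists):
# full_restore_cmd admin@myu.edu 123456789 /full/path/to/your/site-aip.zip
```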
Note: In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on your DSpace database after a larger scale restore. This database script should be run while DSpace is stopped (you may either stop Tomcat or just the DSpace webapps). PostgreSQL/Oracle must be running. The script can be found in the following locations for PostgreSQL and Oracle, respectively:
Cleaning up from a failed import
Sometimes a packager import of AIP packages can fail due to lack of memory (see below for advice on better performance; use JAVA_OPTS to set your memory higher than the default). If that happens, DSpace by design will leave the bitstreams it did import successfully, but they will be orphaned and will just occupy space in your assetstore. The standard DSpace cleanup cron job will clean up these orphaned bitstreams; however, you can also clean them up manually by running the following command:

    [dspace]/bin/dspace cleanup -v

Performance considerations
When importing large structures like the whole site or a large collection/community, keep in mind that this can require a lot of memory, more than the default amount of heap allocated to the command-line launcher (256 MB: JAVA_OPTS="-Xmx256m -Dfile.encoding=UTF-8"). This memory must be allocated in addition to the normal amount of memory allocated to Tomcat. For example, a site of 2500 fulltext items (2 GB altogether) requires 5 GB of maximum heap space and takes around 1 hour, including import and indexing.
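One way to raise the heap for a single import is to export JAVA_OPTS in the shell before calling the launcher. The sketch below assumes the command-line launcher picks up JAVA_OPTS from the environment; the helper name heap_opts is hypothetical, and 5120 MB simply mirrors the 5 GB example above.

```shell
#!/bin/sh
# Sketch: build a JAVA_OPTS value with a larger maximum heap (in megabytes).
# heap_opts is a hypothetical helper; size the heap for your own data.
heap_opts() {
    echo "-Xmx${1}m -Dfile.encoding=UTF-8"
}

# 5120 MB corresponds to the 5 GB figure quoted for a 2500-item site.
JAVA_OPTS="$(heap_opts 5120)"
export JAVA_OPTS
echo "JAVA_OPTS=$JAVA_OPTS"

# Run the packager import from this same shell session afterwards so that
# it inherits the exported value.
```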
...
If you wish to run any of the following commands from a cron job (or similar), then you may wish to disable all user interaction using the -u (--no-user-interaction) flag. For example, supposing you wanted to perform a full Site Backup (see Exporting Entire Site above) via a cron job, you could simply run that command passing it the "-u" flag like this:

    # Perform a full site backup to AIPs (with user interaction disabled) every Sunday at 1:00AM
    # NOTE: Make sure to replace "123456789" with your actual Handle Prefix, and "admin@myu.edu" with your Administrator account email.
    0 1 * * 0 [dspace]/bin/dspace packager -u -d -a -t AIP -e admin@myu.edu -i 123456789/0 [full-path-to-backup-folder]/sitewide-aip.zip

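If scheduled backups should not overwrite one another, a small wrapper script can place each run in its own dated directory. This is only a sketch of that idea: the backup root /backups/aip, the install path /dspace, the handle prefix, the email address and the helper name are assumptions, and the script prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: prints a non-interactive full-site export into a dated directory.
# BACKUP_ROOT, DSPACE_DIR, EPERSON and HANDLE_PREFIX are placeholders.
DSPACE_DIR="/dspace"
EPERSON="admin@myu.edu"
HANDLE_PREFIX="123456789"
BACKUP_ROOT="/backups/aip"

# site_backup_cmd DATE_STAMP
site_backup_cmd() {
    dir="$BACKUP_ROOT/$1"
    echo "$DSPACE_DIR/bin/dspace packager -u -d -a -t AIP -e $EPERSON -i $HANDLE_PREFIX/0 $dir/sitewide-aip.zip"
}

# Use today's date (YYYY-MM-DD) as the directory name.
site_backup_cmd "$(date +%Y-%m-%d)"
```

A real wrapper would also need to create the dated directory before running the export.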
...
Command Line Reference
The following flags are valid to pass to the [dspace]/bin/dspace packager command:
Flag | Ingest or Export | Description / Usage |
---|---|---|
-a (--all) | both ingest and export | For Ingest: recursively ingest all child AIPs (referenced from this AIP). For Export: recursively export all child objects (referenced from this parent object) |
-d (--disseminate) | export-only | This flag simply triggers the export of AIPs from the system. |
-e (--eperson) [email-address] | ingest-only | The email address of the EPerson who is ingesting the AIPs. Oftentimes this should be an Administrative account. |
-f (--force-replace) | ingest-only | Ingest the AIPs in "Force Replace Mode" (must be specified in conjunction with -r flag), where existing objects will be replaced by the contents of the AIP. |
-h (--help) | both ingest and export | Return help information. You should specify with -t for additional package specific help information |
-i (--identifier) [handle] | both ingest and export | For Ingest: Only valid in "Force Replace Mode". In that mode this is the identifier of the object to replace. For Export: The identifier of the object to export to an AIP |
-k (--keep-existing) | ingest-only | Specifies to use "Restore, Keep Existing Mode" during ingest (must be specified in conjunction with -r flag). In this mode, existing objects in DSpace will NOT be replaced by their AIPs, but missing objects will be restored from AIPs. |
-o (--option) [setting]=[value] | both ingest and export | This flag is used to pass Additional Packager Options to the Packager command. Each type of packager may define its own custom Additional Options. For AIPs, the valid options are documented in the Additional Packager Options section below. This is repeatable (e.g. -o [setting1]=[value] -o [setting2]=[value]) |
-p (--parent) [handle] | ingest only | Handle(s) of the parent Community or Collection into which an AIP should be ingested. This may be repeatable. |
-r (--restore) | ingest only | Specifies that this ingest is either "Restore Mode" (when standalone), "Restore, Keep Existing Mode" (when used with -k flag) or "Force Replace Mode" (when used with -f flag) |
-s (--submit) | ingest only | Specifies that this ingest is in "Submit Mode" where an AIP is treated as a new object and assigned a new Handle/Identifier, etc. |
-t (--type) [package-type] | both ingest and export | Specifies the type of package which is being ingested or exported. This controls which Ingester or Disseminator class is called. For AIPs, this is always set to "-t AIP " |
-u (--no-user-interaction) | both ingest and export | Skips over all user interaction (e.g. question prompts). This flag can be used when running the packager from a script or cron job to bypass all user interaction. See also |
Additional Packager Options
In addition to the various "modes" settings described under "Running the Code" above, the AIP Packager supports the following packager options. These options allow you to better tweak how your AIPs are processed (especially during ingests/restores/replaces).
Option | Ingest or Export | Default Value | Description |
---|---|---|---|
| ingest-only | true | Tells the AIP ingester to automatically create any metadata fields which are found to be missing from the DSpace Metadata Registry. When 'true', this means as each AIP is ingested, new fields may be added to the DSpace Metadata Registry if they don't already exist. When 'false', an AIP ingest will fail if it encounters a metadata field that doesn't exist in the DSpace Metadata Registry. (NOTE: This will not create missing DSpace Metadata Schemas. If a schema is found to be missing, the ingest will always fail.) |
| export-only | defaults to exporting all Bundles | This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option can be run in two ways:
|
ignoreHandle | ingest-only | Restore/Replace Mode defaults to 'false', Submit Mode defaults to 'true' | If 'true', the AIP ingester will ignore any Handle specified in the AIP itself, and instead create a new Handle during the ingest process (this is the default when running in Submit mode, using the -s flag). |
ignoreParent | ingest-only | Restore/Replace Mode defaults to 'false', Submit Mode defaults to 'true' | If 'true', the AIP ingester will ignore any Parent object specified in the AIP itself, and instead ingest under a new Parent object (this is the default when running in Submit mode, using the -s flag). |
| export-only | defaults to "all" | This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option expects a comma separated list of bundle names (e.g. "ORIGINAL,LICENSE,CC_LICENSE,METADATA"), or "all" if all bundles should be included. |
| both ingest and export | false | If 'true', the AIP Disseminator will only import/export a METS Manifest XML file (i.e. result will be an unzipped 'mets.xml' file), instead of a full AIP. This METS Manifest contains URI references to all content files, but does not contain any content files. This option is experimental and is meant for debugging purposes only. It should never be set to 'true' if you want to be able to restore content files. Again, please note that when you use this option, the final result will be an XML file, NOT the normal ZIP-based AIP format. |
| export-only | false | If 'true' (and the 'DSPACE-ROLES' crosswalk is enabled, see #AIP Metadata Dissemination Configurations), then the AIP Disseminator will export user password hashes (i.e. encrypted passwords) into Site AIP's METS Manifest. This would allow you to restore user's passwords from Site AIP. If 'false', then user password hashes are not stored in Site AIP, and passwords cannot be restored at a later time. |
skipIfParentMissing | ingest-only | false | If 'true', ingestion will skip over any "Could not find a parent DSpaceObject" errors that are encountered during the ingestion process (Note: those errors will still be logged as "warning" messages in your DSpace log file). If you are performing a full site restore (or a restore of a larger Community/Collection hierarchy), you may encounter these errors if you have a larger number of Item mappings between Collections (i.e. Items which are mapped into several collections at once). When you are performing a recursive ingest, skipping these errors should not cause any problems. Once the missing parent object is ingested it will automatically restore the Item mapping that caused the error. For more information on this "Could not find a parent DSpaceObject" error see the troubleshooting table below. |
| export-only | unspecified | If 'skip', the AIP Disseminator will skip over any unauthorized Bundle or Bitstream encountered (i.e. it will not be added to the AIP). If 'zero', the AIP Disseminator will add a Zero-length "placeholder" file to the AIP when it encounters an unauthorized Bitstream. If unspecified (the default value), the AIP Disseminator will throw an error if an unauthorized Bundle or Bitstream is encountered. |
| export-only | unspecified | This option works as a basic form of "incremental backup". This option requires that an ISO-8601 date is specified. When specified, the AIP Disseminator will only export Item AIPs which have a last-modified date after the specified ISO-8601 date. This option has no effect on the export of Site, Community or Collection AIPs as DSpace does not record a last-modified date for Sites, Communities or Collections. For example, when this option is specified during a full-site export, the AIP Disseminator will export the Site AIP, all Community AIPs, all Collection AIPs, and only Item AIPs modified after that date and time. |
| both ingest and export | Export defaults to 'true', Ingest defaults to 'false' | If 'true', every METS file in AIP will be validated before ingesting or exporting. By default, DSpace will validate everything on export, but will skip validation during import. Validation on export will ensure that all exported AIPs properly conform to the METS profile (and will throw errors if any do not). Validation on import will ensure every METS file in every AIP is first validated before importing into DSpace (this will cause the ingestion processing to take longer, but tips on speeding it up can be found in a later section). DSpace recommends minimally validating AIPs on export. Ideally, you should validate both on export and import, but import validation is disabled by default in order to increase the speed of AIP restores. |
How to use additional options
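Each additional option is passed with its own -o flag, and the flag can be repeated. As a minimal sketch, the hypothetical helper below turns a list of setting=value pairs into repeated -o flags and prints a restore command using them; the install path /dspace, the email address and the file path are placeholders.

```shell
#!/bin/sh
# Dry-run sketch: expands setting=value pairs into repeated "-o" flags.
# packager_options is a hypothetical helper; paths and email are placeholders.
DSPACE_DIR="/dspace"
EPERSON="admin@myu.edu"

packager_options() {
    opts=""
    for setting in "$@"; do
        opts="$opts -o $setting"
    done
    # Strip the single leading space left by the loop.
    echo "${opts# }"
}

OPTS="$(packager_options skipIfParentMissing=true ignoreHandle=true)"
echo "$DSPACE_DIR/bin/dspace packager -r -a -t AIP -e $EPERSON $OPTS /backups/community-aip.zip"
```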
...
The following setting determines whether the AIP Ingester should create an EPerson (if necessary) when attempting to restore or ingest an Item whose Submitter cannot be located in the system. By default it is set to "false", as for AIPs the creation of EPeople (and Groups) is generally handled by the DSPACE-ROLES crosswalk (see #AIP Metadata Dissemination Configurations for more info on the DSPACE-ROLES crosswalk).
...
The table below lists common fixes to issues you may encounter when backing up or restoring objects using AIP Backup and Restore.
Issue / Error Message | How to Fix this Problem |
---|---|
Ingest/Restore Error: "Group Administrator already exists" | If you receive this problem, you are likely attempting to Restore an Entire Site, but are not running the command in Force Replace Mode ( |
Ingest/Restore Error: "Unknown Metadata Schema encountered (mycustomschema)" | If you receive this problem, one or more of your Items is using a custom metadata schema which DSpace is currently not aware of (in the example, the schema is named "mycustomschema"). Because DSpace AIPs do not contain enough details to recreate the missing Metadata Schema, you must create it manually via the DSpace Admin UI. Please note that you only need to create the Schema. You do not need to manually create all the fields belonging to that schema, as DSpace will do that for you as it restores each AIP. Once the schema is created in DSpace, re-run your restore command. DSpace will automatically re-create all fields belonging to that custom metadata schema as it restores each Item that uses that schema. |
Ingest Error: "Could not find a parent DSpaceObject referenced as 'xxx/xxx'" | When you encounter this error message it means that an object could not be ingested/restored as it belongs to a parent object which doesn't currently exist in your DSpace instance. During a full restore process, this error can be skipped over and treated as a warning by specifying the 'skipIfParentMissing' flag (see Additional Packager Options). If you have a larger number of Items which are mapped to multiple Collections, the AIP Ingester will sometimes attempt to restore an item mapping before the Collection itself has been restored (thus throwing this error). Luckily, this is not anything to be concerned about. As soon as the Collection is restored, the Item Mapping which caused the error will also be automatically restored. So, if you encounter this error during a full restore, it is safe to bypass this error message using the 'skipIfParentMissing' flag. |
Submit Error: PSQLException: ERROR: duplicate key value violates unique constraint "handle_handle_key" | This error means that while submitting one or more AIPs, DSpace encountered a Handle conflict. This is a general error that may occur in DSpace if your Handle sequence has somehow become out-of-date. However, it's easy to fix. Just run the |