Archived / Obsolete Documentation

Documentation in this space is no longer accurate.

Notes/Discussion on the current AIPBackupRestorePrototype Implementation

Crosswalking in AIPs

Although METS-based AIPs seem like a nice, neutral format (one that could let you easily move content to other systems), is METS the "best" format for a DSpace AIP?

Richard R has some concerns that the roundtrip crosswalking could end up being lossy. So, in a normal backup & restore, we'd go through two crosswalks:

  1. Export = Crosswalk all DSpace objects into a METS-based representation
  2. Restore = Crosswalk a METS-based representation back into a DSpace Object

If the crosswalks are not kept in sync, the final restored DSpace Object may not be the same as the initial DSpace Object. This becomes even more problematic for institutions which have created their own custom metadata fields, bundles, etc. If the crosswalks don't understand how to deal with that content, it's possible some of it could be lost during the restore process.
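The risk can be illustrated with a minimal Python sketch. This is not DSpace code: `export_crosswalk`, `restore_crosswalk`, `lost_fields`, and the field names are hypothetical, and a plain dict stands in for the METS-based representation. The export step only understands a fixed set of fields, so a custom field is silently dropped, and comparing the restored object with the original exposes the loss.

```python
# Hypothetical sketch: a dict stands in for the METS-based representation.
KNOWN_FIELDS = {"dc.title", "dc.contributor.author"}  # fields the crosswalk understands


def export_crosswalk(item):
    """Export: keep only the metadata fields the crosswalk knows about."""
    return {key: value for key, value in item.items() if key in KNOWN_FIELDS}


def restore_crosswalk(package):
    """Restore: rebuild the object from the packaged representation."""
    return dict(package)


def lost_fields(item):
    """Return the fields that do not survive an export/restore roundtrip."""
    restored = restore_crosswalk(export_crosswalk(item))
    return sorted(key for key in item if restored.get(key) != item[key])


original = {
    "dc.title": "Sample thesis",
    "dc.contributor.author": "Doe, Jane",
    "local.department": "Physics",  # custom, institution-specific field
}
print(lost_fields(original))  # ['local.department']
```

A roundtrip check of this kind could serve as a regression test whenever either crosswalk changes.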

Perhaps in the end, we need to determine if there's a better way to serialize this DSpace content. Or, maybe you can "choose your serializer" and decide whether you'd rather serialize your AIPs using METS or a different packaging format (TBD).

Inter-dependencies between AIPs

Current AIPs have too much interdependency. Parent objects (e.g. Collections) enumerate all of their children (e.g. Items). This means that every time a new child object (e.g. Item) is added/removed, it also must be added/removed from all of its parents' AIPs.

Decision: We (Richard R, Bill H, Tim D) decided that child objects should enumerate their parents (so you can find an Item's parent Collection from that Item's AIP), but parents should not enumerate all their children. Although this may make restoring content more complex (in order to restore a Collection, you need to look at each Item to determine if it is a child of that Collection), it will lessen inter-dependencies between AIPs.
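The restore cost mentioned in this decision can be sketched as follows. The AIP layout here (dicts with `id`, `type`, and `parents` keys) and the `children_of` helper are assumptions made for illustration, not the actual AIP structure. Because a Collection AIP no longer lists its Items, finding them means scanning every Item AIP.

```python
# Hypothetical sketch: each AIP records its parents, never its children.
aips = [
    {"id": "collection/A", "type": "Collection", "parents": ["community/1"]},
    {"id": "item/1", "type": "Item", "parents": ["collection/A"]},
    {"id": "item/2", "type": "Item", "parents": ["collection/B"]},
    {"id": "item/3", "type": "Item", "parents": ["collection/A", "collection/B"]},
]


def children_of(parent_id, all_aips):
    """Scan every AIP and return the ids of those that name parent_id as a parent."""
    return [aip["id"] for aip in all_aips if parent_id in aip.get("parents", [])]


print(children_of("collection/A", aips))  # ['item/1', 'item/3']
```

Adding or removing an Item touches only that Item's AIP; the price is a full scan at restore time.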

What content goes in an AIP?

Several questions remain about what content should really be stored in an AIP.

Does an AIP include derivatives (e.g. thumbnails, extracted text files) or just the DSpace CONTENT bundle?

Decision: We need to have an includeBundle and an excludeBundle option on the AIP generation process. That way, individual institutions can choose which derivatives (or other content) they feel should be in AIPs. By default, we will export just the CONTENT and LICENSE bundles.
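One possible reading of these options is sketched below in Python. Only the option names and the CONTENT/LICENSE default come from the decision above; the `select_bundles` function and the way the two options combine (include replaces the default set, exclude is then subtracted) are assumptions.

```python
# Hypothetical sketch of bundle selection during AIP generation.
DEFAULT_BUNDLES = ("CONTENT", "LICENSE")  # default stated in the decision above


def select_bundles(available, include_bundle=None, exclude_bundle=None):
    """Return the bundles to export, preserving the order of `available`."""
    # Assumption: an explicit include list replaces the default set.
    wanted = set(include_bundle) if include_bundle else set(DEFAULT_BUNDLES)
    # Assumption: exclusions are applied last, so they always win.
    wanted -= set(exclude_bundle or ())
    return [name for name in available if name in wanted]


available = ["CONTENT", "LICENSE", "THUMBNAIL", "TEXT"]
print(select_bundles(available))
print(select_bundles(available, include_bundle=["CONTENT", "THUMBNAIL"]))
print(select_bundles(available, exclude_bundle=["LICENSE"]))
```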

Is there a way to link to common CC licenses (and similar repeated content), and avoid storing them within AIPs?

Although this is a nice concept, it will require further investigation. Obviously, if you have the same CC License attached to 300 items, it would be better for those 300 Item AIPs to link to a single CC License file (to save storage space) than to repeat that CC License in all 300 Item AIPs.
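One approach worth investigating is to store each distinct license once, keyed by a checksum, and have each Item AIP hold only the key. The Python sketch below illustrates the idea; `LicenseStore`, the `license_ref` key, and the choice of SHA-256 are all assumptions, and nothing like this exists in the prototype.

```python
import hashlib


class LicenseStore:
    """Hypothetical store that keeps one copy of each distinct license text."""

    def __init__(self):
        self._by_digest = {}

    def put(self, license_text):
        """Store the text if it is new and return its SHA-256 digest as a reference."""
        digest = hashlib.sha256(license_text.encode("utf-8")).hexdigest()
        self._by_digest.setdefault(digest, license_text)
        return digest

    def get(self, digest):
        """Resolve a reference back to the license text."""
        return self._by_digest[digest]

    def __len__(self):
        return len(self._by_digest)


store = LicenseStore()
cc_by = "Creative Commons Attribution license text (placeholder)"
item_aips = [{"id": f"item/{n}", "license_ref": store.put(cc_by)} for n in range(300)]
print(len(item_aips), len(store))  # 300 1
```

The trade-off is that an Item AIP is then no longer self-contained: restoring it also requires the shared license file.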

Do we store EPeople, Groups and ResourcePolicies in AIPs?

Tim feels that ResourcePolicies (DSpace access rights) might eventually need to be stored in some way. Otherwise, when restoring an Item, you'll always have to default to making it publicly available (or default to the Collection's access rights).
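The fallback described here could look like the following Python sketch. The `policies` key, the policy strings, and the `policies_for_restore` helper are hypothetical; the point is only that an AIP without stored policies forces the restore to fall back on the Collection's rights.

```python
# Hypothetical sketch: choose access policies when restoring an Item.
def policies_for_restore(item_aip, collection_policies):
    """Use policies stored in the AIP if present, else the Collection's."""
    stored = item_aip.get("policies")
    if stored:
        return list(stored)
    return list(collection_policies)


collection_default = ["READ:Anonymous"]
restricted_item = {"id": "item/1", "policies": ["READ:Staff"]}
plain_item = {"id": "item/2"}  # AIP carries no policies

print(policies_for_restore(restricted_item, collection_default))  # ['READ:Staff']
print(policies_for_restore(plain_item, collection_default))       # ['READ:Anonymous']
```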

Do we store embargoes in AIPs, so that we can restore them as needed?

In DSpace 1.6, embargoes are stored in metadata. So, in a way, we're already storing them by default.

However, perhaps the ingest process for an AIP will need to check for embargoes if we feel they are worth restoring.
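Such a check might look like the Python sketch below. The field name `local.embargo.lift`, the ISO date format, and the `embargo_active` helper are assumptions for illustration; the actual embargo field depends on local configuration.

```python
from datetime import date

EMBARGO_FIELD = "local.embargo.lift"  # hypothetical metadata field holding the lift date


def embargo_active(metadata, today):
    """Return True if the metadata carries a lift date that is still in the future."""
    raw = metadata.get(EMBARGO_FIELD)
    if raw is None:
        return False  # no embargo recorded in the AIP metadata
    return date.fromisoformat(raw) > today


today = date(2010, 6, 1)
print(embargo_active({EMBARGO_FIELD: "2011-01-01"}, today))  # True
print(embargo_active({EMBARGO_FIELD: "2009-01-01"}, today))  # False
print(embargo_active({}, today))                             # False
```

An ingest step could use a check like this to decide whether to re-apply restrictions before making the restored Item available.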

Restore from AIPs

We need to review the current restore process more closely, to ensure it is doing what we expect it to do. Previously it was built to support the AipPrototype (with internal AIPs).
