...

Current AIPs have too much interdependency. Parent objects (e.g. Collections) enumerate all of their children (e.g. Items). This means that every time a child object (e.g. an Item) is added or removed, it must also be added to or removed from the AIPs of all its parents.

Based on the discussions below, we have come up with four options (at least in the short term). Feel free to add to these if you think of other options or pros/cons:

  1. Allow Collections/Communities to enumerate their children (this is how the AIPs are currently formed in the prototype)
    • Pros
      • Makes partial-restores (restoring a Collection/Community) a bit easier – just restore the Collection/Community AIP and it then tells you what child AIPs are necessary to restore
    • Cons
      • Adding a new child object also changes the parent AIP. AIPs are not as independent.
  2. No enumeration of children in AIPs + local AIP parser
    • Pros
      • AIPs are independent.
      • Would work fine when restoring an entire site (or just a single item).
    • Cons
      • A local AIP parser works well as long as the AIPs are stored locally. If the AIPs are actually stored elsewhere (whether in DuraCloud or in any other backup solution), then restoring a single Community or Collection is more complex: nearly all AIPs may need to be copied to local storage and parsed, just to determine which of them belong to the Community or Collection being restored.
  3. No enumeration of children in AIPs + remote AIP parser (in DuraCloud, etc)
    • Pros
      • Same as #2. In addition, the remote parser can decide which AIPs need to be pulled down, so only the AIPs you actually need are copied to local storage.
    • Cons
      • May be DuraCloud-specific? Other backup solutions (tape, external drive, offsite storage) may not be able to take advantage of an external parser.
  4. No enumeration of children in AIPs + a site "index" (which details all relationships)
    • Pros
      • Again, a relatively simple partial-restore process (like #1): you just pull down the site "index" file to determine which AIPs are needed to fulfill the restore.
      • AIPs remain independent of one another
    • Cons
      • Could be semi-"proprietary" to DSpace? In other words, would other systems understand this file? But do we care? If someone uses the AIP export to migrate to another system (e.g. Fedora), they would likely load all AIPs and have no use for the "index" file anyway.
      • Although AIPs remain independent, any change in relationships (e.g. adding a new object, moving an Item) also requires an update to this "index" file. Probably not a big deal, but worth mentioning.
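Option 4 can be sketched roughly as follows. This is a minimal, hypothetical model: the `SiteIndex` class and the placeholder handles (`comm-1`, `coll-A`, `item-1`, ...) are illustrative only and not part of DSpace. The index is the one place that records parent/child relationships, so adding or moving an object updates the index rather than any parent AIP, and a partial restore only needs the index to decide which AIPs to pull down.

```python
from collections import defaultdict


class SiteIndex:
    """Hypothetical in-memory model of a site-wide "index" file that records
    every parent/child relationship so that individual AIPs do not have to."""

    def __init__(self):
        self._children = defaultdict(set)  # parent handle -> set of child handles
        self._parent = {}                  # child handle -> parent handle

    def add(self, handle, parent):
        """Record a new object, e.g. an Item added to a Collection."""
        self._parent[handle] = parent
        self._children[parent].add(handle)

    def move(self, handle, new_parent):
        """Move an object: only the index changes, no parent AIP is rewritten."""
        old_parent = self._parent[handle]
        self._children[old_parent].discard(handle)
        self.add(handle, new_parent)

    def aips_needed(self, root):
        """Handles of every AIP needed to restore `root` and all its descendants,
        in depth-first pre-order (each parent precedes its descendants)."""
        needed, stack = [], [root]
        while stack:
            handle = stack.pop()
            needed.append(handle)
            # Push children in reverse-sorted order so they pop in ascending order.
            stack.extend(sorted(self._children.get(handle, ()), reverse=True))
        return needed
```

Because `aips_needed` lists each parent before its descendants, its output could double as a restore order.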

-------

Decision (on 15 April 2010): We (Richard R, Bill H, Tim D) decided that child objects should enumerate their parents (so you can find an Item's parent Collection from that Item's AIP), but parents should not enumerate all their children. Although this may make restoring content more complex (to restore a Collection, you need to look at each Item to determine whether it is a child of that Collection), it will lessen inter-dependencies between AIPs.
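The extra restore cost this decision accepts can be sketched as follows. Assume the parent handle recorded in each AIP has already been read into a mapping; the function name `restore_order` and the handles are hypothetical. Because parents do not list their children, every AIP has to be visited once to invert the child-to-parent pointers before walking down from the Collection or Community being restored.

```python
from collections import defaultdict, deque


def restore_order(target, parent_of):
    """Return the handles to restore for `target` (parents before children).

    `parent_of` maps every AIP's handle to the parent handle recorded in that
    AIP (None for a top-level object). Hypothetical helper, for illustration.
    """
    # Full scan over all AIPs: O(number of AIPs), regardless of target size.
    children = defaultdict(list)
    for handle, parent in parent_of.items():
        if parent is not None:
            children[parent].append(handle)

    # Breadth-first walk down from the target using the inverted pointers.
    order, queue = [], deque([target])
    while queue:
        handle = queue.popleft()
        order.append(handle)
        queue.extend(sorted(children[handle]))
    return order
```

Restoring one small Collection still requires the full scan, which is the complexity the decision trades for independent AIPs.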

-------
[16 April 2010 - Tim Donohue] I realized we may need to rethink this decision. If there is no easy way to determine the children of a parent, then you may encounter the following less-than-ideal scenario when restoring a single Collection along with all its Items:

...

[01 June 2010 - Mark Wood] It's not necessary to parse entire Item AIPs: since they are ZIP archives, you can just read their manifests. If the AIPs are stored remotely (e.g. in DuraCloud), then you need to be able to run the parser there and send back the list of relevant Items.
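A minimal sketch of reading only the manifest, assuming each AIP is a ZIP archive whose METS manifest is stored as `mets.xml` and records the parent's handle in a `structMap` labelled `Parent` through an `mptr` element's `xlink:href`. That layout is an assumption for illustration, not a statement of the actual DSpace AIP format, and the helpers `parent_handle`, `children_of`, and `make_fake_aip` are hypothetical. Python's `zipfile` locates members through the archive's central directory, so reading `mets.xml` does not decompress the bitstreams stored alongside it.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
MANIFEST_NAME = "mets.xml"  # assumed manifest file name inside each AIP


def parent_handle(aip):
    """Read only the manifest of one AIP (path or file-like) and return the
    parent handle it records, or None. Assumed 'Parent' structMap layout."""
    with zipfile.ZipFile(aip) as zf:
        manifest = zf.read(MANIFEST_NAME)  # other members are never read
    root = ET.fromstring(manifest)
    for smap in root.iter(f"{{{METS_NS}}}structMap"):
        if smap.get("LABEL") == "Parent":
            mptr = smap.find(f".//{{{METS_NS}}}mptr")
            if mptr is not None:
                return mptr.get(f"{{{XLINK_NS}}}href")
    return None


def children_of(target, aips):
    """Names of the AIPs (from a name -> AIP mapping) whose parent is `target`.
    Could run wherever the AIPs live and send back only this list."""
    return [name for name, aip in aips.items() if parent_handle(aip) == target]


def make_fake_aip(parent):
    """Build an in-memory AIP-like ZIP for testing (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            MANIFEST_NAME,
            f'<mets xmlns="{METS_NS}" xmlns:xlink="{XLINK_NS}">'
            f'<structMap LABEL="Parent"><div>'
            f'<mptr LOCTYPE="HANDLE" xlink:href="{parent}"/>'
            f"</div></structMap></mets>",
        )
        zf.writestr("bitstream_1.pdf", b"large content that is never read")
    buf.seek(0)
    return buf
```

Only the small list returned by `children_of` would need to travel back from remote storage, which is the point of running the parser next to the AIPs.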

...