Stakeholders

Sprints

Sprint 1

Sprinters

Developers

Testing and Validation

Documentation

Sprint 3

Sprinters

Developers

Testing and Validation

Documentation
- Joshua Westgard
- Nick Ruest

Sprint 4

2017-05 Import - Export Sprint 04 Meetings

Sprinters

Developers

Testing and Documentation

Use cases

Transfer between Fedora and external preservation systems such as APTrust, MetaArchive, LOCKSS, DPN, and Archivematica
Export the content of a single Fedora container and all its descendant resources
Transfer between Fedora instances or, more generally, from Fedora to an LDP archive
Import into a specified container.
Round-tripping resources in Fedora in support of backup/restore
1. A start has been made on this in FCREPO-1990.
2. The implementation referenced in that ticket is not dead, though it is not actively being worked on at the moment. Pull requests are welcome (though others may well wish to take it in a different direction).
3. A rebuilder that:
  1. Is not solely dependent on an intact backup of the repository index
  2. Works from shredded serializations that can be managed with standard file preservation techniques
  3. Can recover as much of a repository as possible in the face of integrity issues (supports partial recovery)
  4. Supports gathering copies of the shreds (serializations) from multiple sources to recover a repository
Round-tripping resources in Fedora in support of Fedora repository version upgrades
~~Batch loading arbitrary sets of resources from metadata spreadsheet and binaries (may well be difficult – or not worth it – to try to generalize such a feature).~~
Import or export containers or binaries using add, overwrite, or delete operations. Configure the data model and the source and the target for each resource that will be updated. Allow target containers to be non-empty before import and source containers to be non-empty after export. Maintain ordering, etc. Support versioning. Examples: add issues to a publication; add fragments to a manuscript; add data sets to a longitudinal study; add time-series images from telescopes; remove resources determined to be under copyright; release resources after restrictions on access have expired.
1. Perform multiple metadata-only exports, and then restore an earlier version from an export.
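The rebuilder described in the backup/restore use case could work roughly as follows. This is a minimal Python sketch under stated assumptions, not an existing Fedora tool. Each source is modeled as an in-memory mapping from a resource path to its serialized bytes, and the expected SHA-256 digests are assumed to come from a manifest kept alongside the shreds (for example, a BagIt manifest). The names `rebuild` and `sha256_hex` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: a "source" maps a resource path to the bytes of one serialized shred.
Source = Dict[str, bytes]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def rebuild(expected: Dict[str, str],
            sources: Iterable[Source]) -> Tuple[Dict[str, bytes], List[str]]:
    """Recover each shred from the first source whose copy matches its expected SHA-256.

    `expected` maps resource path -> hex digest (assumed to come from a manifest kept
    alongside the shreds). Paths that no source can supply intact are returned
    separately, so a partial recovery is still usable.
    """
    sources = list(sources)
    recovered: Dict[str, bytes] = {}
    for path, digest in expected.items():
        for source in sources:
            copy = source.get(path)
            if copy is not None and sha256_hex(copy) == digest:
                recovered[path] = copy
                break
    missing = sorted(set(expected) - set(recovered))
    return recovered, missing


# Usage: the primary copy of a.ttl is corrupted, a mirror holds a good copy,
# and b.ttl cannot be found intact anywhere.
good = b"container triples"
manifest = {"rest/a.ttl": sha256_hex(good), "rest/b.ttl": sha256_hex(b"lost")}
primary = {"rest/a.ttl": b"corrupted"}
mirror = {"rest/a.ttl": good}
recovered, missing = rebuild(manifest, [primary, mirror])
```

Here `recovered` holds the mirror's copy of `rest/a.ttl`, and `missing` lists `rest/b.ttl`. Checking copies against a recorded digest, rather than trusting whichever source answers first, is what lets a corrupted primary be bypassed.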

Use cases yet to be rolled into requirements

Import objects from an external system (such as Figshare, where a research data object might be prepared) into a Fedora preservation repository with either Hydra or Islandora on top. This implies compliance with the Hydra and/or Islandora object models.
To migrate from internal content to external content, export metadata only and then import it into another repository. The links to the new external content locations would be added afterwards.

Requirements

External Systems

PHASE 2 Support import from and export to a TBD list of external systems.
1. APTrust - University of Maryland (Joshua Westgard)
2. Archivematica - Artefactual Systems (Justin Simpson)
3. MetaArchive - Penn State (Ben Goldman)
4. Perseids - Tufts (Bridget Almas)

General

PHASE 1 Support transacting in RDF
PHASE 1 Support an option to include binaries
PHASE 1 Support references from exported resources to other exported resources
PHASE 2 Support transacting in BagIt bags
PHASE 1 Support import into a non-existing Fedora container
PHASE 2 Support import into an existing, empty Fedora container
PHASE 3 Support import into an existing, non-empty Fedora container with various policies: add, overwrite, delete, version, skip
PHASE 3 Support export of resource versions
PHASE 3 Support import of resource versions
PHASE 1 Support export of resource and its "members" based on the ldp:contains predicate
PHASE 2 Support export of resource and its "members" based on a user-provided membership predicate
~~Support recursive RDF insert/updates with LDP Indirect Container specified POST (and PUT / PATCH?) (ref: FCREPO-2042)~~
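The export scope in the PHASE 1 and PHASE 2 membership requirements amounts to a graph traversal from a starting resource. The following minimal Python sketch assumes the repository's triples are already available in memory as (subject, predicate, object) strings, rather than being fetched over the REST interface. `export_scope` and the base URL are hypothetical; `ldp:contains` and `pcdm:hasMember` are used with their published namespace URIs.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def export_scope(root: str, triples: Iterable[Triple],
                 predicate: str = LDP_CONTAINS) -> List[str]:
    """Return root plus every resource reachable from it via `predicate`, breadth-first."""
    children: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            children.setdefault(s, []).append(o)
    seen: Set[str] = {root}
    order: List[str] = []
    queue = deque([root])
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for child in children.get(uri, []):
            # User-provided membership predicates may form cycles; visit each resource once.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


# Usage with a hypothetical repository base URL.
BASE = "http://localhost:8080/rest/"
HAS_MEMBER = "http://pcdm.org/models#hasMember"
triples = [
    (BASE + "col", LDP_CONTAINS, BASE + "col/a"),
    (BASE + "col/a", LDP_CONTAINS, BASE + "col/a/b"),
    (BASE + "col", HAS_MEMBER, BASE + "other"),
]
by_containment = export_scope(BASE + "col", triples)
by_membership = export_scope(BASE + "col", triples, HAS_MEMBER)
```

With the default predicate, the scope is `col`, `col/a`, and `col/a/b`. With `pcdm:hasMember`, it is `col` and `other`. Making the predicate a parameter is one way the PHASE 2 requirement could reuse the PHASE 1 traversal unchanged.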

Round-tripping

Defined as: exporting all or a subset of a Fedora repository and importing the export artifacts into a Fedora repository.

PHASE 3 Support preservation of dates during round-tripping
PHASE 3 Support preservation of version snapshots during round-tripping
PHASE 1 The URIs of the round-tripped resources must be the same as the original URIs
PHASE 3 Support lossless round-tripping (i.e., if you export a resource, delete it, and import it again, the result is indistinguishable from never having performed any of those operations).
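A round trip can be checked by diffing a resource's triples before export and after re-import. Comparing whole triples also confirms that subject URIs were preserved (the PHASE 1 requirement). Because dates only need to survive in PHASE 3, the minimal Python sketch below lets a caller ignore selected predicates. The function name is hypothetical, and treating `fedora:created` and `fedora:lastModified` as the relevant date predicates is an assumption based on the Fedora 4 repository ontology.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

FEDORA = "http://fedora.info/definitions/v4/repository#"
# Assumed server-managed date predicates; only a PHASE 3 round trip must preserve them.
DATE_PREDICATES = frozenset({FEDORA + "created", FEDORA + "lastModified"})


def round_trip_diff(before: Set[Triple], after: Set[Triple],
                    ignore: FrozenSet[str] = frozenset()) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (lost, gained) triples, skipping any triple whose predicate is in `ignore`.

    Two empty sets mean the round trip was lossless for the compared resources.
    """
    kept_before = {t for t in before if t[1] not in ignore}
    kept_after = {t for t in after if t[1] not in ignore}
    return kept_before - kept_after, kept_after - kept_before


# Usage: the title survives, but the repository assigned a new lastModified on import.
uri = "http://localhost:8080/rest/obj"
title = "http://purl.org/dc/terms/title"
before = {(uri, title, "Map of Maryland"), (uri, FEDORA + "lastModified", "2017-05-01T00:00:00Z")}
after = {(uri, title, "Map of Maryland"), (uri, FEDORA + "lastModified", "2017-05-15T12:00:00Z")}
phase1 = round_trip_diff(before, after, ignore=DATE_PREDICATES)
phase3 = round_trip_diff(before, after)
```

The PHASE 1 comparison reports no differences. The strict PHASE 3 comparison reports the original `lastModified` value as lost and the new one as gained.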

BagIt

PHASE 2 Single resource bags
PHASE 2 The structure and scope of accepted and produced BagIt bags must be configurable (resource)
1. Clarification: structure relates to required and optional tagfiles in the bag
2. Clarification: scope relates to contents of the bag, e.g. single object or object and all members based on specific membership predicate
PHASE 3 Multi-resource bags
PHASE 3 Unambiguously support linking between resources within a bag, and from resources in the bag to resources outside the bag
1. e.g. for bagged resources A and B, if A contains statement <A> myns:rel <B>, then it is unambiguous that B is a resource in the bag. Suppose some archive ingests the bag and exposes its contents as web resources with URIs P and Q. If the archive preserves intra-bag links, resource P will have statement <P> myns:rel <Q>. Likewise, if A contains external link <A> myns:rel2 <http://example.org/outside/the/bag>, then an archive that preserves links will have <P> myns:rel2 <http://example.org/outside/the/bag>
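The link-preservation rule in the example above can be stated as a term rewrite: URIs that name resources inside the bag are mapped to their new locations, and every other term is left alone. The following minimal Python sketch assumes triples are plain string tuples and does not distinguish literals from URIs. `relocate` and the source and archive URLs are hypothetical; `myns:rel` and `myns:rel2` are taken from the example.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def relocate(triples: Iterable[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite subjects and objects that name in-bag resources; predicates and
    links to resources outside the bag are left unchanged."""
    def move(term: str) -> str:
        return mapping.get(term, term)
    return [(move(s), p, move(o)) for s, p, o in triples]


# Usage mirroring the example: A and B are bagged, and the archive exposes them as P and Q.
A, B = "http://source.example/rest/A", "http://source.example/rest/B"
P, Q = "http://archive.example/P", "http://archive.example/Q"
OUTSIDE = "http://example.org/outside/the/bag"
bag = [(A, "myns:rel", B), (A, "myns:rel2", OUTSIDE)]
ingested = relocate(bag, {A: P, B: Q})
```

The result is `<P> myns:rel <Q>` and `<P> myns:rel2 <http://example.org/outside/the/bag>`, matching the expected outcome in the example. The mapping must cover exactly the bag's resources for the distinction between intra-bag and external links to remain unambiguous.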

Verification Tool

PHASE 2 Verify same number of resources on disk as in fcrepo
PHASE 2 Verify same number of resources in fcrepo as on disk
PHASE 2 Verify same checksum for binaries
PHASE 2 Verify same triples for containers
PHASE 2 Record which resources have been verified (Include checksum for binary resources)
PHASE 2 Verify subset of repository resources
PHASE 3 Verify fcrepo to fcrepo
PHASE 3 Verify disk to disk
PHASE 3 Use generated config file as sole input
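Most of the PHASE 2 checks reduce to comparing two inventories keyed by resource path. The following minimal Python sketch assumes each side (disk and fcrepo) has already been reduced to a mapping from path to fingerprint. For binaries, the fingerprint is the checksum. For containers, it is a digest of the sorted triples. Blank nodes are not handled. `verify`, `container_fingerprint`, and `Report` are hypothetical names, not part of an existing tool.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Report:
    only_on_disk: List[str] = field(default_factory=list)
    only_in_repo: List[str] = field(default_factory=list)
    mismatched: List[str] = field(default_factory=list)
    verified: Dict[str, str] = field(default_factory=dict)  # path -> fingerprint record


def container_fingerprint(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Order-independent digest of a container's triples (no blank-node handling)."""
    lines = sorted(" ".join(t) for t in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def verify(disk: Dict[str, str], repo: Dict[str, str],
           prefix: Optional[str] = None) -> Report:
    """Compare path -> fingerprint inventories, optionally only for paths under `prefix`."""
    if prefix is not None:
        disk = {k: v for k, v in disk.items() if k.startswith(prefix)}
        repo = {k: v for k, v in repo.items() if k.startswith(prefix)}
    report = Report()
    report.only_on_disk = sorted(disk.keys() - repo.keys())
    report.only_in_repo = sorted(repo.keys() - disk.keys())
    for path in sorted(disk.keys() & repo.keys()):
        if disk[path] == repo[path]:
            report.verified[path] = disk[path]
        else:
            report.mismatched.append(path)
    return report


# Usage: verify only the "col/" subset of a hypothetical export.
img = hashlib.sha256(b"TIFF bytes").hexdigest()
disk = {"col/img.tif": img,
        "col/meta": container_fingerprint([("col/meta", "dc:title", "Map")]),
        "col/extra": img}
repo = {"col/img.tif": img,
        "col/meta": container_fingerprint([("col/meta", "dc:title", "Mop")]),
        "col/new": img,
        "other/x": img}
report = verify(disk, repo, prefix="col/")
```

In this run, `col/extra` exists only on disk and `col/new` only in fcrepo. `col/meta` is flagged because its triples differ. `col/img.tif` is recorded as verified along with its checksum. Because the function is symmetric in its two arguments, the same comparison could also serve the PHASE 3 fcrepo-to-fcrepo and disk-to-disk checks.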

Considerations

Import/export performance should be as good as possible, given the assumption that this work is done via the REST interface.

Resources

https://tools.ietf.org/html/draft-kunze-bagit-08
https://github.com/ruebot/bagit-profiles
https://github.com/barmintor/bagit-ldp
https://www.ietf.org/archive/id/draft-wilper-semantic-content-pkgs-00.txt
http://dataconservancy.github.io/dc-packaging-spec/dc-packaging-spec-1.0.html (explanation below)
https://github.com/acdha/restful-bag-server (a resource-oriented RESTful HTTP API for exchanging bags)
Import - Export Sprint Resources

Meetings
