Design Principles

  1. Minimize change to the user via the API
  2. Do not allow OCFL-isms to bleed into the Fedora API
  3. Rebuildability
  4. Compliance with OCFL
  5. Retain URLs of migrated Fedora resources
  6. Performance
  7. Reduce complexity of implementation

OCFL persistence

Architecture

  1. Retaining HTTP layer of existing Fedora codebase
  2. Replacing ModeShape persistence with OCFL storage
  3. Support for three interaction models: 
    1. atomistic (implicit) - every LDP resource maps to an individual OCFL Object
    2. archive group - hierarchy of LDP resources map into a compound OCFL Object
    3. archival-part (implicit) - an LDP resource that is a constituent part of a compound OCFL Object
  4. Eliminate "single-subject-restriction", i.e. support arbitrary RDF
  5. Fedora-specific information to be stored in the OCFL Object in a ".fcrepo/" directory, e.g.:
    1. which file is the description of another file
    2. which file is an ACL
  6. Optimizing reads/lookups with an internal database
    1. proposed database model: https://docs.google.com/document/d/1MsMfhae3thmNdoFtnTUnII3mr_-OkllRs9PvgnY1fDY/edit
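The three interaction models decide which OCFL Object holds a given LDP resource. The following is a minimal sketch, assuming that OCFL Object IDs are simply LDP paths and that the set of archival-group paths is already known (for example from the internal database). The function name `ocfl_object_for` is hypothetical.

```python
from typing import Iterable


def ocfl_object_for(ldp_path: str, archival_groups: Iterable[str]) -> tuple[str, str]:
    """Return (interaction_model, ocfl_object_id) for an LDP path.

    Hypothetical sketch: OCFL Object IDs are assumed to equal LDP paths.
    """
    groups = {g.rstrip("/") for g in archival_groups}
    path = ldp_path.rstrip("/")
    if path in groups:
        # The archival group is the root of a compound OCFL Object.
        return ("archival-group", path)
    # Walk up the hierarchy; the nearest archival-group ancestor owns this resource.
    parts = path.split("/")
    for i in range(len(parts) - 1, 0, -1):
        ancestor = "/".join(parts[:i])
        if ancestor in groups:
            return ("archival-part", ancestor)
    # No archival-group ancestor: the resource is its own (atomistic) OCFL Object.
    return ("atomistic", path)
```

Comparing whole path segments rather than string prefixes keeps "/books/b10" from being mistaken for a part of "/books/b1".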

...

  1. Support for both kinds of OCFL storage hierarchy:
    1. created by Fedora
    2. pre-existing, created by another application

Pre-existing OCFL storage hierarchy

  1. Fedora-imposed constraints
    1. The OCFL storage hierarchy must have a single, consistent "ocfl_layout" (i.e. the storage path mapping algorithm must be deterministic)
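OCFL 1.0 lets a storage root declare its layout in an optional `ocfl_layout.json` file whose `extension` key names the layout extension. A minimal sketch of how Fedora might enforce the constraint above before adopting a pre-existing hierarchy; the function name `storage_layout` is hypothetical. It only checks the declaration and does not verify that every object actually follows the declared layout.

```python
import json
from pathlib import Path


def storage_layout(storage_root: str) -> str:
    """Return the layout extension declared by an OCFL storage root.

    Hypothetical check: refuse a pre-existing hierarchy that does not
    declare a single layout, since the path mapping would be undetermined.
    """
    root = Path(storage_root)
    if not (root / "0=ocfl_1.0").is_file():
        raise ValueError(f"{root} is not an OCFL 1.0 storage root")
    layout_file = root / "ocfl_layout.json"
    if not layout_file.is_file():
        raise ValueError("no ocfl_layout.json; storage path mapping is undetermined")
    layout = json.loads(layout_file.read_text(encoding="utf-8"))
    extension = layout.get("extension")
    if not isinstance(extension, str) or not extension:
        raise ValueError("ocfl_layout.json does not name a layout extension")
    return extension
```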

Mapping between LDP and OCFL

Opt-in model

  1. Fedora resources may be created with an optional "archive" interaction model provided via headers.  

  2. New resources created via POST or PUT to the archive will be LDP contained by the archive and will be stored within the OCFL object representing that archive.

  3. If a resource is created without the "archive" model, new resources created via POST or PUT will be LDP contained by the parent resource, but will be stored as separate OCFL objects.
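The design leaves the exact header open. A minimal sketch, assuming the "archive" model is requested with an LDP-style `Link: <...>;rel="type"` header; the type URI below is illustrative, not fixed by this design, and `create_request` is a hypothetical helper that builds the request without sending it.

```python
import urllib.request

# Assumption: illustrative type URI for the "archive" interaction model.
ARCHIVAL_GROUP_TYPE = "http://fedora.info/definitions/v4/repository#ArchivalGroup"


def create_request(url: str, archive: bool = False) -> urllib.request.Request:
    """Build (but do not send) a PUT that creates a Fedora resource."""
    headers = {"Content-Type": "text/turtle"}
    if archive:
        # Opt in: resources later created under this one share its OCFL Object.
        headers["Link"] = f'<{ARCHIVAL_GROUP_TYPE}>;rel="type"'
    return urllib.request.Request(url, data=b"", headers=headers, method="PUT")
```

Children POSTed under a resource created with `archive=True` would be stored in its OCFL Object; without the header they would become separate OCFL Objects. Passing the request to `urllib.request.urlopen` would send it.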

Notes/Implications

  1. Note: The user establishes the interaction model at creation time. Changing the model afterward would require additional migration tooling.

Fedora-specific details

  1. /content/.fcrepo directory
  2. Hashing (SHA-256) of the resource's LDP path
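One way to turn a SHA-256 hash of the LDP path into a storage path is the pattern of the OCFL community "hashed n-tuple" storage layout: a few short prefixes of the hex digest as nested directories, ending in the full digest. A minimal sketch, assuming the LDP path is the value hashed and using tuples of 3 characters, 3 levels deep; `storage_path` is a hypothetical name.

```python
import hashlib


def storage_path(ldp_path: str, tuple_size: int = 3, tuples: int = 3) -> str:
    """Map an LDP path to a relative OCFL object root path.

    Sketch modeled on a hashed n-tuple layout: leading digest tuples
    as directories, then the full SHA-256 hex digest as the object root.
    """
    digest = hashlib.sha256(ldp_path.encode("utf-8")).hexdigest()
    segments = [digest[i * tuple_size:(i + 1) * tuple_size] for i in range(tuples)]
    return "/".join(segments + [digest])
```

Because the hash is one-way, going from a storage path back to a Fedora URL would depend on the `id` in the object's inventory or on the database. This is one reason the URL / Object ID / storage path relationship remains an open question.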

Scaling

  • stateless Fedora instance(s) will scale horizontally
  • database can be clustered and/or moved to a cloud database service (RDS, Aurora, etc.)
  • start with the file system, scale out to a cloud object store (S3)

Bulk ingest

  1. Faster ingest rates can be achieved by users writing OCFL-compliant content directly to disk
    1. Would require Fedora to (re)scan OCFL storage hierarchy
  2. Optionally, user could write into OCFL-compliant storage in a way that includes Fedora optimizations (e.g. ".fcrepo/" directory)
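A (re)scan can find OCFL Objects by their `0=ocfl_object_1.0` NAMASTE file and read each object's `inventory.json` for its `id` and `head` version. A minimal sketch; `scan_storage_root` is a hypothetical name, and the step that would index ".fcrepo/" content is left out.

```python
import json
import os
from pathlib import Path


def scan_storage_root(storage_root: str) -> dict[str, dict]:
    """Find OCFL Objects under a storage root and read their inventories.

    Returns {object_id: {"path": relative object root, "head": head version}}.
    """
    found: dict[str, dict] = {}
    for dirpath, dirnames, filenames in os.walk(storage_root):
        if "0=ocfl_object_1.0" in filenames:
            inventory = json.loads(
                Path(dirpath, "inventory.json").read_text(encoding="utf-8")
            )
            found[inventory["id"]] = {
                "path": os.path.relpath(dirpath, storage_root),
                "head": inventory["head"],
            }
            # Objects are not nested, so do not descend into an object root.
            dirnames.clear()
    return found
```

The walk touches every directory in the hierarchy, so for large bulk ingests its cost grows with the total number of objects, not with the number of new ones.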

Performance

  1. Many members: performance should improve significantly since the list of members will be supplied by a database index (which should support a degree of in-memory caching). No loading of ModeShape nodes is required.
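For illustration, a member lookup against such an index could be a single indexed query. This is a minimal sketch using SQLite with a hypothetical `containment(parent, child)` table. The actual model is the one in the proposed database document and may differ.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory index with a hypothetical containment table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE containment (parent TEXT NOT NULL, child TEXT NOT NULL)")
    # Indexing on parent lets member lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_containment_parent ON containment (parent)")
    return conn


def list_members(conn: sqlite3.Connection, parent: str) -> list[str]:
    """Return the LDP members of a container directly from the index."""
    rows = conn.execute(
        "SELECT child FROM containment WHERE parent = ? ORDER BY child", (parent,)
    )
    return [child for (child,) in rows]
```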

Open questions

  1. Role of OCFL storage roots
    1. Could be valuable for multi-tenancy, but the client interaction model has not been detailed
  2. What is the mapping / algorithm / relationship between:
    1. Fedora URL of LDP resource
    2. OCFL Object.ID
    3. OCFL storage path for associated OCFL Object

...

  1. Provide new implementation of fcrepo-kernel-api that interacts with OCFL persistence
  2. Interactions with OCFL persistence should initially take advantage of the JHU OCFL client
  3. For pre-existing OCFL storage hierarchies, Fedora imposes the following constraint:
    1. The OCFL storage hierarchy must have a single, consistent "ocfl_layout" (i.e. the storage path mapping algorithm must be deterministic)
  4. Many members: performance should improve significantly since the list of members will be supplied by a database index (which should support a degree of in-memory caching)

Prototyping proposal

  1. Expose JHU OCFL client functionality with minimal HTTP endpoints
    1. Such an endpoint should implement minimal LDP interactions
  2. Use HTTP over OCFL to test:
    1. Performance bottlenecks
    2. Scale viability (e.g. NLM migration)
    3. User expectations, ergonomics

...

  1. Same as Fedora 4 and 5 version creation: POST to a resource's "/fcr:versions" endpoint to create a Memento (i.e. a new OCFL version directory)
  2. Changes to actively edited objects that are not yet captured in a version are held in a "cache/" directory at the same level as the OCFL version directories
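Creating a Memento means choosing the next version directory name. OCFL names version directories "v" plus a positive integer, either unpadded (v1, v2, ...) or zero-padded to a consistent width (v001, v002, ...). A minimal sketch; `next_version_dir` is a hypothetical name, and it ignores sibling entries that are not version directories, such as "inventory.json" or the "cache/" directory described above.

```python
import re

_VERSION = re.compile(r"^v(\d+)$")


def next_version_dir(existing: list[str]) -> str:
    """Return the name of the next OCFL version directory."""
    numbers = []
    width = None
    for name in existing:
        match = _VERSION.match(name)
        if not match:
            continue  # skip inventory files, cache/, txn/, etc.
        digits = match.group(1)
        numbers.append(int(digits))
        if digits.startswith("0"):
            width = len(digits)  # zero-padded sequence: keep its width
    nxt = max(numbers, default=0) + 1
    if width is None:
        return f"v{nxt}"
    if len(str(nxt)) > width:
        raise ValueError(f"zero-padded version sequence of width {width} is exhausted")
    return f"v{nxt:0{width}d}"
```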

...

  1. Same code logic used for creation of OCFL versions / Mementos in both on-demand and on-change models
  2. LDP resources within a compound object should respond with a "Link" header pointing to the TimeMap of the Fedora "archival-group" resource
  3. POST on /fcr:versions of part resources returns a 400 response
  4. GET on /fcr:versions returns a version of the "constituent part"
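The rules for resources inside a compound object can be summarized in two small helpers. A minimal sketch, using paths in place of full URLs and the Memento `rel="timemap"` link relation. The helper names are hypothetical, and the 201 status for a successful version creation is an assumption.

```python
from typing import Optional


def timemap_link(resource: str, archival_group: Optional[str]) -> str:
    """Link header value; parts point at their archival group's TimeMap."""
    # A part shares the OCFL Object, and therefore the versions, of its group.
    owner = archival_group if archival_group is not None else resource
    return f'<{owner}/fcr:versions>; rel="timemap"'


def post_versions_status(archival_group: Optional[str]) -> int:
    """HTTP status for POST /fcr:versions on a resource."""
    # Parts cannot be versioned on their own; 201 for others is an assumption.
    return 400 if archival_group is not None else 201
```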

Migration from lower versions of Fedora to higher

...

  1. Index of all Fedora resources would be needed to support the query service
  2. Messaging model (synchronous or asynchronous) would likely be used to populate the index
  3. Full-text search would be a bonus
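As an illustration of the asynchronous option, the index can be fed by a consumer that drains resource events. This is a minimal sketch in which an in-process `queue.Queue` stands in for a real message broker and the event shape (`type`, `id`, `text`) is hypothetical. The word index hints at how basic full-text lookup could come from the same pipeline.

```python
import queue
import threading


def run_indexer(events: queue.Queue, index: dict[str, set[str]]) -> threading.Thread:
    """Consume resource events asynchronously and maintain a word index.

    Events are hypothetical dicts {"type", "id", "text"}; None stops the consumer.
    """
    def consume() -> None:
        while True:
            event = events.get()
            if event is None:
                events.task_done()
                break
            # Remove old postings for this resource before re-indexing it.
            for ids in index.values():
                ids.discard(event["id"])
            if event["type"] != "delete":
                for word in event.get("text", "").lower().split():
                    index.setdefault(word, set()).add(event["id"])
            events.task_done()

    worker = threading.Thread(target=consume, daemon=True)
    worker.start()
    return worker
```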

Transaction service

...

  1. Proposal: no change to the Fedora API specification in Fedora 6

...

  1. We will either:
    1. align code with the (as-yet-to-be-ratified) side-car specification
    2. leave HTTP API unchanged while introducing the possibility of auto-versioning on transaction completion

...

  1. Potentially store updates within a transaction in a "txn/" directory at the same level as the OCFL version directories
  2. Support actions on multiple OCFL objects within a single transaction
  3. Deleting the tombstone of an OCFL Object purges the Object
  4. Deleting the tombstone of a "constituent part" is not supported (405)
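The two tombstone rules can be expressed as one dispatch. A minimal sketch in which a plain dict stands in for OCFL storage, keyed by Object ID. The function name is hypothetical, and the 204 status for a successful purge is an assumption.

```python
from typing import Optional


def delete_tombstone(store: dict[str, object], resource: str,
                     archival_group: Optional[str]) -> int:
    """Handle DELETE on a resource's tombstone; return the HTTP status.

    archival_group is None when the resource is its own OCFL Object.
    """
    if archival_group is not None:
        # A constituent part lives inside a compound OCFL Object and
        # cannot be purged on its own.
        return 405
    # Purging removes the whole OCFL Object, including all of its versions.
    store.pop(resource, None)
    return 204
```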

Raw notes

  1. General VA Beach Meeting notes
  2. Design summary notes
  3. Migration notes
  4. Object modeling notes
  5. Versioning notes
  6. Fixity notes
  7. Bulk ingest notes
  8. Query service notes