Date: Friday, May 3, 2024 @12pm Eastern

Zoom Link: https://lyrasis.zoom.us/j/89136923989?pwd=trZNaSToJNJcNzPHXQNjAfbcaZWzFm.1


Discussion Topic:

Join us in a community conversation about the impact and use of archival groups vs. atomic resources on OCFL objects in Fedora 6.x. This open discussion is intended to help us understand use cases, identify best practices and ultimately help us update documentation to reflect current recommendations.

Attendees:

  • Demian Katz
  • Rosie Le Faive
  • Doron Shalvi
  • Jared Whiklo
  • John Gostick
  • Andrew Woods
  • Scott Prater
  • Thomas Bernhart
  • David Novak
  • Tom Wrobel
  • Simeon Warner
  • Ben Pennell
  • Yang Yu
  • Peter Winckles
  • Josh Westgard
  • Chad Mills
  • Kate Dohe
  • Nicole Scalessa
  • Seth Shaw

Desired Outcomes:

  • Use cases for both types
  • Positive experiences of using both types
  • Main differences in terms of selling points
  • Reasons to avoid a particular type

Recording:

 

Notes:

Use cases: 

  • Villanova - currently using a mix.
    • Unaware of the distinction until after the migration.
      • Legacy data migrated as archival groups
      • New data is in atomic resources
    • Migrated data is in AGs while new things are Atomic; this works because older objects are more stable and newer ones are more volatile.
    • Has been helpful because legacy data is stable and “at rest”
    • Newly migrated data is still somewhat in flux, and this has been helpful because they can make modifications etc. to their new data without impacting
  • Cambridge - using Archival Groups to give more clarity on the underlying data when disconnected from Fedora.
    • Building from scratch
    • Considering a mixed use case in the future
    • Had an issue with a large object with manual versioning: when the first version was cut, it left behind some objects in the mutable head.
    • Are aware that large numbers of versions could create a problem.
      • They are leaning on transactions to ensure that they don’t create too many versions.
      • Versioning - using Fedora as part of a managed workflow, they rely heavily on transactions to keep the versions to a minimum
  • Institute of Archaeology, CAS Prague - using AGs for primary storage because they are aiming for sustainability of the underlying storage
    • Keeps things more tidy
    • Issue: when he makes a change inside the Archival Group, the last modified date at the root of the Archival Group does not change.
      • This means there is no single place to go to see all the last modified dates.
      • Would like to see the last modified date for the root Archival Group changed.
      • This was a specific decision to ensure the behaviour was the same between Atomic and Archival Groups.
  • National Library of Wales - chose Atomic resources to mimic more of the Fedora 3 structure.
    • Each object was simpler and so hopefully these were simpler to maintain. Each object has about 5 datastreams. There were also some metadata objects.
  • National Library of Medicine - has a similar structure to NLW, and they were thinking that AGs more closely match the Fedora 3 layout.
    • The organisation is present at the on-disk layer.
  • Oxford, Bodleian Libraries - thinking mix-and-match depending on the use case, but not in the same repository.
    • Like what they get at the OCFL layer because that is the basis of their preservation
    • If there was a performance advantage needed then they may consider atomic resources
    • They are using Fedora as a necessary hindrance to get their files into an OCFL structure on disk.
    • They don’t work with “warm” objects and so don’t generate a lot of versions, nor do their objects have many children.
      • Use transactions, and use OCFL only for semi-stable states of data: first ingest, review and final
      • Directory traversal is slowing down backups
    • Versions and directory depth are a big concern
      • Directories are upwards of 7 levels deep, and they are running into inode exhaustion
    • Wanted to be able to manage and define point-in-time versioning in one place
      • Having access to this was important to them
      • Working on code to write previous versions of Fedora
  • Docuteam - does mix-and-match.
    • They initially didn’t use any AGs, but found they would get a lot of files on disk.
    • Now they create an AG for the major object (like the file, folder or document).
    • But the other objects don’t have child objects or features that might benefit from AG features.
  • UMD - was leaning towards AGs (not migrated from F4 yet).
    • One consideration is the WebAC when using hierarchy.
      • Was using a flat storage structure but lost the benefit of WebAC hierarchy.
      • Found that to be less performant; now moving to store objects in “collections” with the ACL at the collection level and children.
      • WebAC inheritance should be independent of the type
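
As a rough illustration of the Atomic vs. AG choice and the use of transactions described above, the sketch below builds the request headers for creating a container in Fedora 6. It assumes the Fedora 6 HTTP API conventions of a `Link` type header for Archival Groups and an `Atomic-ID` header for transactions; the helper name `container_headers`, the slug and the URLs are hypothetical and only for illustration.

```python
# Type URI that marks a container as an Archival Group in Fedora 6.
ARCHIVAL_GROUP_TYPE = "http://fedora.info/definitions/v4/repository#ArchivalGroup"


def container_headers(slug, archival_group=False, tx_uri=None):
    """Build headers for a POST that creates a container (hypothetical helper).

    archival_group=True asks Fedora to store the container and all of its
    descendants as a single OCFL object; otherwise each resource is atomic.
    tx_uri, if given, is the transaction URI (assumed to come from the
    Location header returned by a POST to <base>/fcr:tx).
    """
    headers = {"Slug": slug}
    if archival_group:
        headers["Link"] = f'<{ARCHIVAL_GROUP_TYPE}>;rel="type"'
    if tx_uri:
        # Grouping several changes in one transaction keeps them in one version.
        headers["Atomic-ID"] = tx_uri
    return headers
```

These headers could be passed to any HTTP client together with a POST to the parent container, for example `http://localhost:8080/rest` on a local test instance (an assumed URL).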

General Questions and Conversation:

  • Andrew Woods (Harvard, OCFL) - one of the driving use cases is to have versioning at the level of the intellectual object. Archival Groups allow this use case.
  • Tom Wrobel - being able to get point-in-time versioning of the intellectual object.
    • This feature is extremely attractive.
  • How long have AGs existed?
    • This construct is new with Fedora 6 to make use of the features of OCFL.
  • Thomas Bernhart - what really helped was to think about what it makes sense to keep contained together at the Fedora level.
    • Consider a book: you could have a reason to version each page independently, but you could also want to have each page included in a single intellectual object. It is important to be aware of how you make changes to your object.
  • John Gostick – AGs help when they index objects into Solr.
    • When you search for the object there is no fedora:parent relationship, so they use AGs to be able to pull the previous URL path to find its parent.
  • Tom Wrobel – there is an advantage from a development sense of knowing where they are stored on disk without having to make an additional call to find out.
  • Andrew Woods – one factor in making an Atomic vs. AG decision is whether they are making use of the transparency of the OCFL layer structure.
    • Tom Wrobel - The Bodleian have a suite of tools that work over the OCFL structure to perform tests and checks of the files, virus scanning and file checksums. This transparency allows them to use these tools much more clearly.
  • UWisc - using OCFL under a non-Fedora layer
    • They have a piece that works that is loosely coupled and stores objects in OCFL and that layer doesn’t know anything about Fedora.
  • Tom Wrobel – Because we have OCFL and AGs underneath, if Fedora 6 doesn’t work then we are in a much better state for moving to another tool if necessary.
    • Fedora 6 is much more stable and more performant than Fedora 4, but again if there are any problems then OCFL is there.
  • Andrew Woods – they don’t use Fedora but do use OCFL. They have a homegrown digital repository service and are replacing its existing application layer with a new one. They are hopeful that the underlying layer can be managed without changes.
  • Tom Wrobel - An API, access control, community support and documentation on top of OCFL: that was a selling point of Fedora 6 for us.
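
To make the point about OCFL transparency concrete, here is a minimal sketch of a fixity check that works directly on an OCFL object on disk, with no call to Fedora. It is not the Bodleian tooling; it only assumes the `inventory.json` layout from the OCFL specification (`digestAlgorithm` plus a `manifest` mapping digests to content paths). The function name `verify_ocfl_object` is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def verify_ocfl_object(object_root):
    """Return the content paths that are missing or fail their manifest digest."""
    root = Path(object_root)
    inventory = json.loads((root / "inventory.json").read_text(encoding="utf-8"))
    algorithm = inventory["digestAlgorithm"]  # typically "sha512" or "sha256"
    failures = []
    for expected, paths in inventory["manifest"].items():
        for rel_path in paths:
            file_path = root / rel_path
            if not file_path.is_file():
                failures.append(rel_path)
                continue
            digest = hashlib.new(algorithm)
            with file_path.open("rb") as handle:
                # Read in 1 MiB chunks so large files are not loaded at once.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            # OCFL digests are compared case-insensitively.
            if digest.hexdigest().lower() != expected.lower():
                failures.append(rel_path)
    return failures
```

Because an Archival Group is stored as a single OCFL object, one call like this covers the whole intellectual object; with atomic resources the same check would run once per resource.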


 OCFL-java Implementers Meeting Info: https://github.com/OCFL/ocfl-java/wiki

  • Meet quarterly (next meeting June 19 at 10am Eastern)

Next OCFL Community Meeting is Wed. May 8 at 11am Eastern - https://github.com/OCFL/spec/wiki/2024.05.08-Community-Meeting


