Date/Time: 5/18/2015, 12pm EDT

Participants

Discussion

  • Use cases for filesystem persistence
    • Exit strategy
    • Disk-based workflows, such as bagging up repository objects and sending to Glacier or other preservation repository
  • Current software
    • Fedora 4 uses Modeshape, which in turn uses Infinispan for persistence
      • By default, files are stored on disk, named after their SHA-1 checksum
    • Event-based listeners, including fcrepo-message-consumer (has module to save metadata to disk, or sync to triplestore, etc.)
      • Focus shifting to Camel (e.g., for Audit events), and it should be easy to implement a similar module in Camel
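
To illustrate the checksum-based naming mentioned above, here is a minimal Python sketch that computes the SHA-1 hex digest of a file's contents. The function name sha1_of_file and the chunk size are assumptions made for this example; it does not reproduce the exact directory layout that Modeshape or Infinispan uses on disk.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of the file at `path`.

    Hypothetical helper: reads the file in chunks so that large
    data files do not have to fit in memory.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A digest computed this way could be compared against a checksum-named file in the store to locate the stored copy of a given data file, assuming the store hashes the raw file bytes.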

Questions for further discussion

  • Should we retrieve linked data resources and include them in exported metadata?  What are good strategies to avoid retrieving too much?
  • What do the files on disk look like exactly?
    • Path based on identifiers, or checksums, broken up using a PairTree approach?
    • What format should the metadata be in?  Allow configuration of RDF serialization?
    • Copy data files?  Link to them (symlinks or hardlinks)?
    • Several of us use Bags, should the files be in a Bag, or just easily Baggable?

UCSD Plans

  • We currently use ARK identifiers, and break them up into a PairTree when creating a filesystem structure.  So we would expect to use this approach for creating Fedora 4 repository paths, e.g. 

    http://localhost:8983/fedora/rest/ark:/20775/bb/12/34/56/7x

  • That object on disk would then be /path/to/ark:/20775/bb/12/34/56/7x

  • To avoid making multiple copies of large data files, we would use hardlinks to link to the files in Fedora/Modeshape/Infinispan storage
  • We have only created Bags when transmitting objects, but are open to storing the exported files on disk in Bags
  • We currently store metadata as RDF/XML, but it would be very easy to allow configuring other RDF serializations
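
The PairTree and hardlink approach described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not UCSD's actual code: the ARK ark:/20775/bb1234567x is reassembled from the example path, the function names pairtree_path and export_with_hardlink are hypothetical, and the sketch assumes a POSIX filesystem where the export directory and the source file live on the same volume (a requirement for hardlinks).

```python
import os
from pathlib import Path


def pairtree_path(ark):
    """Split the final segment of an ARK into two-character pairs.

    Hypothetical helper: "ark:/20775/bb1234567x" becomes
    "ark:/20775/bb/12/34/56/7x".
    """
    prefix, _, name = ark.rpartition("/")
    pairs = [name[i:i + 2] for i in range(0, len(name), 2)]
    return prefix + "/" + "/".join(pairs)


def export_with_hardlink(source_file, export_root, ark, filename):
    """Hardlink `source_file` into the PairTree directory for `ark`.

    No file data is copied; both names refer to the same bytes on disk.
    Fails if the target already exists or is on a different volume.
    """
    target_dir = Path(export_root) / pairtree_path(ark)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    os.link(source_file, target)
    return target
```

For example, export_with_hardlink(stored_file, "/path/to", "ark:/20775/bb1234567x", "data.bin") would place the link under /path/to/ark:/20775/bb/12/34/56/7x, where stored_file and data.bin are placeholder names.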

 
