NLM

Observations

  1. External datastreams.  Most of our binaries are type E (external) datastreams.  The migration tool migrates the Fedora objects, but not the type E external binaries (as expected).  We are therefore left with the object structure, metadata and RDF in OCFL format, but not the binaries themselves.  Whether, how, when and where to migrate external binaries to an OCFL structure is still to be determined, and it is a major consideration for us in adopting OCFL.
  2. Speed. The tool migrates objects at the rate of 15K-40K objects per hour. This should be manageable for our purpose.
  3. For the "citations" repository, it consistently takes 30 minutes to build the datastream index before starting the migration. This server has 3.8M managed datastreams (1 per object). The option to cache this index when resuming migrations is helpful.
  4. CPU usage. The migration consumes about 30% of CPU.
  5. Layout. In flat and pairtree migrations the PID is used to form the path; for example, PID nlm:nlmuid-101588995-bk (stored FOXML file name nlm_nlmuid-101588995-bk) becomes /ocfl/nl/m+/nl/mu/id/-1/01/58/89/95/-b/k/5-bk. Characters such as - are problematic in path names on Linux.  See FCREPO-3180 (DuraSpace JIRA).
  6. It would be nice to be able to declare another field, or an input map, that supplies the value used for layout path generation. For example, the value 101588995_bk could be used to generate the path for PID nlm:nlmuid-101588995-bk.  Also included in FCREPO-3180 (DuraSpace JIRA).
  7. Migrated datastreams have no file extension. It would be nice if migrated datastreams had a file extension inferred from the MIME type; e.g. DC.xml instead of just DC, and OCR.txt instead of just OCR. This should particularly help with in-line XML datastreams.  See FCREPO-3181 (DuraSpace JIRA).
  8. OCFL versions appear to be created based on datastream timestamps. Each unique timestamp creates a new OCFL version, even when the datastream changes were part of the same Fedora version in the AUDIT trail and their timestamps differed only by milliseconds.
  9. Add XML declarations for migrated in-line datastreams.  See FCREPO-3197 (DuraSpace JIRA).
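
As a rough check on the speed observation, the reported rate of 15K-40K objects per hour can be turned into a duration estimate. The sketch below assumes the "citations" repository holds about 3.8M objects (from 3.8M managed datastreams at 1 per object) and that the rate stays constant; it leaves out the 30-minute index build. The function name is illustrative only.

```python
def migration_hours(object_count: int, objects_per_hour: int) -> float:
    """Estimated migration time in hours at a constant rate."""
    return object_count / objects_per_hour


# Assumption: about 3.8M objects, derived from 3.8M datastreams at 1 per object.
OBJECTS = 3_800_000

for rate in (15_000, 40_000):
    hours = migration_hours(OBJECTS, rate)
    print(f"{rate} objects/hour: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions the estimate falls between roughly 95 and 253 hours, that is, about 4 to 11 days of continuous running.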
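
The layout observations can be illustrated with a minimal sketch that reproduces the example path from the example PID. It is not the migration tool's code: it applies only the single-character pairtree substitutions, and the rule that the final directory is the last four characters of the cleaned identifier is inferred from the one example above. `layout_key` is a hypothetical stand-in for the requested input map, and `leading_hyphen_segments` shows one likely source of trouble, since a name that starts with - can be read as an option by shell commands.

```python
from typing import List

# Assumption: only the single-character substitutions are applied; the
# hex-encoding step for other special characters is left out.
SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def pairtree_path(identifier: str, encapsulation_length: int = 4) -> str:
    """Return a pairtree-style relative path for an identifier.

    The final directory is the last `encapsulation_length` characters of
    the cleaned identifier (a rule inferred from a single example).
    """
    cleaned = identifier.translate(SUBSTITUTIONS)
    # Split the cleaned identifier into two-character segments.
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    segments.append(cleaned[-encapsulation_length:])
    return "/".join(segments)


def leading_hyphen_segments(path: str) -> List[str]:
    """List the path segments that start with '-'."""
    return [seg for seg in path.split("/") if seg.startswith("-")]


def layout_key(pid: str, prefix: str = "nlm:nlmuid-") -> str:
    """Hypothetical input-map rule: drop the prefix and replace '-' with '_'."""
    key = pid[len(prefix):] if pid.startswith(prefix) else pid
    return key.replace("-", "_")


pid = "nlm:nlmuid-101588995-bk"
path = pairtree_path(pid)
print(path)
print(leading_hyphen_segments(path))
print(pairtree_path(layout_key(pid)))
```

With the alternate key, no segment of the generated path starts with a hyphen or contains a + character.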

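The two requests about datastream files (a file extension inferred from the MIME type, and an XML declaration for in-line XML) could look something like the sketch below. The MIME-type map, the function names and the UTF-8 encoding in the declaration are assumptions made for illustration; none of this is taken from the migration tool.

```python
# Hypothetical MIME-type-to-extension map; extend it for the MIME types in use.
EXTENSIONS = {
    "text/xml": ".xml",
    "application/xml": ".xml",
    "text/plain": ".txt",
}

# Assumption: the in-line XML content is UTF-8.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def filename_with_extension(dsid: str, mime_type: str) -> str:
    """Append an extension inferred from the MIME type, or leave the name unchanged."""
    return dsid + EXTENSIONS.get(mime_type.strip().lower(), "")


def with_xml_declaration(content: str) -> str:
    """Prepend an XML declaration unless the content already starts with one."""
    if content.lstrip().startswith("<?xml "):
        return content
    return XML_DECLARATION + content


print(filename_with_extension("DC", "text/xml"))
print(filename_with_extension("OCR", "text/plain"))
print(with_xml_declaration("<dc/>"))
```

Unknown MIME types leave the datastream name unchanged, so a binary datastream is never given a misleading extension.
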
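The versioning observation can be made concrete with a small sketch. It contrasts one version per unique timestamp (the behaviour that appears to occur) with a hypothetical alternative that groups timestamps falling within a short window. The sample timestamps and the one-second window are made up for illustration, and neither function is the tool's actual logic.

```python
from datetime import datetime, timedelta
from typing import List


def versions_by_unique_timestamp(timestamps: List[datetime]) -> List[List[datetime]]:
    """One group per distinct timestamp."""
    return [[ts] for ts in sorted(set(timestamps))]


def versions_within_window(
    timestamps: List[datetime], window: timedelta = timedelta(seconds=1)
) -> List[List[datetime]]:
    """Start a new group only when a timestamp falls more than `window`
    after the first timestamp of the current group."""
    groups: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][0] <= window:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups


# Made-up sample: two datastream changes 360 ms apart, plus one later change.
stamps = [
    datetime(2015, 3, 2, 14, 5, 1, 120000),
    datetime(2015, 3, 2, 14, 5, 1, 480000),
    datetime(2016, 7, 9, 9, 30, 0),
]

print(len(versions_by_unique_timestamp(stamps)))
print(len(versions_within_window(stamps)))
```

For this sample the first approach yields three versions and the windowed approach yields two.
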
...