Brief outline of specifications to validate a Fedora 3 to OCFL migration. The validation tool should begin with coarser validations, then progressively handle finer-grained validations.
TBD: specify the validations to be performed as a runtime parameter (list)?
TBD: validation of external and redirect datastreams?
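If the first TBD is resolved in favor of a runtime list, the selection could look roughly like the sketch below. The validation names, the `--validations` flag, and both helper functions are hypothetical and are not part of any existing tool. The sketch only shows that a requested subset can still be run in coarse-to-fine order.

```python
import argparse

# Hypothetical validation names, ordered from coarsest to finest.
VALIDATION_ORDER = [
    "object-count",
    "object-ids",
    "object-metadata",
    "datastream-list",
    "datastream-metadata",
    "datastream-size",
    "datastream-checksum",
    "datastream-versions",
]


def select_validations(requested=None):
    """Return the requested validations in coarse-to-fine order.

    With no request, every validation is selected.
    """
    if not requested:
        return list(VALIDATION_ORDER)
    unknown = sorted(set(requested) - set(VALIDATION_ORDER))
    if unknown:
        raise ValueError("unknown validations: " + ", ".join(unknown))
    return [name for name in VALIDATION_ORDER if name in requested]


def parse_args(argv):
    """Parse a hypothetical --validations option that takes a list of names."""
    parser = argparse.ArgumentParser(description="Fedora 3 to OCFL validation (sketch)")
    parser.add_argument("--validations", nargs="*", default=None)
    return parser.parse_args(argv)


args = parse_args(["--validations", "datastream-checksum", "object-count"])
selected = select_validations(args.validations)
print(selected)
```

The order given on the command line is ignored on purpose, so a coarse check always runs before a finer one.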
Objects
Validate: number of objects
Valid: number of objects in the OCFL repository is equal to the number of objects in the Fedora 3 repository.
Validate: object IDs
Valid: every object in the OCFL repository has the same ID as its corresponding object in the Fedora 3 repository.
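The two object-level checks can be sketched as a set comparison. This assumes the object IDs from both repositories have already been listed and normalized to the same form; how they are listed is not covered here, and `compare_object_ids` is a hypothetical helper.

```python
def compare_object_ids(fedora3_ids, ocfl_ids):
    """Compare the object IDs found in each repository.

    Duplicates within one listing collapse, because the IDs are held in sets.
    """
    f3 = set(fedora3_ids)
    ocfl = set(ocfl_ids)
    return {
        "count_matches": len(f3) == len(ocfl),
        "missing_in_ocfl": sorted(f3 - ocfl),
        "unexpected_in_ocfl": sorted(ocfl - f3),
    }


report = compare_object_ids(["demo:1", "demo:2", "demo:3"], ["demo:1", "demo:3"])
print(report)
```

Reporting the missing and unexpected IDs, rather than a single pass/fail flag, makes a failed count check easier to trace.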
Object Content
Validate: object metadata
Valid: the HEAD version of the OCFL object.nt metadata (HEAD determined from the object's top-level inventory.json manifest) matches the current version of the Fedora 3 object metadata for the following properties:
- lastModifiedDate
- createdDate
- ownerId
- label
- state
Note that Fedora 3 content models will be verified as part of the examination of the RELS-EXT datastream (lower-level validation).
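A minimal sketch of the property comparison, assuming both sides have already been parsed into plain dictionaries keyed by the property names listed above. Parsing object.nt and reading the Fedora 3 object are omitted, and `diff_properties` is a hypothetical helper.

```python
OBJECT_PROPERTIES = ["lastModifiedDate", "createdDate", "ownerId", "label", "state"]


def diff_properties(fedora3_props, ocfl_props, names=OBJECT_PROPERTIES):
    """Return {name: (fedora3_value, ocfl_value)} for each property that differs."""
    mismatches = {}
    for name in names:
        f3_value = fedora3_props.get(name)
        ocfl_value = ocfl_props.get(name)
        if f3_value != ocfl_value:
            mismatches[name] = (f3_value, ocfl_value)
    return mismatches


fedora3 = {
    "lastModifiedDate": "2019-02-01T08:30:00.000Z",
    "createdDate": "2019-01-01T12:00:00.000Z",
    "ownerId": "fedoraAdmin",
    "label": "Demo object",
    "state": "Active",
}
ocfl = dict(fedora3, label="Demo obj")  # simulate a label that was altered in migration

mismatches = diff_properties(fedora3, ocfl)
print(mismatches)
```

A property missing on one side shows up as `None` in the mismatch, so absent values are reported rather than silently skipped.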
Validate: list of datastreams
Valid: the datastreams listed in the HEAD version of the object's top-level inventory.json manifest match the list of current (HEAD) datastream versions in the Fedora 3 repository.
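One way to derive the datastream list from the inventory is sketched below. It assumes that every datastream in the HEAD version has a `<DSID>.nt` file among the logical paths of the HEAD version state, and that object.nt is the only other `.nt` file. The inventory is trimmed to the fields the sketch reads, and the digests are placeholders.

```python
import json


def head_datastream_ids(inventory):
    """Collect DSIDs from the <DSID>.nt files in the HEAD version state."""
    head = inventory["head"]
    state = inventory["versions"][head]["state"]
    dsids = set()
    for logical_paths in state.values():
        for path in logical_paths:
            name = path.rsplit("/", 1)[-1]
            if name.endswith(".nt") and name != "object.nt":
                dsids.add(name[: -len(".nt")])
    return dsids


# Trimmed inventory.json content with placeholder digests.
inventory = json.loads("""
{
  "head": "v2",
  "versions": {
    "v1": {"state": {"aaa": ["object.nt"], "bbb": ["DC.nt"], "ccc": ["DC"]}},
    "v2": {"state": {"aaa": ["object.nt"], "bbb": ["DC.nt"], "ccc": ["DC"],
                     "ddd": ["RELS-EXT.nt"], "eee": ["RELS-EXT"]}}
  }
}
""")

ocfl_dsids = head_datastream_ids(inventory)
fedora3_dsids = {"DC", "RELS-EXT"}  # would come from the Fedora 3 object
print(ocfl_dsids == fedora3_dsids)
```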
Datastream Content
Validate: datastream metadata
Valid: the HEAD version of the OCFL <DSID>.nt metadata (HEAD determined from the object's top-level inventory.json manifest) matches the current version of the Fedora 3 <DSID> metadata for the following properties:
- messageDigest
- size
- mimeType
- state
- title
- identifier (DSID)
- lastModified
- created
Validate: datastream size
Valid: the size of the HEAD version of the datastream in OCFL matches the size of the datastream on disk in the Fedora 3 repository
Valid: the size of the HEAD version of the datastream in OCFL recorded in <DSID>.nt matches the size of the OCFL file on disk
Validate: datastream checksum
Valid: the algorithm type and checksum value of the datastream recorded in the HEAD version of the OCFL <DSID>.nt metadata file match the type and value of the checksum for the datastream in the Fedora 3 repository
Valid: the checksum of the HEAD version of the datastream file in OCFL matches the checksum value recorded in the corresponding <DSID>.nt metadata file, when calculated using the algorithm recorded in the metadata file
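The on-disk size and checksum checks could be sketched as follows. The recorded size, algorithm, and checksum are assumed to have been read from <DSID>.nt already. Mapping an algorithm label such as "SHA-256" to a `hashlib` name by dropping the hyphen and lowercasing is an assumption about how the label is written. `file_digest` and `check_fixity` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm, chunk_size=1024 * 1024):
    """Hash a file in chunks so large datastreams are not loaded into memory."""
    # Assumption: labels such as "SHA-256" or "MD5" map to hashlib names
    # once the hyphen is dropped and the label is lowercased.
    digest = hashlib.new(algorithm.replace("-", "").lower())
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_size, recorded_algorithm, recorded_checksum):
    """Return a list of problems; an empty list means size and checksum match."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != recorded_size:
        problems.append(f"size: recorded {recorded_size}, on disk {actual_size}")
    actual_checksum = file_digest(path, recorded_algorithm)
    if actual_checksum.lower() != recorded_checksum.lower():
        problems.append(f"checksum: recorded {recorded_checksum}, computed {actual_checksum}")
    return problems


# Demonstration with a temporary file standing in for a datastream file.
data = b"example datastream content"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    demo_path = tmp.name

recorded = hashlib.sha256(data).hexdigest()  # stands in for the value in <DSID>.nt
ok = check_fixity(demo_path, len(data), "SHA-256", recorded)
bad = check_fixity(demo_path, len(data) + 1, "MD5", "00")
os.remove(demo_path)
print(ok, len(bad))
```

Checking the size first is cheap and catches truncated files before any hashing is done, although the sketch runs both checks so that every problem is reported.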
Versions
Validate: object versions
To the best of my knowledge, Fedora 3 does not version changes to object metadata. Suggestions for validating changes to object metadata, as opposed to datastream versions, are welcome.
Validate: number of datastream versions
Valid: the number of versions of each datastream in an OCFL object matches the number of versions of the same datastream in the same object in the Fedora 3 repository
Validate: datastream versions
Valid: each datastream version in the OCFL object has a corresponding version in the Fedora 3 repository, as determined by matching the datastream creationDate recorded in the OCFL version <DSID>.nt file to a datastream version creationDate in the Fedora 3 repository
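Matching versions by creation date, together with the version count check, might look like the sketch below. It assumes the creation dates for one datastream are available from both sides as ISO 8601 strings, and that a timestamp without a zone is in UTC. Both functions are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def match_versions(fedora3_dates, ocfl_dates):
    """Compare the version creation dates of one datastream across repositories."""
    f3 = {parse_timestamp(d) for d in fedora3_dates}
    ocfl = {parse_timestamp(d) for d in ocfl_dates}
    return {
        "count_matches": len(fedora3_dates) == len(ocfl_dates),
        "unmatched_ocfl": sorted(d.isoformat() for d in ocfl - f3),
        "unmatched_fedora3": sorted(d.isoformat() for d in f3 - ocfl),
    }


result = match_versions(
    ["2019-01-01T12:00:00.000Z", "2019-02-01T08:30:00.000Z"],
    ["2019-01-01T12:00:00+00:00"],
)
print(result)
```

Comparing parsed timestamps rather than raw strings avoids false mismatches when the two sides format the same instant differently.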
Validate: datastream version content
Perform the validations listed above under "Datastream Content" on each version of a datastream in OCFL.