This document describes how to review Fedora 3.x content prior to migration. Once you are ready to migrate, please see the migration guide on the wiki.

More general guidelines for migrations can be found in the NDSA Migration Checklist.

Exporting Content From Fedora 3

Before beginning a migration from Fedora 3 to Fedora 6, it is useful to export a representative sample of Fedora 3 content in order to review the content types, metadata, and datastreams used in the repository. This can be accomplished using the fedora-export command line utility, which is documented on the wiki. If you know the PIDs of the objects you wish to export, you can specify them in the command parameters; note that the command must then be run once per PID. Alternatively, you can export the entire contents of the repository using a single command. Use the ‘archive’ context to export each object along with its datastream content.
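For a scripted sample export, one alternative to running fedora-export once per PID is to request each object from the Fedora 3 REST API export endpoint. The sketch below only builds the request URLs. It assumes the REST API is enabled, that the base URL is http://localhost:8080/fedora, and that the PIDs shown exist; all three are placeholders, and authentication is not handled.

```python
from urllib.parse import quote, urlencode

# Format URI identifying FOXML 1.1 in Fedora 3.
FOXML_FORMAT = "info:fedora/fedora-system:FOXML-1.1"


def build_export_url(base_url, pid, context="archive"):
    """Return the REST export URL for a single object.

    base_url is the root of the Fedora 3 web application (an assumption,
    e.g. http://localhost:8080/fedora). The 'archive' context asks for
    the object together with its datastream content.
    """
    query = urlencode({"format": FOXML_FORMAT, "context": context})
    # Keep the colon in the PID readable; escape anything else unusual.
    return f"{base_url.rstrip('/')}/objects/{quote(pid, safe=':')}/export?{query}"


# Hypothetical sample PIDs; replace with PIDs from your own repository.
SAMPLE_PIDS = ["demo:1", "demo:2"]
EXPORT_URLS = [build_export_url("http://localhost:8080/fedora", p) for p in SAMPLE_PIDS]
```

Each URL can then be fetched with any HTTP client and the response saved as that object's FOXML file.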


Reviewing Fedora 3 Content

Content exported from Fedora 3 can be explored by navigating the folder structure. Different Fedora repositories use different data models, but the basic structural composition of each object is defined by the Fedora Object Model. This model is expressed in the FOXML file (the filename should end with “-foxml.xml”). This file can be opened with a text editor, ideally one that supports syntax highlighting so the file will be easier to read. 

FOXML files have several sections that describe the object and its datastreams.

| Section | Description |
| --- | --- |
| foxml:digitalObject | Information on the FOXML version and the PID of the object. |
| foxml:objectProperties | Properties that describe the object itself, e.g. a label, the creation date, the last modified date. |
| foxml:datastream | Information on each datastream contained by the object. The type of datastream is defined by its control group: X, M, E, or R. |
| foxml:datastream CONTROL_GROUP="X" | Internal XML Content: the content is stored as XML in-line within the digital object XML file. |
| foxml:datastream CONTROL_GROUP="M" | Managed Content: the content is stored in the repository and the digital object XML maintains an internal identifier that can be used to retrieve the content from storage. |
| foxml:datastream CONTROL_GROUP="E" | Externally Referenced Content: the content is stored outside the repository and the digital object XML maintains a URL that can be dereferenced by the repository to retrieve the content from a remote location. |
| foxml:datastream CONTROL_GROUP="R" | Redirect Referenced Content: the content is stored outside the repository and the digital object XML maintains a URL that is used to redirect the client when an access request is made. |

Assuming the exported Fedora 3 objects are representative of the content in the repository, the datastreams in the FOXML can be reviewed to establish the content types and metadata used within the repository. Each datastream has an identifier (ID) and a description (LABEL) that can be used to help determine the purpose and function of each datastream if this information is not already known.
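The datastream review described above can be sketched with Python's standard library. The example lists each datastream's ID, control group, LABEL, and MIME type from a FOXML document. The embedded sample object is hypothetical, and the sketch assumes that the last foxml:datastreamVersion element listed is the most recent version.

```python
import xml.etree.ElementTree as ET

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical, heavily trimmed FOXML used only for illustration.
SAMPLE_FOXML = """
<foxml:digitalObject VERSION="1.1" PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml"/>
  </foxml:datastream>
  <foxml:datastream ID="OBJ" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
    <foxml:datastreamVersion ID="OBJ.0" LABEL="Scanned page" MIMETYPE="image/tiff"/>
  </foxml:datastream>
</foxml:digitalObject>
""".strip()


def list_datastreams(foxml_text):
    """Return one dict per datastream found in a FOXML document."""
    root = ET.fromstring(foxml_text)
    rows = []
    for ds in root.findall("foxml:datastream", FOXML_NS):
        versions = ds.findall("foxml:datastreamVersion", FOXML_NS)
        # Assumption: versions appear oldest first, so take the last one.
        latest = versions[-1] if versions else None
        rows.append({
            "pid": root.get("PID"),
            "id": ds.get("ID"),
            "control_group": ds.get("CONTROL_GROUP"),
            "label": latest.get("LABEL") if latest is not None else None,
            "mimetype": latest.get("MIMETYPE") if latest is not None else None,
        })
    return rows
```

To run this against an exported file, read the "-foxml.xml" file into a string and pass it to list_datastreams.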

Content Types

Content types are defined by content models, which are referenced using the RELS-EXT datastream. This datastream uses CONTROL_GROUP="X", meaning it is stored as in-line XML within the FOXML. Any content models associated with an object are referenced using the ‘hasModel’ relationship within the RELS-EXT datastream. Models can be created and defined by each repository so there is no central index of content models, but they are often labeled in a descriptive manner and the model itself can be retrieved by its PID if more information is required.
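As a rough illustration, the content model PIDs can be read from the RELS-EXT datastream with Python's standard library. The sample object and the model PID demo:imageCModel are hypothetical, and the sketch again assumes the last datastream version listed is the current one.

```python
import xml.etree.ElementTree as ET

NS = {
    "foxml": "info:fedora/fedora-system:def/foxml#",
    "model": "info:fedora/fedora-system:def/model#",
}
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
PID_PREFIX = "info:fedora/"

# Hypothetical, heavily trimmed FOXML used only for illustration.
SAMPLE_FOXML = """
<foxml:digitalObject VERSION="1.1" PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="Relationships"
        MIMETYPE="application/rdf+xml">
      <foxml:xmlContent>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:fedora-model="info:fedora/fedora-system:def/model#">
          <rdf:Description rdf:about="info:fedora/demo:1">
            <fedora-model:hasModel rdf:resource="info:fedora/demo:imageCModel"/>
          </rdf:Description>
        </rdf:RDF>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
""".strip()


def content_models(foxml_text):
    """Return the PIDs named by hasModel in the RELS-EXT datastream."""
    root = ET.fromstring(foxml_text)
    rels_ext = root.find("foxml:datastream[@ID='RELS-EXT']", NS)
    if rels_ext is None:
        return []
    versions = rels_ext.findall("foxml:datastreamVersion", NS)
    if not versions:
        return []
    models = []
    # Assumption: the last version listed is the current one.
    for element in versions[-1].findall(".//model:hasModel", NS):
        uri = element.get(RDF_RESOURCE, "")
        # Strip the info:fedora/ prefix so only the PID remains.
        models.append(uri[len(PID_PREFIX):] if uri.startswith(PID_PREFIX) else uri)
    return models
```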

As a best practice prior to migration, content models should be updated to include any missing information about the datastreams and mimetypes; these descriptions should be specific and useful. Any Document Type Definitions (DTDs), XML Schema Definitions (XSDs), ontologies, etc. that are used in datastreams can also be created and stored as Fedora objects in order to better self-document the repository.

Metadata

Fedora is very flexible with regard to metadata, so there may be many different types and schemas used in a given repository. Metadata may be stored as inline XML within the FOXML (CONTROL_GROUP="X") or in a separate, managed file (CONTROL_GROUP="M"). In rarer cases metadata may be stored outside the repository as an externally referenced (CONTROL_GROUP="E") or redirect (CONTROL_GROUP="R") file.

Every Fedora object has a default set of descriptive metadata stored inline in the FOXML using the Dublin Core schema. This metadata can be found by locating the datastream with ID="DC". Other metadata datastreams may exist, but these are defined by each repository; the datastream IDs and LABELs should indicate which other datastreams contain metadata. If this metadata is based on a defined schema, the schema should be referenced within the datastream definition, but custom metadata without a schema may also be present.
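A minimal sketch of reading the default Dublin Core record from the DC datastream follows. The sample values are hypothetical, and the sketch assumes the last datastream version listed is the current one.

```python
import xml.etree.ElementTree as ET

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, heavily trimmed FOXML used only for illustration.
SAMPLE_FOXML = """
<foxml:digitalObject VERSION="1.1" PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC.0" LABEL="Dublin Core Record" MIMETYPE="text/xml">
      <foxml:xmlContent>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample object</dc:title>
          <dc:identifier>demo:1</dc:identifier>
        </oai_dc:dc>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
""".strip()


def dublin_core_fields(foxml_text):
    """Return a dict mapping each DC element name to its list of values."""
    root = ET.fromstring(foxml_text)
    dc = root.find("foxml:datastream[@ID='DC']", FOXML_NS)
    if dc is None:
        return {}
    versions = dc.findall("foxml:datastreamVersion", FOXML_NS)
    if not versions:
        return {}
    fields = {}
    # Assumption: the last version listed is the current one.
    for element in versions[-1].iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Collecting the field names across a sample of objects gives a quick picture of which Dublin Core elements are actually in use.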

Binary Files

Object datastreams often contain binary files; these may be images associated with a photograph, PDFs associated with a thesis, or any number of other file types. Fedora simply stores whatever binary file it is given without restricting filetypes or attempting to parse the file. The datastream ID and LABEL for each file should help determine what it is, but these files should also be present in the exported object directory.

Summarizing and Mapping Content

The contents of a Fedora 3 repository can be summarized and represented in many ways, but for the purposes of a migration a spreadsheet is a practical choice. This spreadsheet could have several tabs: one that lists each content type and the datastreams it should contain, another that lists each metadata type and schema, and perhaps another that lists each field used in the metadata. From there, decisions can be made regarding mapping metadata vs. retaining the existing structure.
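One possible way to produce the first of these tabs is to count, per content model, how many sampled objects carry each datastream, and write the result as CSV for import into a spreadsheet. The record structure, column names, and sample values below are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical per-object records, e.g. gathered by parsing exported FOXML.
SAMPLE_RECORDS = [
    {
        "pid": "demo:1",
        "models": ["demo:imageCModel"],
        "datastreams": [
            {"id": "DC", "control_group": "X", "mimetype": "text/xml"},
            {"id": "OBJ", "control_group": "M", "mimetype": "image/tiff"},
        ],
    },
    {
        "pid": "demo:2",
        "models": ["demo:imageCModel"],
        "datastreams": [
            {"id": "DC", "control_group": "X", "mimetype": "text/xml"},
        ],
    },
]


def summarize(records):
    """Return CSV text: one row per (model, datastream) with an object count."""
    counts = Counter()
    for record in records:
        # Objects without a content model are grouped under a placeholder.
        for model in record["models"] or ["(no model)"]:
            for ds in record["datastreams"]:
                key = (model, ds["id"], ds["control_group"], ds["mimetype"])
                counts[key] += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["content_model", "datastream_id", "control_group", "mimetype", "object_count"]
    )
    for key in sorted(counts):
        writer.writerow([*key, counts[key]])
    return out.getvalue()
```

A datastream whose count is lower than the number of objects with that model, such as OBJ in the sample data, points to objects worth inspecting before mapping decisions are made.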


Migrating Content to Fedora 6

Fedora 3 repositories that do not use Islandora or Samvera (for which there are separate tools) should use the migration-utils tool as a starting point for migrating content to Fedora 6. By default, there is a 1:1 correspondence between Fedora 3 objects and Fedora 6 objects. Fedora 3 datastreams appear as files within the resulting Fedora 6 objects. This may be suitable for cases where remapping metadata to new schemas/fields (e.g. moving from XML to RDF) is not required. Please see the tool documentation on the GitHub page for more information.

To begin, it is best to start with a small set of sample objects to migrate and then review. These objects can be exported from Fedora 3 ahead of time, or a list of PIDs can be passed to the migration tool. In the latter case, the objects will be copied directly from a Fedora 3 repository, which must be running at the time.

Reviewing Migrated Content

The migrated objects will be available in a directory specified in the command parameters, where they can be reviewed in the same way as exported Fedora 3 content. This review helps confirm that the FOXML and datastream content have been faithfully migrated and that there are no obvious errors. After this initial inspection, the objects should be explored through the Fedora 6 REST interface, either on the command line or using the HTML interface.
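As a sketch of the scripted equivalent, a migrated resource can be requested over HTTP with Python's standard library. The base URL http://localhost:8080/fcrepo/rest and the resource path demo:1 are assumptions; the actual path depends on how the migration maps Fedora 3 PIDs to Fedora 6 resources, and authentication is not handled here.

```python
from urllib.request import Request, urlopen


def build_resource_request(base_url, resource_path, accept="text/turtle"):
    """Build a GET request for a resource's RDF description.

    base_url is the REST root of the Fedora 6 instance (an assumption).
    """
    url = f"{base_url.rstrip('/')}/{resource_path.lstrip('/')}"
    return Request(url, headers={"Accept": accept})


def fetch_resource(request, timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_resource(build_resource_request("http://localhost:8080/fcrepo/rest", "demo:1")) would return the Turtle description of that resource if the server is running and permits the request.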