Versions Compared

...

  • If you plan to migrate directly from Fedora 3 data on disk, we recommend that you shut down the Fedora 3 repository or put it into read-only mode, so that the source data does not change while the migration is in progress.
  • When performing a sizable migration, run the utility in a screen or tmux session that you can detach from.  Try not to shut down or reboot the host while the tool is running.
  • If --working-dir isn't specified, the utility uses the current directory as the working directory, so an index directory and a pid directory will be created there.
    • An index of the datastreams will automatically be created in <working_dir>/index, and then that index will be reused for future runs of the utility. If you need to update the index, or don't want it to be used for a new run of the utility, delete <working_dir>/index and the index will be re-created.
  • If the migration is interrupted, you can pick up where the migration left off by relaunching the tool with the --resume flag (keeping all the other parameters the same).
  • Redirect the output of the utility to a log file that you can analyze at your leisure during and after the migration.  Example:

    Code Block
    languagebash
    titleRedirect to log file
    java -jar migration-utils-6.0.0-driver.jar \
      --source-type=legacy \
      --target-dir=my-fcrepo-6-home \
      --objects-dir=my-fcrepo-3/objects \
      --datastreams-dir=my-fcrepo-3/datastreams > log.txt 2>&1


  • Errors: the migration tool may encounter problems copying some objects or datastreams. By default the tool halts at the first error; if you wish to migrate as much as possible and then go back to address errors, run the tool with the --continue-on-error flag. Objects or datastreams with errors will not be written to the OCFL repository; they will be skipped, and the next object in the list will be processed.
    Objects that could not be migrated will produce a stack trace in the log, marked with the string ERROR. They can be extracted from the log at a later date, fixed, and then migrated individually.
    Example:

    Code Block
    languagebash
    titleGrep ERROR
    $ grep ERROR log.txt
    ERROR 01:09:09.801 (Migrator) MIGRATION_FAILURE: pid="test:BadPID1", message="Unable to resolve internal ID "test:BadPID1+MYDS+MYDS.2"!"
    ERROR 01:29:54.878 (Migrator) MIGRATION_FAILURE: pid="test:BadPID23", message="Unable to resolve internal ID "test:BadPID23+MYDS+MYDS.0"!"
    ERROR 02:11:50.644 (Migrator) MIGRATION_FAILURE: pid="test:BadPID617", message="Unable to resolve internal ID "test:BadPID617+MYDS+MYDS.1"!"
    ...


  • Warning:  the migration utility is not idempotent!  This means that if you run the utility twice over the same content to the same target, you will wind up with an OCFL repository with duplicate versions of datastreams.  For the purposes of testing, delete or move out of the way previous migration attempts before running a new migration.
    However, you can migrate new objects to an already-existing OCFL repository, which means you can plan your migration in stages, if you desire.  Note that you will need to rebuild your Fedora 6 index after the new additions.
  • When adding new or fixed objects to the OCFL repository, make sure to regenerate a fresh datastream index.
  • As many files are created per object and per datastream version, make sure to allocate enough inodes on your system to allow for all the files (note: this should only be necessary for extremely large repositories, more than 4 million objects, for example).
  • See Fedora 3 to 6 Migration Community Updates for examples of medium- and large-scale migrations performed with the migration tool, with benchmark data and detailed notes.
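
The error-handling advice above (extract failed objects from the log, fix them, then migrate them individually) can be sketched as a small shell helper. This is only an illustration: it assumes the log lines follow the MIGRATION_FAILURE format shown in the grep example, and that the file passed to --pid-file holds one PID per line, which you should verify against your version of the tool. The function name extract_failed_pids and the file names log.txt and failed-pids.txt are hypothetical.

```bash
#!/usr/bin/env bash
set -eu

# extract_failed_pids LOGFILE
# Prints each PID that appears in a MIGRATION_FAILURE line, once, sorted.
# Assumes lines of the form:  ... MIGRATION_FAILURE: pid="<pid>", message=...
extract_failed_pids() {
  sed -n 's/.*MIGRATION_FAILURE: pid="\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Hypothetical usage (file names are placeholders):
#   extract_failed_pids log.txt > failed-pids.txt
# After fixing the source objects, failed-pids.txt could be passed to the
# utility with --pid-file, assuming the one-PID-per-line layout is accepted.
```

Objects that failed were skipped rather than written, so re-running only those PIDs should not create duplicate versions; the warning about non-idempotent runs still applies to any object that was already migrated.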

...

Code Block
languagebash
titlemigration-utils usage
Usage: migration-utils [-chrVx] [--debug] -a=<targetDir> [-d=<f3DatastreamsDir>] [-e=<f3ExportedDir>] [-f=<f3hostname>] \
                       [-i=<indexDir>] [-l=<objectLimit>] [-m=<migrationType>] [-o=<f3ObjectsDir>] [-p=<pidFile>] -t=<f3SourceType> [-u=<user>] [-U=<userUri>]
    -h, --help 
        Show this help message and exit.
    -V, --version 
        Print version information and exit.
    -t, --source-type=<f3SourceType>
        Fedora 3 source type. Choices: akubra | legacy | exported
    -d, --datastreams-dir=<f3DatastreamsDir>
        Directory containing Fedora 3 datastreams (used with --source-type 'akubra' or 'legacy')
    -o, --objects-dir=<f3ObjectsDir>
        Directory containing Fedora 3 objects (used with --source-type 'akubra' or 'legacy')
    -e, --exported-dir=<f3ExportedDir>
        Directory containing Fedora 3 export (used with --source-type 'exported')
    -a, --target-dir=<targetDir>
        Directory where the migrated objects will be written
    -i, --working-dir=<workingDir>
        Directory where supporting state will be written (cached index of datastreams, ...)
    -I, --delete-inactive
        Migrate objects and datastreams in the Inactive state as deleted. 
        Default: false.
    -m, --migration-type=<migrationType>
        Type of OCFL objects to migrate to. Choices: FEDORA_OCFL | PLAIN_OCFL
        Default: FEDORA_OCFL
    -l, --limit=<objectLimit> 
        Limit number of objects to be processed.
        Default: no limit
    -r, --resume 
        Resume from last successfully migrated Fedora 3 object
        Default: false
    -c, --continue-on-error
        Continue to next PID if an error occurs (instead of exiting). Disabled by default.
        Default: false
    -p, --pid-file=<pidFile>
        PID file listing which Fedora 3 objects to migrate
    -x, --extensions
        Add file extensions to migrated datastreams based on mimetype recorded in FOXML
        Default: false
    -f, --f3hostname=<f3hostname>
        Hostname of Fedora 3, used for replacing placeholder in 'E' and 'R' datastream URLs
        Default: fedora.info
    -u, --username=<user>
        The username to associate with all of the migrated resources.
        Default: fedoraAdmin
    -U, --user-uri=<userUri>
        The username URI to associate with all of the migrated resources.
        Default: info:fedora/fedoraAdmin
    --debug 
        Enables debug logging

...

Code Block
languagebash
java -jar migration-utils-<latest-version>-driver.jar \
  --source-type=legacy \
  --limit=100 \
  --target-dir=my-fcrepo-6-home \
  --working-dir=<tmp working dir> \
  --objects-dir=<path to objects dir> \
  --datastreams-dir=<path to datastreams dir>
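
Because the utility is not idempotent, a limited trial run like the one above is best started against an empty target directory. The following is a minimal sketch of moving a previous attempt out of the way first; the function name move_aside and the .bak.<timestamp> naming are hypothetical choices, not part of the migration tool.

```bash
#!/usr/bin/env bash
set -eu

# move_aside DIR
# If DIR exists, rename it to DIR.bak.<timestamp> (adding .N if that name is
# already taken) and print the new name. If DIR does not exist, do nothing.
move_aside() {
  local dir="${1%/}"
  [ -e "$dir" ] || return 0
  local backup
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  local n=0
  local candidate="$backup"
  while [ -e "$candidate" ]; do
    n=$((n + 1))
    candidate="${backup}.${n}"
  done
  mv -- "$dir" "$candidate"
  printf '%s\n' "$candidate"
}

# Hypothetical usage before a new test run:
#   move_aside my-fcrepo-6-home
```

Renaming rather than deleting keeps the earlier attempt available for comparison; remove the backup directories once they are no longer needed.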

...