...

The Fedora 4 Backup capability allows a user, such as the repository manager, to make a REST call to have the repository binaries and metadata exported to the local file system. Conversely, the Restore capability allows a user to make a REST call to have the repository restored from the contents of a previous Backup operation. In addition, with the default configuration, files are stored on disk named according to their SHA1 digest, so a filesystem backup approach can also be used.
Design Considerations

Historically, Fedora fulfilled its promise of durability by choosing transparent forms of persistence (e.g. human-readable XML) and using them in ways that systems outside the repository could readily access if needed. Transparency in support of durability is as valid a principle as ever, but it has a weakness: transparent forms of persistence are not performant. What's more, many users didn't particularly care for that principle, but they were still stuck paying the performance costs associated with it.

So in Fedora 4, we shifted responsibility for transparent persistence away from the core repository software. If you'd like to maintain some simple, human-readable form of your repository, that's fine, but you need to support that with an integration around the core. The form of persistence used by the core repository component itself is not meant to be manipulated directly by a human except in the most unusual situations; it's meant instead for use by the software to provide speedy service at the repository's API. You might compare this to the use of database software. You don't expect to directly manipulate database indexes, and if you are concerned about the durability of your data in the database, you take backups in a transparent format and use _those_ to ensure durability.

An analogy: you may expect your bank to provide downloadable images for any checks you write, but you don't expect them to use those images to run their accounting software. 

Usage

Backup

Warning

If the request does not include a POST body specifying a writable directory (local to the Fedora 4 server), the backup will be written to the system temp directory.
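
As a rough illustration of the warning above, the following Python sketch builds (but does not send) a backup request with and without a target directory in the POST body. The function name `build_backup_request` is hypothetical, and the URL used in the example (`http://localhost:8080/rest/fcr:backup`) is an assumption about a typical local deployment; substitute the backup endpoint of your own repository.

```python
import urllib.request
from typing import Optional


def build_backup_request(backup_url: str,
                         target_dir: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a repository backup.

    backup_url: the repository's backup endpoint (supplied by the caller).
    target_dir: a writable directory local to the Fedora 4 server. When it is
                omitted, no body is sent and the server falls back to the
                system temp directory.
    """
    body = target_dir.encode("utf-8") if target_dir else None
    return urllib.request.Request(backup_url, data=body, method="POST")


if __name__ == "__main__":
    # The URL below is an assumed example, not taken from this page.
    request = build_backup_request("http://localhost:8080/rest/fcr:backup",
                                   "/tmp/fedora-backup")
    print(request.get_method(), request.full_url, request.data)
    # Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to check that a directory is always supplied before a scheduled backup runs.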

...

On success

  • HTTP/1.1 204 No Content

Configurations

...

Backup

...

  • Non-clustered Fedora, using Infinispan cache backed by LevelDB (config)

Backup Format

Regardless of the repository configuration, the backup process produces output in the same format. Further details on backup contents and the underlying implementation can be found in ModeShape's documentation.

...

By default, files larger than 4KB are stored on disk named after their SHA1 digest, in the directory fcrepo.binary.directory. (The 4KB threshold can be changed by updating the minimumBinarySizeInBytes property in repository.json.) For example, a file with the SHA1 "a1b2c369563c0465ab82cdb2789d45ce1c3585b1" would be stored on disk at /path/to/fcrepo4-data/fcrepo.binary.directory/a1/b2/c3/a1b2c369563c0465ab82cdb2789d45ce1c3585b1. These files can therefore be backed up by backing up the directory fcrepo.binary.directory.
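
The path layout described above can be sketched in a few lines of Python. This is only an illustration of the naming convention (the first three byte pairs of the digest become nested directories); the function name `binary_path` is hypothetical and not part of Fedora or ModeShape.

```python
from pathlib import PurePosixPath


def binary_path(binary_dir: str, sha1_hex: str) -> PurePosixPath:
    """Return the expected on-disk location of a binary, given its SHA1 digest.

    The first three pairs of hex characters form three directory levels,
    and the file itself is named after the full digest.
    """
    digest = sha1_hex.lower()
    if len(digest) != 40 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 40-character hexadecimal SHA1 digest")
    return PurePosixPath(binary_dir, digest[0:2], digest[2:4], digest[4:6], digest)


if __name__ == "__main__":
    print(binary_path("/path/to/fcrepo4-data/fcrepo.binary.directory",
                      "a1b2c369563c0465ab82cdb2789d45ce1c3585b1"))
```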

LevelDB Backup

LevelDB stores its data as flat files in the directory fcrepo.ispn.repo.cache of the fcrepo home. The fcrepo home directory can be backed up as a whole to create a snapshot of the repository with both the binaries and the metadata, although fcrepo.binary.directory and fcrepo.ispn.repo.cache are the only directories necessary for backup (see ModeShape Artifacts Layout). The backup can be created on a live repository without shutting it down or restricting ingests. It is still a good idea to schedule backups after any batch ingests, so that the newly ingested data is included in the backup.

Backup Strategies

Here are a few strategies for backup:

WITH SHUTTING DOWN FEDORA (CONSISTENTLY RELIABLE BACKUPS)

STEPS:

  1. Shut down Fedora
  2. Back up FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache)
  3. Restart Fedora
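
The copy in step 2 could be scripted along the following lines. This is a minimal sketch that assumes Fedora has already been shut down and that both directories sit directly under the fcrepo home; the function name `copy_repository_dirs` and the paths in the usage example are hypothetical. Stopping and restarting Fedora depends on your servlet container and is left out.

```python
import shutil
from pathlib import Path

# The two directories this page names as necessary for a backup.
BACKUP_DIRS = ("fcrepo.binary.directory", "fcrepo.ispn.repo.cache")


def copy_repository_dirs(fcrepo_home, backup_root):
    """Copy the binary store and the LevelDB cache store into backup_root.

    Raises FileNotFoundError before copying anything if either source
    directory is missing, and FileExistsError if a target already exists,
    so an earlier backup is never silently overwritten.
    """
    home, dest = Path(fcrepo_home), Path(backup_root)
    for name in BACKUP_DIRS:
        if not (home / name).is_dir():
            raise FileNotFoundError(f"missing source directory: {home / name}")
    copied = []
    for name in BACKUP_DIRS:
        target = dest / name
        shutil.copytree(home / name, target)  # creates parent directories as needed
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; adjust to your installation.
    for path in copy_repository_dirs("/path/to/fcrepo4-data", "/backups/fcrepo-snapshot"):
        print("copied", path)
```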

WITH PAUSING WRITES TO FEDORA

STEPS:

  1. Pause all updates to the repository.
    1. Do not create, delete, or update OBJECTS or DATASTREAMS.
  2. Wait for all previous update requests to be processed.
    1. Wait for serialization of newly created objects to complete.
    2. Wait for LevelDB background compaction to complete (usually a matter of seconds), if the previous updates triggered a compaction.
  3. Back up FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache).
    1. Verify successful backup.
  4. Resume updates.

HOT BACKUPS (LESS RELIABLE UNLESS VERIFIED)

  1. Back up FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache).
    1. Verify successful backup.

Verifying Backups:

  1. Check the LevelDB cache store directory from the backup using a LevelDB client.
    1. Verify that the LevelDB store opens.
    2. Verify by iterating through the keys (to expose any corruption).
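
The LevelDB check above requires a LevelDB client library. As a complementary sketch (not a replacement for that check), the binary store portion of a backup can be verified with the Python standard library alone, relying on the convention that each file is named after its SHA1 digest. The function names are hypothetical, and the sketch assumes that any file whose name is not 40 hexadecimal characters is not a stored binary and can be skipped.

```python
import hashlib
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_binaries(binary_dir):
    """Return files whose content does not hash to the digest in their name.

    Only files named like a SHA1 digest (40 hex characters) are checked;
    anything else is skipped (an assumption of this sketch).
    """
    mismatched = []
    for path in sorted(Path(binary_dir).rglob("*")):
        name = path.name.lower()
        if path.is_file() and len(name) == 40 and set(name) <= HEX_DIGITS:
            if sha1_of_file(path) != name:
                mismatched.append(path)
    return mismatched


if __name__ == "__main__":
    # Hypothetical backup location; adjust to your installation.
    for bad in find_mismatched_binaries("/backups/fcrepo-snapshot/fcrepo.binary.directory"):
        print("digest mismatch:", bad)
```

An empty result means every checked binary matches its name; it says nothing about the metadata in fcrepo.ispn.repo.cache, which still needs the LevelDB check.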