Overview
The Fedora 4 Backup capability allows a user, such as the repository manager, to make a REST call to have the repository binaries and metadata exported to the local file system. Conversely, the Restore capability allows a user to make a REST call to have the repository restored from the contents of a previous Backup operation. In addition, with the default configuration, files are stored on disk named according to their SHA1 digest, so a filesystem backup approach can also be used.
Usage
Backup
If a POST body specifying a writeable directory (local to Fedora 4 server) is not included in the request, the backup will be written to the system temp directory.
Perform a backup of a running Fedora 4 repository
Request
POST /rest/fcr:backup
> optional POST body
Response
On success
- HTTP/1.1 200 OK
- Path where the backup was written
Response body
- Absolute path of local backup directory
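As a minimal sketch of the request above, the following Python builds and sends the backup POST using only the standard library. The base URL (for example http://localhost:8080/rest), the helper names, and the text/plain content type are assumptions for illustration, not details confirmed by this page.

```python
import urllib.request
from typing import Optional


def build_backup_request(base_url: str, backup_dir: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to <base_url>/fcr:backup with an optional target directory as body."""
    # An empty body means the backup is written to the system temp directory.
    body = backup_dir.encode("utf-8") if backup_dir else b""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/fcr:backup", data=body, method="POST"
    )
    if backup_dir:
        # Assumption: the directory path is sent as plain text.
        request.add_header("Content-Type", "text/plain")
    return request


def run_backup(base_url: str, backup_dir: Optional[str] = None) -> str:
    """Send the request and return the response body (the backup path on success)."""
    with urllib.request.urlopen(build_backup_request(base_url, backup_dir)) as response:
        return response.read().decode("utf-8")
```

Against a running repository, run_backup("http://localhost:8080/rest", "/tmp/fcrepo4-backup") would be expected to return the absolute path of the backup directory; the host, port, and directory here are hypothetical.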
Restore
Note: Restoring a backup replaces the repository content with the contents of the backup, so any data in the repository will be lost.
Perform a restore of a running Fedora 4 repository
Request
POST /rest/fcr:restore
> with POST body
A POST body containing the full path to a previous backup.
Response
On success
- HTTP/1.1 204 No Content
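The restore call can be sketched the same way. The snippet below is an illustration only: the base URL, the helper names, and the text/plain content type are assumptions, and because a restore replaces the repository content it should be tried against a test instance first.

```python
import urllib.request


def build_restore_request(base_url: str, backup_path: str) -> urllib.request.Request:
    """Build a POST to <base_url>/fcr:restore whose body is the full backup path."""
    if not backup_path:
        raise ValueError("restore requires the full path to a previous backup")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/fcr:restore",
        data=backup_path.encode("utf-8"),
        # Assumption: the path is sent as plain text.
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def run_restore(base_url: str, backup_path: str) -> bool:
    """Send the request; True when the server answers 204 No Content."""
    with urllib.request.urlopen(build_restore_request(base_url, backup_path)) as response:
        return response.status == 204
```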
Configurations
The following configurations have been successfully tested with the Backup and Restore functionality:
- Non-clustered Fedora, using Infinispan cache backed by LevelDB (config)
Backup Format
Regardless of the repository configuration, the backup process produces output in the same format. Further details on backup contents and the underlying implementation can be found in ModeShape's documentation.
The backup directory will contain
- A 'binaries' directory that contains the repository "content" binaries stored in a pair-tree-like structure. The filename of each binary is the SHA-1 of its content with the extension '.bin'. The directory structure in which each binary is found is three levels deep, based on the SHA-1.
For example, binary content in the repository with a SHA-1 of "44c4d081c1dc3530fdb1a2fe843e570170d3d054" would look like
├── binaries
    └── 44
        └── c4
            └── d0
                └── 44c4d081c1dc3530fdb1a2fe843e570170d3d054.bin
- One or more "documents_00000n.bin.gz" files that contain a concatenated listing of the metadata for each of the repository objects in JSON format
For example
{
  "metadata" : {
    "id" : "87a0a8c317f1e7/jcr:system/jcr:nodeTypes/nt:unstructured//undefined/1",
    "contentType" : "application/json"
  },
  "content" : {
    "key" : "87a0a8c317f1e7/jcr:system/jcr:nodeTypes/nt:unstructured//undefined/1",
    "parent" : "87a0a8c317f1e7/jcr:system/jcr:nodeTypes/nt:unstructured",
    "properties" : {
      "http://www.jcp.org/jcr/1.0" : {
        "primaryType" : { "$name" : "nt:propertyDefinition" },
        "onParentVersion" : "COPY",
        "multiple" : false,
        "protected" : false,
        "availableQueryOperators" : [
          "jcr.operator.equal.to",
          "jcr.operator.greater.than",
          "jcr.operator.greater.than.or.equal.to",
          "jcr.operator.less.than",
          "jcr.operator.less.than.or.equal.to",
          "jcr.operator.like",
          "jcr.operator.not.equal.to"
        ],
        "requiredType" : "UNDEFINED",
        "mandatory" : false,
        "autoCreated" : false
      }
    }
  }
}
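To inspect the metadata files described above, a rough sketch along the following lines may help. It assumes each documents file is gzip-compressed UTF-8 text holding JSON documents written one after another, as this page describes; the function name is hypothetical, and the whole file is read into memory, so it is only suited to small backups.

```python
import gzip
import json
from typing import Iterator


def iter_backup_documents(path: str) -> Iterator[dict]:
    """Yield each JSON document from a gzip-compressed, concatenated listing."""
    # Assumption: the file decompresses to UTF-8 text containing JSON documents.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # Skip whitespace or newlines between consecutive documents.
        if text[position].isspace():
            position += 1
            continue
        # raw_decode parses one document and reports where it ended.
        document, position = decoder.raw_decode(text, position)
        yield document
```

For the example document shown above, each yielded dictionary would carry the "metadata" and "content" keys.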
Filesystem Backup
By default, files larger than 4KB are stored on disk named after their SHA1 digest, in the directory fcrepo.binary.directory. (4KB is the default threshold; it can be changed by updating the minimumBinarySizeInBytes property in repository.json.) For example, a file with the SHA1 "a1b2c369563c0465ab82cdb2789d45ce1c3585b1" would be stored on disk at /path/to/fcrepo4-data/fcrepo.binary.directory/a1/b2/c3/a1b2c369563c0465ab82cdb2789d45ce1c3585b1. Files in the repository can therefore be backed up by backing up the directory fcrepo.binary.directory.
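Because each stored file is named after its SHA1 digest, a copy of the binary directory can be checked for integrity. The sketch below illustrates the idea under the assumption that every regular file under the directory is a stored binary laid out as described above; the function names are hypothetical, and any other files the store keeps there would be reported as mismatches.

```python
import hashlib
from pathlib import Path
from typing import List


def expected_binary_path(binary_dir: Path, sha1_hex: str) -> Path:
    """Return <binary_dir>/<aa>/<bb>/<cc>/<sha1> for the given digest."""
    return binary_dir / sha1_hex[0:2] / sha1_hex[2:4] / sha1_hex[4:6] / sha1_hex


def find_mismatched_binaries(binary_dir: Path) -> List[Path]:
    """List files whose content digest or location does not match their name."""
    mismatched = []
    for path in sorted(binary_dir.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha1()
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large binaries are not loaded at once.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        name_ok = digest.hexdigest() == path.name
        place_ok = path == expected_binary_path(binary_dir, path.name)
        if not (name_ok and place_ok):
            mismatched.append(path)
    return mismatched
```

An empty result from find_mismatched_binaries on the backed-up directory suggests the copied binaries are intact.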
LevelDB Backup
LevelDB stores its data as flat files in the directory fcrepo.ispn.repo.cache of the fcrepo home. The fcrepo home directory can be backed up as a whole to create a snapshot of the repository with both the binaries and the metadata, although fcrepo.binary.directory and fcrepo.ispn.repo.cache are the only directories necessary for backup. (See ModeShape Artifacts Layout.) The backup can be created on a live repository without shutting it down or restricting ingests. It is still a good idea to schedule backups after any batch ingests, so that the newly ingested data is included in the backup.
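Copying the two required directories can be sketched as follows. This is an illustration rather than a supported tool: the function name is hypothetical, and it assumes both directories sit directly under the fcrepo home with the names used on this page, which depends on how the repository is configured.

```python
import shutil
from pathlib import Path
from typing import List

# Directory names as used on this page; actual locations depend on configuration.
BACKUP_DIRS = ("fcrepo.binary.directory", "fcrepo.ispn.repo.cache")


def backup_fcrepo_home(fcrepo_home: Path, destination: Path) -> List[Path]:
    """Copy the binary store and the LevelDB cache into a new destination."""
    copied = []
    for name in BACKUP_DIRS:
        source = fcrepo_home / name
        if not source.is_dir():
            raise FileNotFoundError(f"missing expected directory: {source}")
        # copytree refuses to overwrite, so each run needs a fresh destination.
        shutil.copytree(source, destination / name)
        copied.append(destination / name)
    return copied
```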
Backup Strategies
Here are a few strategies for backup:
WITH SHUTTING DOWN FEDORA (CONSISTENTLY RELIABLE BACKUPS)
STEPS:
- Shut down Fedora
- Backup of FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache)
- Restart Fedora
WITH PAUSING WRITES TO FEDORA
STEPS:
- Pause all updates to the repository.
  - Do not create, delete, or update OBJECTS or DATASTREAMS.
- Wait for all previous update requests to be processed.
  - Wait for serialization of newly created objects to complete.
  - Wait for leveldb background compaction to complete (usually a matter of seconds), if the previous updates triggered a compaction.
- Backup of FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache)
- Verify successful backup.
- Continue with the updates.
HOT BACKUPS (LESS RELIABLE UNLESS VERIFIED)
- Backup of FCREPO HOME (or just fcrepo.binary.directory and fcrepo.ispn.repo.cache)
- Verify successful backup.
Verifying Backups:
- Check the leveldb cache store directory from the backup using a leveldb client.
  - Verify that the leveldb database opens.
  - Verify by iterating through the keys (to expose any corruption).
Based on the flow of the background compaction process in the leveldb implementation ([1] and [2]), the manifest file is updated at the end of compaction, and obsolete files are deleted afterwards. To verify a backup, we can therefore begin by copying the manifest file, then copy the rest of the files. At the end of the backup, the backed-up manifest file can be compared with the current manifest. If the manifest is unchanged, the backup can be considered successful; if it has changed, the backup should not be considered reliable.
[1] https://github.com/google/leveldb/blob/master/db/db_impl.cc#L655
[2] https://github.com/google/leveldb/blob/master/db/version_set.cc#L811
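The manifest-first approach described above can be sketched as follows. This is a rough illustration, not the script this page refers to. It assumes the function is given one LevelDB database directory (the one holding the MANIFEST-* file) and that all files of interest sit directly in that directory; the function names are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path
from typing import List


def manifest_unchanged(cache_dir: Path, backup_dir: Path, manifests: List[str]) -> bool:
    """True if the live manifest files match the copies taken at the start."""
    current = sorted(p.name for p in cache_dir.glob("MANIFEST-*"))
    if current != manifests:
        return False
    # shallow=False forces a byte-by-byte comparison.
    return all(
        filecmp.cmp(cache_dir / name, backup_dir / name, shallow=False)
        for name in manifests
    )


def hot_backup_leveldb(cache_dir: Path, backup_dir: Path) -> bool:
    """Copy manifest first, then the rest; report whether the manifest stayed the same."""
    manifests = sorted(p.name for p in cache_dir.glob("MANIFEST-*"))
    if not manifests:
        raise ValueError(f"no MANIFEST file found in {cache_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: copy the manifest before anything else.
    for name in manifests:
        shutil.copy2(cache_dir / name, backup_dir / name)
    # Step 2: copy the remaining files.
    for source in sorted(cache_dir.iterdir()):
        if source.is_file() and source.name not in manifests:
            try:
                shutil.copy2(source, backup_dir / source.name)
            except FileNotFoundError:
                # Removed mid-copy (e.g. by compaction); the check below decides.
                continue
    # Step 3: an unchanged manifest suggests no compaction finished meanwhile.
    return manifest_unchanged(cache_dir, backup_dir, manifests)
```

A return value of False means the manifest changed during the copy, so the backup should be discarded and taken again.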
The following script can be used to perform a hot backup with verification (requires the repair and verify scripts):
The following script can be used to verify a leveldb cache for corruption:
Repairing Corrupt LevelDB
When the LevelDB database becomes corrupted, the RepairDB option provided by the LevelDB API can be used to recover as much data as possible. In LevelDB, the manifest file keeps track of all files and their corresponding key ranges. The recovery process inspects each file in the leveldb directory and updates the manifest accordingly. This implies that, even after a successful repair, missing files could lead to loss of data, which in turn can prevent a successful restoration of the repository.
The script below can be used to repair a corrupt leveldb cache: