Old Release

This documentation covers an old version of Fedora.

To aid in ingest or to provide services for external content, Fedora 4 can expose that content as if it were stored in the repository.  This is particularly useful for migrating Fedora 3 content or serving large files already on disk.

Filesystem Federation

Filesystem federation maps a node in the repository to a directory on disk.  This allows files on disk to be served and updated by Fedora 4 as though they were in the repository.  Filesystem federation avoids having to transfer files using HTTP – and with larger file sizes (or with larger numbers of files being processed), this can improve performance significantly.  If you are ingesting a large number of multi-gigabyte files, we recommend you consider filesystem federation.

Another use for filesystem federation is interoperability with another system.  If you have files on disk managed by another application or workflow, you can use filesystem federation to serve them with Fedora 4 without having to ingest them using the REST API or create another copy of the files. 

Do property updates work with the modeshape filesystem federation connector?

Configuration

An example filesystem federation configuration to include in your Modeshape repository.json :

"externalSources" : {
    "federated-directory" : {
        "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
        "directoryPath" : "/path/to/files",
        "projections" : [ "default:/federated => /" ],
        "contentBasedSha1" : "false",
        "readOnly" : true,
        "extraPropertiesStorage" : "none"
    }
}
  • directoryPath sets the base directory for all files shared with the repository.
  • projections lists one or more mappings from the repository to the filesystem.  The format is "{workspace}:{repository path} => {filesystem path}".
  • contentBasedSha1 controls how internal identifiers are computed for files.  By default (contentBasedSha1 = true), Modeshape computes the SHA-1 checksum of a file's content every time the file is accessed.  For small files this creates a modest overhead.  For large files, however, this dramatically reduces performance, since generating the checksum can take several seconds per gigabyte of data.  For this reason, we recommend setting contentBasedSha1 to false when serving files larger than 100MB.
  • readOnly controls whether the contents of the files can be modified through the repository.  When set to true, the federated files can be read but not updated.
  • extraPropertiesStorage sets the format for storing "extra" properties (properties that can't be set using filesystem attributes).  Recommended values are "json" for the current JSON properties format, or "none" for disabling extra properties.
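
The performance difference behind contentBasedSha1 can be seen in a minimal Python sketch.  This is only an illustration of the two strategies, not Modeshape's Java implementation; the function names content_sha1 and path_sha1 are hypothetical.

```python
import hashlib


def content_sha1(path, chunk_size=1024 * 1024):
    """Hash the full file content; the cost grows with the file size."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def path_sha1(path):
    """Hash only the path string; the cost does not depend on the file size."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()
```

Note that a path-based identifier stays the same when the file's content changes, while a content-based identifier does not.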

Fedora 3 Federation

As discussed in Fedora 3 to 4 Upgrade, another use for federation is to include content from a Fedora 3 repository in Fedora 4.  This could be used as a temporary approach for migrating Fedora 3 content to Fedora 4.  Or it could be used on an ongoing basis, for example to interoperate with applications that require Fedora 3.

Configuration

A sample Fedora 3 connector configuration to include in your Modeshape repository.json :

 

"externalSources" : {
    "fedora3" : {
        "classname" : "org.fcrepo.connector.fedora3.Fedora3FederationConnector",
        "fedoraUrl" : "http://localhost:${servlet.port}/fedora",
        "projections" : [ "default:/f3 => /" ],
        "username" : "fedoraAdmin",
        "password" : "fedoraAdmin",
        "organizer" : {
            "classname" : "org.fcrepo.connector.fedora3.organizers.GroupingOrganizer",
            "maxContainerSize": 10
        }
    }
}
  • fedoraUrl contains the URL of the Fedora 3 repository.
  • projections maps a path of the Fedora 4 repository to the Fedora 3 repository.  The format is "{workspace}:{Fedora 4 path} => {Fedora 3 path}".
  • username and password are the credentials to use to connect to Fedora 3.
  • organizer specifies a mapping from the flat Fedora 3 structure to the Fedora 4 structure.  Because Fedora 4 performance can be degraded if there are too many children of a single node, we recommend setting maxContainerSize to 1000 or less.
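
The intent behind maxContainerSize can be illustrated with a short Python sketch.  It is not the GroupingOrganizer implementation, whose actual grouping strategy is not described on this page; the function name group_pids and the sort-then-chunk approach are assumptions made purely for illustration.

```python
def group_pids(pids, max_container_size):
    """Split a flat collection of object identifiers into containers.

    Hypothetical illustration: each returned list holds at most
    max_container_size identifiers, so no single parent node ends up
    with an unbounded number of children.
    """
    if max_container_size < 1:
        raise ValueError("max_container_size must be at least 1")
    ordered = sorted(pids)
    return [
        ordered[start:start + max_container_size]
        for start in range(0, len(ordered), max_container_size)
    ]
```

With a limit of 2, for example, three identifiers would be spread over two containers instead of sitting under one node.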

Further Reading

Modeshape Federation Documentation:

Custom Connector References:

 

-------------------------------------

 

 

 New Modeshape FileSystemConnector option to compute hash from path rather than full content

 

The rate-limiting step when using the FileSystemConnector for large files is computing the checksum that serves as each file's binary key.  The checksum is computed when the file is accessed, which can take on the order of hours for files in the 100 GB range.  Several alternatives were assessed: using the openssl library to create the checksums, using a faster checksum algorithm (MD5), caching the computation, or computing a checksum from only the beginning and ending bytes of a large file.  Ultimately, a new option was added to the Modeshape FileSystemConnector.  As of Modeshape 3.6.0, the contentBasedSha1 option in the externalSources section of repository.json selects whether the hash is computed from the path string (false) or from the full content of the datastream (true).  The default is true.

repository.json

 

"externalSources" : {
    "federated-directory" : {
        "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
        "directoryPath" : "a/path/here",
        "projections" : [ "default:/federated-directory => /" ],
        "extraPropertiesStorage" : "none",
        "contentBasedSha1" : "false"
    }
}

 

 

The tradeoff is that with contentBasedSha1=false the datastream has no SHA-1 hash of its content.  If the full content hash is needed but performance is a concern, consider using LargeFileSystemConnector, which computes the hash lazily when getHash() or getHexHash() is called on the BinaryValue object.  It stores the hash but does not use it as the binary key.
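
The lazy approach described above can be sketched in Python as follows.  This is a conceptual illustration only, not the LargeFileSystemConnector code: the class name LazyHashedFile, the path-derived key, and the hash_computations counter are all assumptions introduced for the example.

```python
import hashlib


class LazyHashedFile:
    """Hypothetical stand-in for a binary value with a lazily computed hash."""

    def __init__(self, path):
        self.path = path
        # Assumption: the key is derived from the path, so creating the
        # object never reads the file content.
        self.key = hashlib.sha1(path.encode("utf-8")).hexdigest()
        self._content_hash = None
        self.hash_computations = 0  # counts how often the file was hashed

    def get_hex_hash(self):
        """Compute the content SHA-1 on first use, then reuse the stored value."""
        if self._content_hash is None:
            digest = hashlib.sha1()
            with open(self.path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            self._content_hash = digest.hexdigest()
            self.hash_computations += 1
        return self._content_hash
```

The expensive read happens only if a caller actually asks for the hash, and at most once per object.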


 

The ModeShape File System Connector can project one or more file/folder hierarchies from the file system into the repository.

 

Within your fcrepo4 repository you'll find this file: fcrepo-webapp/src/main/resources/spring/repo.xml
Look for the repositoryConfiguration property; it determines which repository.json file will be used to configure the repository.
 
All external sources must be defined within this file before startup; they can't be added after Fedora is already running.
 
This example configuration shows a single source pointing to the /tmp directory, which is then projected into the default workspace at /projection.
 
"externalSources" : {
        "system-tmp" : {
            "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
            "directoryPath" : "/tmp",
            "projections" : [ "default:/projection => /" ]
        }
}
 
You'll find all the options for this configuration here: https://docs.jboss.org/author/display/MODE/File+system+connector
 
The important ones are directoryPath, which defines what file or folder is the external source, and projections, which defines all the projections for the given external source (more can be added at run time).
 
The syntax of a projection is "workspace:/path/within/workspace => /path/on/filesystem"
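
To make the syntax concrete, here is a minimal Python sketch that splits a projection string into its three parts.  It is not how ModeShape parses projections; the names Projection and parse_projection are hypothetical, and the only validation assumed is that each part is non-empty.

```python
from typing import NamedTuple


class Projection(NamedTuple):
    workspace: str
    repository_path: str
    source_path: str


def parse_projection(text):
    """Split 'workspace:/repo/path => /source/path' into its parts."""
    left, arrow, right = text.partition("=>")
    if not arrow:
        raise ValueError(f"missing '=>' in projection: {text!r}")
    workspace, colon, repository_path = left.strip().partition(":")
    source_path = right.strip()
    if not colon or not workspace or not repository_path or not source_path:
        raise ValueError(f"incomplete projection: {text!r}")
    return Projection(workspace.strip(), repository_path.strip(), source_path)
```

For the example above, parse_projection("default:/projection => /") yields the workspace "default", the repository path "/projection", and the source path "/".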
 
Once you have added your external sources you can start Fedora, and your /tmp directory should be available from the default workspace within the repository.
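
Because external sources cannot be added once Fedora is running, it can help to check the configuration before startup.  The following Python sketch is one possible check, not part of Fedora or ModeShape.  The function name find_source_problems is hypothetical, and the sketch assumes the repository.json content is strict JSON, which may not hold for every configuration file.

```python
import json
import os


def find_source_problems(config_text):
    """Return a list of human-readable problems found in externalSources."""
    config = json.loads(config_text)
    problems = []
    for name, source in config.get("externalSources", {}).items():
        directory = source.get("directoryPath")
        if directory is None:
            # Not a filesystem source (for example a Fedora 3 connector).
            continue
        if not os.path.exists(directory):
            problems.append(f"{name}: directoryPath {directory!r} does not exist")
        if not source.get("projections"):
            problems.append(f"{name}: no projections defined")
    return problems
```

An empty list means the sketch found nothing to report; it does not guarantee that the repository will accept the configuration.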

