Learning Outcomes
- Understand the purpose of a repository
- Learn what Fedora can do for you
- Understand the key capabilities of the software
Course Outline
Introduction to Fedora 4
What is a Repository?
- Secure software that stores, preserves, and provides access to digital materials
- Supports complex semantic relationships between objects both within and outside the repository
- Supports millions of objects, both large and small
- Capable of interoperating with other applications and services
Fedora 4 Guiding Principles
- Improved performance, enhanced vertical and horizontal scalability
- More flexible storage options
- Features to accommodate research data management
- Better capabilities for participating in the world of linked open data
- An improved platform for developers: one that is easier to work with and that will attract a larger core of developers.
Exposing and Connecting Content with Fedora 4
- Flexible, extensible object modelling
- Atomic objects with semantic connections using standard ontologies
- RDF-based metadata using Linked Data
- RESTful API with native RDF response format
Core Components
Durable Storage
One of the core components of Fedora 4 is its long-term storage and preservation capability. A number of features support this capability; they have been grouped here under the notion of Durable Storage.
Fixity
- Over time, digital objects can become corrupt and unusable through bit rot and other digital preservation threats
- Fixity checks help preserve digital objects by verifying their integrity using techniques such as checksumming
- On content ingest, Fedora can verify a user-provided checksum against the calculated value
- A checksum can be recalculated and compared at any time via a REST API request
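The checksum comparison described above can be sketched in a few lines. This is a minimal illustration and not Fedora's own implementation: the function names `sha1_hex` and `verify_fixity` are hypothetical, and SHA-1 is assumed as the digest algorithm.

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Return the hex-encoded SHA-1 digest of the given bytes."""
    return hashlib.sha1(content).hexdigest()


def verify_fixity(content: bytes, expected_checksum: str) -> bool:
    """Compare a user-provided checksum with the freshly calculated one."""
    return sha1_hex(content) == expected_checksum.strip().lower()


original = b"example datastream content"
recorded = sha1_hex(original)  # checksum recorded at ingest time (assumed)

print(verify_fixity(original, recorded))         # True: content is intact
print(verify_fixity(original + b"!", recorded))  # False: content has changed
```

Running the same comparison on a schedule is one way to detect corruption that appears after ingest.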
Backup and Restore
- A full backup, including all Datastreams as well as a compact serialization of all objects, can be performed at any time
- A full restore from a repository backup can be performed at any time
Export and Import
- A specific Fedora object, its child objects, and associated Datastreams can be exported
- The serialization of the Fedora object is more portable than the compact form found in the backup/restore feature
- Exported objects are serialized in a standard JCR/XML format
- An exported object or hierarchy of objects can be imported at any time
Versioning
- Versions can be created across the entire repository or on particular API calls.
- A previous version can be restored via the REST API.
Policy-Driven Storage
- Different types of content can be routed to different back-end stores on ingest
- Policies can be written to route content based on properties (e.g. filetype)
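The routing idea can be sketched as a first-match lookup over a list of policies. The sketch below is only an illustration under stated assumptions: `StoragePolicy`, `choose_store`, the property name `mime_type`, and the store names are all hypothetical and do not reflect Fedora's actual policy configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Route content whose property equals a value to a named store."""
    property_name: str
    value: str
    store: str


def choose_store(properties: dict, policies: list, default: str = "default-store") -> str:
    """Return the store of the first matching policy, else the default store."""
    for policy in policies:
        if properties.get(policy.property_name) == policy.value:
            return policy.store
    return default


# Hypothetical policies keyed on file type
policies = [
    StoragePolicy("mime_type", "image/tiff", "tiff-store"),
    StoragePolicy("mime_type", "video/mp4", "video-store"),
]

print(choose_store({"mime_type": "image/tiff"}, policies))  # tiff-store
print(choose_store({"mime_type": "text/plain"}, policies))  # default-store
```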
Data Modelling
Nodes
- Both objects and datastreams are represented as nodes.
- Object nodes can have both Objects and Datastreams as children.
- The tree structure allows for inheritance of things like security policies.
Properties
- Nodes have a number of properties, which are expressed as RDF triples.
- The node itself is the implicit subject of each triple.
- Properties can be RDF literals (e.g. dc:title) or they can express relationships both internal and external to the repository.
- Any number of RDF namespaces can be defined and used.
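To make the "implicit subject" idea concrete, the sketch below expands a node's properties into full triples in N-Triples syntax. The node URI, the helper names, and the sample values are assumptions chosen for illustration; only the Dublin Core predicate URIs are real.

```python
def uri(value: str) -> str:
    """Format an IRI term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triples_for(node_uri: str, props: list) -> list:
    """Pair every (predicate, object) with the node as the subject."""
    return [f"{uri(node_uri)} <{predicate}> {obj} ." for predicate, obj in props]


NODE = "http://localhost:8080/rest/objects/report-1"  # hypothetical node URI

properties = [
    # A literal property
    ("http://purl.org/dc/elements/1.1/title", literal("Annual Report")),
    # A relationship to another node inside the repository (hypothetical URI)
    ("http://purl.org/dc/terms/isPartOf",
     uri("http://localhost:8080/rest/collections/reports")),
    # A relationship to a resource outside the repository (hypothetical URI)
    ("http://purl.org/dc/terms/references",
     uri("http://example.org/external/dataset-42")),
]

for line in triples_for(NODE, properties):
    print(line)
```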
Content Models
- Content can be modelled using Compact Node Definitions (CNDs).
- Mixins can be used to define any number of properties. A mixin can be added to a CND to be applied to objects.
- An object can inherit properties from any number of mixins; their effects are cumulative.
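The cumulative effect of mixins can be modelled as a set union. This is a plain Python model of the idea, not CND syntax, and the mixin names and property lists are hypothetical.

```python
# Hypothetical mixins, each contributing a set of property names
MIXINS = {
    "ex:describable": {"dc:title", "dc:description"},
    "ex:rightsManaged": {"dc:rights"},
    "ex:dated": {"dc:date", "dc:title"},
}


def inherited_properties(mixin_names: list) -> set:
    """Union the properties of every mixin applied to an object."""
    props = set()
    for name in mixin_names:
        props |= MIXINS[name]
    return props


print(sorted(inherited_properties(["ex:describable", "ex:rightsManaged"])))
# ['dc:description', 'dc:rights', 'dc:title']
```

A property contributed by more than one mixin (such as `dc:title` here) appears only once in the result.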
Linked Data
- Fedora 4.0 is compliant with the LDP 1.0 spec.
- Metadata can be represented as RDF triples that point to objects outside the repository.
- Many possibilities exist for exposing, importing, and sharing resources with other web applications.
User Interface
Administrative Console
Tour of the HTML administrative interface.
Internal Search
- Internal search can search across all node properties.
- It also functions as a limited SPARQL endpoint.
External Components
Indexing
- Indexing repository content for external applications can be accomplished by using the JMS Message Consumer web application.
- This is just one possible implementation; different message consumer implementations could be written.
- The JMS Message Consumer receives JMS messages on repository updates and relays these messages to one or more external applications.
- Repository content needs to be assigned the rdf:type property "indexable" in order to be indexed.
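The relay behaviour described above can be sketched without a message broker. The code below is an in-process stand-in, not the JMS Message Consumer itself: the event dictionary shape, the `relay` function, and the `INDEXING_TYPE` value are assumptions, with the latter standing in for the rdf:type that marks content for indexing.

```python
# Placeholder for the rdf:type that marks content for indexing (assumed name)
INDEXING_TYPE = "ex:marked-for-indexing"


def relay(event: dict, indexers: list) -> int:
    """Forward an update event to every indexer if the node is marked for indexing.

    Returns the number of indexers that received the event.
    """
    if INDEXING_TYPE not in event.get("rdf_types", ()):
        return 0
    for indexer in indexers:
        indexer(event)
    return len(indexers)


# Two stand-in external applications that simply record what they receive
triplestore_log, search_log = [], []
indexers = [triplestore_log.append, search_log.append]

marked = {"path": "/objects/1", "rdf_types": [INDEXING_TYPE]}
unmarked = {"path": "/objects/2", "rdf_types": []}

print(relay(marked, indexers))    # 2
print(relay(unmarked, indexers))  # 0
```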
Triplestore
- An external triplestore can be used to index the RDF triples of content managed by Fedora.
- Any triplestore that supports SPARQL-update can be used; Fuseki and Sesame have been tested.
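As an illustration of what a SPARQL Update sent to a triplestore might look like, the sketch below builds a request that deletes a node's existing triples and inserts the current ones. The function name and the sample URIs are hypothetical, and the exact update Fedora's indexer issues may differ; only the `DELETE WHERE` and `INSERT DATA` forms are taken from SPARQL 1.1 Update.

```python
def replace_node_triples(subject: str, triples: list) -> str:
    """Build a SPARQL 1.1 Update that swaps in a node's current triples.

    `triples` is a list of (predicate IRI, already-formatted object term).
    """
    inserts = "\n".join(f"  <{subject}> <{p}> {o} ." for p, o in triples)
    return (
        f"DELETE WHERE {{ <{subject}> ?p ?o }} ;\n"
        f"INSERT DATA {{\n{inserts}\n}}"
    )


update = replace_node_triples(
    "http://localhost:8080/rest/objects/report-1",  # hypothetical node URI
    [("http://purl.org/dc/elements/1.1/title", '"Annual Report"')],
)
print(update)
```

The resulting string would typically be sent to the triplestore's update endpoint in an HTTP POST; the endpoint URL depends on the triplestore and its configuration.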
External Search
- An external search application can be used to perform more complex search queries on repository content.
- Any search application that a message consumer can update may be used; Solr has been tested.
Authorization
- Authentication (not to be confused with authorization) is assumed to take place in a layer above the application.
- The authorization framework provides a plug-in point within the repository that calls out to an optional authorization enforcement module.
- Currently, two authorization implementations exist.
Basic Authorization
- Basic authorization compares the user's role(s) with an Access Control List (ACL) defined on a Fedora resource.
- ACLs can be inherited; if a given node does not have an associated ACL, Fedora will examine parent nodes until it finds one.
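The inheritance rule can be sketched as a walk up the node tree. This is a simplified model, not Fedora's authorization module: the `Node` class, the ACL shape (role mapped to a set of permitted actions), and the deny-by-default behaviour when no ACL is found are all assumptions.

```python
class Node:
    """A repository node with an optional ACL and an optional parent."""

    def __init__(self, name, parent=None, acl=None):
        self.name = name
        self.parent = parent
        self.acl = acl  # e.g. {"editor": {"read", "write"}} or None


def effective_acl(node):
    """Return the nearest ACL, starting at the node and moving up its parents."""
    current = node
    while current is not None:
        if current.acl is not None:
            return current.acl
        current = current.parent
    return None


def is_permitted(node, user_roles, action):
    """Check whether any of the user's roles grants the action on the node."""
    acl = effective_acl(node)
    if acl is None:
        return False  # assumption: deny when no ACL exists anywhere in the path
    return any(action in acl.get(role, set()) for role in user_roles)


root = Node("root", acl={"reader": {"read"}, "editor": {"read", "write"}})
collection = Node("collection", parent=root)  # no ACL: inherits from root
restricted = Node("restricted", parent=collection, acl={"editor": {"read"}})

print(is_permitted(collection, ["reader"], "read"))   # True
print(is_permitted(collection, ["reader"], "write"))  # False
print(is_permitted(restricted, ["reader"], "read"))   # False
```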
XACML Authorization
- XACML policies can provide much more complex and granular authorization.
- A default policy must be defined for the repository, and each node can override the default with another policy.
- A XACML policy referenced by a node will also apply to all the node's children, unless they define their own XACML policies that override the parent policy.
Performance
Transactions
- Multiple actions can be bundled together into a single repository event (transaction).
- Transactions offer performance benefits by cutting down on the number of times data is written to the repository filesystem (which tends to be the slowest action).
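The performance argument can be illustrated with a toy store that counts its writes. The classes below are hypothetical and only model the batching idea; they say nothing about how Fedora implements transactions internally.

```python
class CountingStore:
    """Stand-in for the repository filesystem that counts write operations."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def write(self, changes: dict) -> None:
        self.data.update(changes)
        self.writes += 1


class Transaction:
    """Buffer changes and write them to the store once, on commit."""

    def __init__(self, store: CountingStore):
        self.store = store
        self.pending = {}

    def put(self, path: str, value: str) -> None:
        self.pending[path] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self.pending:
            self.store.write(self.pending)  # single write for the whole batch
        self.pending = {}                   # discard the buffer on error
        return False


unbatched = CountingStore()
for i in range(3):
    unbatched.write({f"/objects/{i}": "content"})

batched = CountingStore()
with Transaction(batched) as tx:
    for i in range(3):
        tx.put(f"/objects/{i}", "content")

print(unbatched.writes, batched.writes)  # 3 1
```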
Clustering
- Two or more Fedora instances can be configured to work together in a cluster.
- Fedora 4 currently supports clustering for high-availability use cases.
- A load balancer can be set up in front of two or more Fedora instances to evenly distribute read requests across each instance.
- If one Fedora instance in the cluster goes down, read requests can be directed to the other instance.
- Ingests are replicated across all instances in the cluster.
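The read-distribution and failover behaviour can be sketched with a small round-robin selector. This is a toy model of what a load balancer does, with hypothetical instance names; in practice this role is filled by a dedicated load balancer rather than application code.

```python
class ReadBalancer:
    """Distribute read requests round-robin, skipping instances marked down."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0

    def mark_down(self, instance: str) -> None:
        self.healthy.discard(instance)

    def mark_up(self, instance: str) -> None:
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self) -> str:
        """Return the next healthy instance in rotation."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy Fedora instance available")


balancer = ReadBalancer(["fedora-a", "fedora-b"])  # hypothetical instance names
first_three = [balancer.pick() for _ in range(3)]
print(first_three)  # ['fedora-a', 'fedora-b', 'fedora-a']

balancer.mark_down("fedora-a")
after_failure = [balancer.pick() for _ in range(2)]
print(after_failure)  # ['fedora-b', 'fedora-b']
```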