General principles for constructing Hydra objects
Hydra generally favours complex (atomistic) objects over compound (multi-content-datastream) objects. The exceptions are where the content in all datastreams is identical but for, say, MIME type or screen resolution, or where there is a requirement for only one content datastream (a special case of compound, sometimes referred to as a "simple" object). This has implications for most object classes. For instance, because some ETDs (electronic theses and dissertations) are necessarily complex (more than one datastream, each with different content, for example a pdf plus a multimedia file), we take the view that all ETD objects should be complex; a single-datastream ETD is just a special case, an aggregation (parent) object with a single child.
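The atomistic pattern for ETDs can be sketched in a few lines. This is illustrative only: `RepoObject` and `build_atomistic_etd` are hypothetical names, datastream contents are placeholders, and the child PID scheme (`parent.1`, `parent.2`, ...) is an assumption, not a Hydra convention. The point it shows is that a one-file ETD and a multi-file ETD have the same shape: one parent plus N children, each child linked back with isPartOf.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    """Hypothetical stand-in for a Fedora object: a PID, datastreams and RELS-EXT pairs."""
    pid: str
    datastreams: dict[str, str] = field(default_factory=dict)  # dsid -> placeholder content
    rels: list[tuple[str, str]] = field(default_factory=list)   # (predicate, target PID)


def build_atomistic_etd(parent_pid: str, files: list[str]) -> list[RepoObject]:
    """Return the parent aggregation object followed by one child per content file."""
    if not files:
        raise ValueError("an ETD needs at least one content file")
    # The parent carries descriptive metadata but no content of its own.
    parent = RepoObject(parent_pid, {"descMetadata": "<mods/>"})
    children = [
        # Each child holds one content datastream and points back to its parent.
        RepoObject(f"{parent_pid}.{n}", {"content": name}, [("isPartOf", parent_pid)])
        for n, name in enumerate(files, start=1)
    ]
    return [parent, *children]
```

Calling `build_atomistic_etd("etd:42", ["thesis.pdf"])` yields two objects, so a single-file thesis is handled by exactly the same code path as a thesis with a pdf and a video.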
As noted above, Hydra-compliant objects held in Fedora are expected to have a DC and a RELS-EXT datastream (required by Fedora) and an enforceable rights statement (required by Hydra), either in a rightsMetadata datastream (currently the most common pattern) and/or through an Admin Policy Object (APO) by which the object is governed. In addition, there will be further datastreams depending on the content type and purpose of the object. These structures are expressed and managed within the code using Ruby models. In our original plans (2008) we envisaged that these structures would also be expressed as actual Fedora cModel objects so that Hydra heads could use Fedora disseminators. However, many implementations have found that their use cases have no need for disseminators, and so institutions have not created the actual cModel object or its attendant sDef and sDep. That said, these implementations generally still declare the non-existent cModel in an object's RELS-EXT, because the declaration is a useful hook that the UI can use to provide a view appropriate to the content type.
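The minimum described above can be expressed as a simple check. This is a sketch only, assuming an object is represented by the set of its datastream IDs plus the PID of its governing APO (if any); `missing_requirements` is a hypothetical helper, not part of any Hydra gem.

```python
from typing import Optional

# Datastreams Fedora itself expects on every object.
FEDORA_REQUIRED = ("DC", "RELS-EXT")


def missing_requirements(datastream_ids: set[str], governed_by: Optional[str] = None) -> list[str]:
    """Return human-readable problems; an empty list means the minimum is met.

    `governed_by` is the PID of an APO named in an isGovernedBy statement, if any.
    """
    problems = [f"missing {dsid} datastream" for dsid in FEDORA_REQUIRED if dsid not in datastream_ids]
    # Hydra needs an enforceable rights statement: rightsMetadata and/or an APO.
    if "rightsMetadata" not in datastream_ids and not governed_by:
        problems.append("no rights statement: add rightsMetadata or an isGovernedBy APO")
    return problems
```

Either route satisfies the rights requirement: `missing_requirements({"DC", "RELS-EXT"}, governed_by="apo:1")` returns an empty list, just as an object carrying its own rightsMetadata would.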
Where Fedora dissemination is used, the following applies:
Hydra-provided cModels, service definitions and service deployments will use Hydra's own namespace to distinguish them from other content that users may have, thus Hydra-provided object PIDs will begin:
- hydra-sDef: or
We come back to disseminators at the bottom of the page.
It is perhaps not widely enough known that the size of Fedora's (FOXML) objects can have a significant effect on server performance. Hydra strongly recommends that all metadata datastreams other than DC, RELS-EXT and rightsMetadata should be of type 'managed' rather than 'inline XML'. These three datastreams are singled out for inline storage so that, if a Fedora object is ever found disassociated from its normal context, its core FOXML will still contain some essential information about itself.
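Fedora 3 records how each datastream is stored in the FOXML CONTROL_GROUP attribute: "X" for inline XML, "M" for managed content. The recommendation above can therefore be audited mechanically. The sketch below assumes you have already gathered each metadata datastream's ID and control group into a dictionary (how you obtain them, for example from the Fedora API, is left open); the function names are hypothetical.

```python
# Fedora 3 FOXML CONTROL_GROUP codes for the two cases discussed here.
INLINE_XML = "X"
MANAGED = "M"

# The three datastreams the guideline keeps inline.
KEEP_INLINE = {"DC", "RELS-EXT", "rightsMetadata"}


def recommended_control_group(dsid: str) -> str:
    """Control group the guideline suggests for a *metadata* datastream."""
    return INLINE_XML if dsid in KEEP_INLINE else MANAGED


def inline_violations(metadata_datastreams: dict[str, str]) -> list[str]:
    """IDs of metadata datastreams stored inline that the guideline says should be managed."""
    return sorted(
        dsid
        for dsid, group in metadata_datastreams.items()
        if group == INLINE_XML and recommended_control_group(dsid) == MANAGED
    )
```

Only metadata datastreams should be passed in; content datastreams are outside the scope of this particular recommendation.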
Who stuck to the rules?
You know what they say: "rules are meant to be broken". In building production systems, the Hydra partners may have interpreted and extended the rules and guidelines that follow. This page is already long enough with only scant reference to some of those adaptations, so we are encouraging Hydra partners to document their implementations, adaptations and extensions here on the wiki. The parent page can be found here.
Example models for download
Hydra cModel/sDef/sDep objects are available from our github site at https://github.com/projecthydra.
Example Ruby models can be found on the ActiveFedora GitHub pages.
Compulsory datastreams and equivalents
All Hydra-compliant objects must express something about applicable rights. Who can view their content? Who can edit them? ...and so on. Hydra's original design required that each object should contain a datastream called "rightsMetadata" which expressed these matters in a simple XML structure. This is still the way that the majority of production systems work. The Hydra team developed a very simple rights metadata schema, which can be found here. This rights metadata is easily indexed by Hydra, which allows us to provide appropriate security around content and to provide gated discovery (users searching content through a Hydra head will only be aware of content that they would ultimately be allowed to download).
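To make the idea concrete, here is a minimal sketch that writes a rightsMetadata-style document and tests a user against it. The namespace, element names (`access type="..."`, `machine`, `person`, `group`) and `version` attribute are modelled on the published Hydra rightsMetadata schema but should be checked against it before use; the rule that edit implies read and read implies discover is an assumption of this sketch. `build_rights_metadata` and `is_allowed` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Namespace modelled on the Hydra rightsMetadata schema (verify before use).
NS = "http://hydra-collab.stanford.edu/schemas/rightsMetadata/v1"
ACCESS_LEVELS = ("discover", "read", "edit")
# Assumption: a higher grant implies the lower ones (edit => read => discover).
SATISFIED_BY = {
    "discover": ("discover", "read", "edit"),
    "read": ("read", "edit"),
    "edit": ("edit",),
}


def build_rights_metadata(grants: dict[str, dict[str, list[str]]]) -> ET.Element:
    """grants maps an access level to {'person': [...], 'group': [...]}."""
    root = ET.Element(f"{{{NS}}}rightsMetadata", {"version": "0.1"})
    for level in ACCESS_LEVELS:
        access = ET.SubElement(root, f"{{{NS}}}access", {"type": level})
        machine = ET.SubElement(access, f"{{{NS}}}machine")
        for kind in ("person", "group"):
            for name in grants.get(level, {}).get(kind, []):
                ET.SubElement(machine, f"{{{NS}}}{kind}").text = name
    return root


def is_allowed(rights: ET.Element, level: str, user: str, groups: set[str]) -> bool:
    """True if the user, or one of their groups, holds a grant satisfying `level`."""
    for access in rights.findall(f"{{{NS}}}access"):
        if access.get("type") not in SATISFIED_BY[level]:
            continue
        if user in [p.text for p in access.iter(f"{{{NS}}}person")]:
            return True
        if groups & {g.text for g in access.iter(f"{{{NS}}}group")}:
            return True
    return False
```

Gated discovery amounts to applying `is_allowed(..., "discover", ...)` to every search hit. In a real Hydra head the equivalent test is applied to the indexed rights data at search time rather than by parsing each object's XML.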
Newer Hydra implementations have developed the idea of an Admin Policy Object (APO), which can govern a set of objects. (Individual objects express their adherence to an APO using an isGovernedBy statement in their RELS-EXT datastream.) The APO contains a statement of the rightsMetadata to which each adhering object will conform. On the one hand, using an APO means that a change to the applicable rights may need to be made in only one place; on the other, there is no explicit statement of rights in each object. APOs and object-level rightsMetadata datastreams can be combined: an APO could lay down a set of 'grant' statements which are then extended at the object level by further 'grant' statements.
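The combination rule described above (APO grants extended by object-level grants) is essentially a per-level union. The sketch below assumes grants are held as sets of principal strings such as `"group:staff"` or `"person:jdoe"`; that encoding and the function name `effective_grants` are hypothetical.

```python
def effective_grants(
    apo_grants: dict[str, set[str]], object_grants: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Union APO-level and object-level grants, per access level.

    Object-level statements can only add to what the APO grants; nothing here revokes a grant.
    """
    merged: dict[str, set[str]] = {}
    for source in (apo_grants, object_grants):
        for level, principals in source.items():
            # setdefault creates a fresh set, so the inputs are never mutated.
            merged.setdefault(level, set()).update(principals)
    return merged
```

Because the APO's grants are read at merge time, editing the APO changes the effective rights of every object it governs, which is the "one place" benefit; the cost is that an individual object's own datastreams no longer tell the whole story.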
APOs are dealt with more fully elsewhere.
Rights metadata schema
Fedora requires that all its objects have a DC metadata datastream. In the design of Fedora this was intended to be for administrative use only in order to facilitate searching for objects in a repository from the admin interface. It was not intended to be used for general descriptive metadata (although in the "real world" it is widely used that way). Hydra uses it in the way Fedora intended, as an admin tool, and generally partners keep a very minimal set of information there, perhaps just
- dc:title and
both required by Fedora, and maybe
- dc:creator and
An object's "real" descriptive metadata is kept in a different datastream usually called "descMetadata".
The RELS-EXT datastream is provided by Fedora primarily to record external relationships for an object. Hydra uses it for this purpose and can express a number of things there in addition to the standard Fedora entries.
As noted in "Hydra-compliant" above, we use the relationship "hasModel" to indicate the content type of our objects.
We use the relationship "isGovernedBy" to associate an object with an APO (see "Rights metadata" above).
We use the relationship "isPartOf" to associate child objects with their parent when using atomistic objects.
We use the relationship "isMemberOf" when defining management structures within a repository (not everyone does this).
We use the relationship "isDependentOf" when defining intellectual arrangements within a repository (again, not everyone does this).
For more information about sets see the section "Sets, sets or sets" below.
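For orientation, the relationships listed above can be serialised as RELS-EXT RDF/XML with Python's standard library. The hasModel and relations-external namespace URIs are Fedora's standard ones; the namespace used here for isGovernedBy (`http://projecthydra.org/ns/relations#`) is an assumption and should be checked against your own stack. All PIDs are hypothetical, and `build_rels_ext` is an illustrative helper, not a Hydra API.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODEL = "info:fedora/fedora-system:def/model#"
REL = "info:fedora/fedora-system:def/relations-external#"
HYDRA = "http://projecthydra.org/ns/relations#"  # assumption: check your own stack

# Which namespace each relationship named on this page lives in.
PREDICATE_NS = {
    "hasModel": MODEL,
    "isGovernedBy": HYDRA,
    "isPartOf": REL,
    "isMemberOf": REL,
    "isDependentOf": REL,
}

for prefix, uri in (("rdf", RDF), ("fedora-model", MODEL), ("rel", REL), ("hydra", HYDRA)):
    ET.register_namespace(prefix, uri)


def build_rels_ext(pid: str, relationships: list[tuple[str, str]]) -> str:
    """Serialise (predicate, target PID) pairs as a RELS-EXT RDF/XML document."""
    root = ET.Element(f"{{{RDF}}}RDF")
    about = {f"{{{RDF}}}about": f"info:fedora/{pid}"}
    description = ET.SubElement(root, f"{{{RDF}}}Description", about)
    for predicate, target in relationships:
        ET.SubElement(
            description,
            f"{{{PREDICATE_NS[predicate]}}}{predicate}",
            {f"{{{RDF}}}resource": f"info:fedora/{target}"},
        )
    return ET.tostring(root, encoding="unicode")
```

For example, `build_rels_ext("demo:1", [("hasModel", "hydra-cModel:genericContent"), ("isPartOf", "demo:parent")])` declares a (hypothetical) content model even if no such cModel object exists in the repository, which is exactly the "useful hook" usage described earlier.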
Optional metadata datastreams
In our original specification descMetadata was a compulsory datastream for descriptive metadata; as we moved to production we realised that there were classes of object that did not need it, and so it is now optional. Clearly it is necessary in objects that can display a splash page.
The Hydra founding partners have each used a locally chosen subset of MODS here, appropriate to their use cases. There is no reason why other metadata schemas should not be used. Hydra users are known to be working with a range of other metadata formats, including Dublin Core (DC), EAD and PBCore. It is possible to convert between formats; for instance, one Hydra partner keeps all descriptive metadata as MODS but can create a DC or a UKETD_DC representation from it on the fly.
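On-the-fly conversion is usually done with an XSLT crosswalk, but Python's standard library has no XSLT processor, so this sketch maps a handful of MODS elements to simple DC by hand. The three mappings are a deliberately tiny, illustrative subset (for instance, every `name/namePart` becomes `dc:creator` regardless of role) and are not a complete or authoritative MODS-to-DC crosswalk; `mods_to_dc` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Tiny illustrative subset of a MODS -> DC crosswalk: (MODS path, DC element).
CROSSWALK = [
    (f"{{{MODS}}}titleInfo/{{{MODS}}}title", "title"),
    (f"{{{MODS}}}name/{{{MODS}}}namePart", "creator"),  # ignores name roles
    (f"{{{MODS}}}abstract", "description"),
]


def mods_to_dc(mods_xml: str) -> ET.Element:
    """Build an oai_dc:dc element from a MODS record using the mappings above."""
    mods = ET.fromstring(mods_xml)
    dc = ET.Element(f"{{{OAI_DC}}}dc")
    for path, dc_name in CROSSWALK:
        for el in mods.findall(path):
            # Skip empty elements rather than emitting blank DC fields.
            if el.text and el.text.strip():
                ET.SubElement(dc, f"{{{DC}}}{dc_name}").text = el.text.strip()
    return dc
```

Keeping MODS as the single stored format and deriving DC on request avoids the two copies drifting apart; the trade-off is that the crosswalk itself must be maintained.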
Whilst optional, a contentMetadata datastream might usefully be present in all objects that can display a splash page containing onward links: it is a 'one-stop shop' for the data related to this onward progress. Some of Hydra's founding partners have found it convenient in several Hydra heads.
This "contentMetadata" datastream could contain, say, a METS FileSec, a METS StructMap, an ORE map or a locally defined schema. The contentMetadata schema that the Hydra partners will use is based on one developed at Stanford (StanfordContentMetadata.pdf for reference), which is being adopted elsewhere in slightly modified form (see, for instance, the Object cModels and datastreams).
Hydra contentMetadata schemas
There are many more datastreams that could be defined in a Hydra object, for instance:
and others you may create in response to need.
Datastreams for content
Initially, Hydra was quite prescriptive about the way datastreams should be named. In practice, implementers may or may not have followed the recommendations and so we are now more laid back about such things. But may we suggest a couple of guidelines?
We suggested (and, if pushed, would still recommend) that in a simple object with a single content-bearing datastream, that datastream should be called "content". Further, we'd suggest that every content-bearing object should have a "content" datastream, so that a new Hydra head (perhaps imported from elsewhere) has somewhere to start. In a compound object (multiple content-bearing datastreams in the same object), the simple pattern would be content, content02, content03, etc.
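The naming pattern is simple enough to generate. This sketch follows the examples given (content, content02, content03); the two-digit zero padding is inferred from those examples, and the page does not say what should happen beyond content99. `content_datastream_ids` is a hypothetical helper.

```python
def content_datastream_ids(count: int) -> list[str]:
    """Suggested IDs for `count` content-bearing datastreams in one object.

    The first is always "content"; later ones are zero-padded: content02, content03, ...
    """
    if count < 1:
        raise ValueError("a content-bearing object needs at least one content datastream")
    return ["content"] + [f"content{n:02d}" for n in range(2, count + 1)]
```

Always starting with a plain "content" is what gives a newly imported Hydra head a predictable place to look, whether the object is simple or compound.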
Beyond that, we might suggest that if an object has a thumbnail for display, that datastream should be called "thumbnail" (again with a view to interoperability at a basic level). Beyond even that, the choice is yours. For reference and/or general interest, these are some of the datastreams we originally tried to prescribe:
Simple content (pdf, etc)
- original (optional, perhaps the docx file from which the pdf is derived)
- thumbnail (optional, but if you have the cover image...)
General compound content
- content03 (optional)
- content.. (optional)
- thumbnail (optional)
- screen (say a 1024px version of the original)
- max (maximum deliverable resolution)
- original (optional, but useful if the original is a TIFF and you are delivering a jpg, say)
- content (optional, see above, might deliver any of the first three according to local thinking)
Sets, sets or sets?
It is not necessary to use sets in a Fedora repository, Hydra-based or otherwise. If you need to offer your end users some sort of structure it may well be enough to do this using facets in the discovery interface. That said, some people find sets useful for two purposes: providing a behind-the-scenes structure to aid management and/or for providing context around a collection of objects.
In the first case, one might have a set for ETDs containing sub-sets for individual subjects: this makes it easy to identify, say, all the biology ETDs in your repository for management purposes. In the second case, you might have a collection of datasets all related to the same subject matter: discovered on its own each is of limited use, but grouped together by a set object that explains their context and purpose they are much more useful.
It is possible that these two functions, management and provision of context, can be served by exactly the same set objects in which case the 'isMemberOf' relationship will answer all your needs. Equally, it may be useful to distinguish between the two approaches and talk about 'structural sets' and 'display sets'. In this scenario 'isMemberOf' is used for the structural sets and 'isDependentOf' for display sets and it is likely that your structural sets would be completely hidden from end users.
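A sketch of the structural/display split, assuming RELS-EXT statements have been loaded into a list of (subject PID, predicate, object PID) triples with bare predicate names; that in-memory form, the PIDs and the helper names are all hypothetical. The UI only ever follows isDependentOf, so structural sets never reach end users.

```python
# Hypothetical in-memory RELS-EXT data: (subject PID, predicate, object PID) triples.
Triple = tuple[str, str, str]

STRUCTURAL = "isMemberOf"    # management structure, hidden from end users
DISPLAY = "isDependentOf"    # context-providing sets shown in the UI


def members(triples: list[Triple], set_pid: str, predicate: str) -> list[str]:
    """PIDs that point at `set_pid` with the given set predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == set_pid)


def display_sets_for(triples: list[Triple], pid: str) -> list[str]:
    """Sets an end user could be shown for `pid`; structural sets are ignored."""
    return sorted(o for s, p, o in triples if s == pid and p == DISPLAY)
```

An object such as a dataset can belong to both kinds of set at once: a management set via isMemberOf and a context-giving set via isDependentOf, with no interference between the two.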
There are two basic patterns for managing "sets" (Hydra's preferred term over "collections" or "folders"):
Implicit set relationships in which the set object has no explicit listing but rather contains some rule(s) for identifying its set members.
Explicit set relationships in which the set object contains an explicit listing of its set members.
In all cases there must be a single object that represents the set itself in the repository: this object defines and describes the set (in the abstract and/or for specific UI use) and provides a reference point (a pid) for creating object associations to the set.
Doubtless people are developing additional ways of dealing with sets.
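The two patterns can be contrasted in a short sketch. Both set classes carry their own PID, reflecting the rule that a single object represents the set. Here the implicit set's rule is a Python predicate standing in for whatever a real implementation would store (a saved search query, for example), and the repository is a hypothetical dictionary of PID to indexed fields; none of these names come from Hydra.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical repository view: PID -> a few indexed fields.
Repository = dict[str, dict[str, str]]


@dataclass
class ExplicitSet:
    """Set object that lists its members; order of the listing is preserved."""
    pid: str
    member_pids: list[str] = field(default_factory=list)

    def members(self, repo: Repository) -> list[str]:
        # Skip listed PIDs that no longer exist in the repository.
        return [pid for pid in self.member_pids if pid in repo]


@dataclass
class ImplicitSet:
    """Set object that stores a rule; membership is computed when asked."""
    pid: str
    rule: Callable[[dict[str, str]], bool]

    def members(self, repo: Repository) -> list[str]:
        return sorted(pid for pid, fields in repo.items() if self.rule(fields))
```

The trade-off shows up when content is added: a new biology ETD joins an implicit "biology ETDs" set automatically, whereas an explicit set only changes when someone edits its listing.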
So what about disseminators?
As we noted near the top of the page, many Hydra implementations have found no need to use Fedora disseminators - but what about those that have?
If you want to make use of Fedora disseminators, you first need to read the appropriate parts of the Fedora Content Modelling Architecture. The implications for Hydra are that not only must you declare your content type as a cModel in RELS-EXT, but the cModel object must also exist in the Fedora repository, as must the corresponding sDef and sDep objects. Why go to this trouble? Well, disseminators provide a way of manipulating content on the fly: a colour image could be delivered in monochrome, say, or a metadata datastream could be transformed to another format. A concrete example of the latter in use is that Hull keeps all descriptive metadata as MODS but can deliver it to the user expressed as, for example, DC: a disseminator applies an XSLT transform on the fly.
Granted, you could do XSLT transforms from within your Ruby code. It's all a matter of choice!