|Collection||Can hold...||Files attached to collections are used for thumbnails, collection documentation, etc.|
|Descriptive Metadata||Preference for using commonly-used ontologies (e.g. Dublin Core, SKOS, FOAF) with custom ontologies used only for the portion of the metadata that cannot be described in one of the commonly-used ontologies.|
|Technical Metadata||e.g. format/content type, size, checksum, create/modify dates, runtime/codec, etc.|
Reference: Hydra Works Application Profile
Features / Requirements
What can you do?
- Add a column for your project and indicate whether your project implements the feature / requirement.
- Add a row for each feature / requirement not already on the list.
- Increase the Desired counter by one to indicate that a feature is a high priority for your project and that you would like to see it added to Sufia.
Hydra @ Hull
|Another Project Name Here|
|software stack on which each project is based||Sufia 4.1.0||Hydra 6.1.0|
Users can be assigned roles that control what they can see and do on the site.
This is an example of the roles and abilities that could be created in support of a configured workflow.
Assumes a more granular set of abilities beyond READ|EDIT access. Currently, access is associated with the Work. And if you have EDIT access to a work, you can perform all CRUD. If you have READ access, you can view only.
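The current READ|EDIT model described above can be sketched as a small lookup. This is an illustrative Python sketch only, not Sufia or Curate code; the names `ACCESS_ABILITIES` and `can` are hypothetical.

```python
# Hypothetical sketch of the current model: access is attached to the Work,
# EDIT implies all CRUD actions, READ implies view only.
ACCESS_ABILITIES = {
    "READ": {"read"},
    "EDIT": {"create", "read", "update", "delete"},
}


def can(access_level, action):
    """Return True if the given access level on a Work permits the action."""
    return action in ACCESS_ABILITIES.get(access_level, set())
```

A more granular set of abilities would replace the two keys with one entry per role (e.g. reviewer, editor), each listing only the actions that role needs.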
The site admin user is a highly trusted user with privileges to do anything on the entire site, potentially including but not limited to...
sufia: see status of async jobs, admin stats page, edit some content blocks, add featured works/researchers
Curate: can do anything any user can do in the gui, with any content
Hull: can do anything to content or collection but cannot see material under development in a depositor's private space ("the protoqueue"). Uniquely, an administrator can actually do a complete delete (lesser users' "delete" buttons move the object to a protected space available only to admins, who can potentially restore or delete the object).
The collection admin user is a highly trusted user with privileges to do anything in a specific collection, potentially including but not limited to...
sufia: same as Curate
Curate – What is called a collection today in Curate is a user-generated list – the creator of the list is the administrator and can add or remove items from the collection. Anything discoverable can be added to the collection. Removing items from the collection does not remove the items from the repository. Membership in the collection does not confer additional or different rights to works and files.
A depositor can...
sufia: everyone as long as they log in
Hull: All material going into the repository proper is mediated by the Library "Content and Access Team" (previously our cataloguers). Depositors put their material into a QA queue for approval/checking by CAT before publication.
A delegate can...
Curate: A depositor can designate delegate(s) (proxies). Their delegate(s) can submit new works on the depositor's behalf, with the depositor recognized as the owner of the work. As currently implemented, delegate(s) can also edit or delete any of the depositor's existing works.
An editor can...
Curate: An owner of a work can designate another user as an editor of that particular work. The editor can edit metadata and add files to that work.
Hull: this role is fulfilled by the Content and Access Team repository-wide.
A reviewer can...
Curate: no mediated deposit enforced by application.
Hull: this is the role fulfilled by the Content and Access team.
A viewer can...
A guest is anyone not logged into the system. A guest can...
|group of depositors||YES||group of users who can deposit in a specific Collection|
Avalon: For this, see editors and reviewers. Since Avalon is not a self-deposit IR, the depositor, editor, and reviewer roles are based on collection. Content is not owned by individuals but belongs to collections and units.
|group of proxies||N/A||group of users who can serve as delegate/proxy of another user|
|group of editors||YES||YES||YES|
group of users who can edit a specific Work or Works in a specific Collection
Curate: A depositor can create a group that includes themselves and other users. By default the creator of the group is the administrator/owner of the group, but they can assign this role to a different member of the group. The depositor of a work can assign a group to that work, with the result that the members of the group are editors of the work.
|group of reviewers||YES||group of users who can publish/retract a specific Work or Works in a specific Collection|
|group of viewers||YES||group of users who can view a specific Work or Works in a specific Collection|
roles apply to
|+2||site admin applies to entire site||YES||YES||YES||YES||YES||YES|
|collection admin applies to a specific Collection||YES|
|+2||all other roles can apply to a specific Collection||NO||YES||Cincinnati (Curate): We have a request to be able to apply roles to a collection, allowing rights to cascade to works within the collection. But our implementation allows an individual work to be a member of multiple collections; this may be the territory of admin sets, where a work should only be a member of one admin set.|
|all other roles can apply to a specific Work||NO|
define Descriptive Metadata fields
common for all Work types
The core metadata that is collected for all Work types (e.g. book, journal article, image, etc.). Typically this includes title, creator, contributor, abstract, keyword, rights, publisher, date created, etc.
Work types: Book, Journal Article, Image, etc.
sufia: metadata fields are hardcoded
specific to a Work type
Metadata fields that are applicable only for a specific Work type.
Book - number of pages, number of chapters, etc.
Video - length (hh:mm:ss), animation (true|false), etc.
Work types: Book, Journal Article, Image, etc.
sufia: this will likely flow in via merger with Worthwhile (and Curate?)
specific to a Collection
Metadata fields that are applicable for a specific Collection. This includes fields unique to a particular collection and can be almost anything.
|NO||NO||NO||NO||NO||NO||Avalon - minimal collection level info.|
sections of metadata grouped intelligently
This functionality would allow metadata to be presented to the user with related metadata fields close to each other. For example, group all fields related to the creator, another group related to publication, another for Work type specific metadata, etc. The purpose of this is to aid the user in specifying values for metadata fields.
render sections of metadata only if needed (ex. book section only if doc type is book)
Hull has a generic metadata template as a default but has customised, grouped and tabbed pages for some content types. More are being developed as time allows.
file interrogation for Technical Metadata fields
File interrogation is used to automatically fill in technical metadata describing characteristics of the file.
This was originally listed as format which may have different interpretations. Mimetype is a well established concept with one interpretation.
sufia: uses FITS for file interrogation; not sure which of the following metadata FITS will extract
Curate: uses FITS for file interrogation
Avalon: Uses Media Info to extract (only format is stored)
|+1||file create date||NO|
|+1||file last modified date||YES||NO|
file interrogation of embedded Descriptive Metadata
All metadata in this section is extracted from embedded metadata.
|YES||NO||NO||NO||Sufia: FITS will do this, but embedded metadata can be awful, so we disable it in ScholarSphere|
|+2||title||YES||NO||NO||NO||Sufia: FITS will do this, but embedded metadata can be awful, so we disable it in ScholarSphere|
|+3||publication date||NO||NO||NO||sufia: FITS can extract last modified date of file, but that's the closest it provides|
|+3||language||YES||NO||NO||NO||Sufia: FITS will do this, but embedded metadata can be awful, so we disable it in ScholarSphere|
advanced file interrogation for Descriptive Metadata
full text extraction
|YES||YES||YES||NO||YES||can we do auto-tagging based on extracted full text|
OregonDigital: Text extraction doesn't do OCR (yet) - pulls from PDF and used for blacklight/viewer search.
Use a tool (e.g. Kea) combined with a controlled vocabulary to perform auto-tagging of keywords. Prefer options for keyword ranking, such as, frequency with boosting for appearance of the keyword in title, abstract, headings, etc.
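The ranking preference described above (frequency, boosted when a term appears in the title, abstract, or headings) can be sketched as follows. This is a simplified Python illustration and not how Kea works internally; the `FIELD_WEIGHTS` values, the field names, and the `rank_keywords` function are all assumptions, and only single-word vocabulary terms are handled.

```python
import re
from collections import Counter

# Hypothetical boost weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 5.0, "abstract": 3.0, "headings": 2.0, "body": 1.0}


def rank_keywords(fields, vocabulary, top_n=5):
    """Score single-word vocabulary terms by weighted frequency across fields.

    fields: dict mapping a field name (see FIELD_WEIGHTS) to its text.
    vocabulary: iterable of allowed single-word terms (the controlled vocabulary).
    Returns up to top_n (term, score) pairs, highest score first.
    """
    allowed = {term.lower() for term in vocabulary}
    scores = Counter()
    for name, text in fields.items():
        weight = FIELD_WEIGHTS.get(name, 1.0)  # unknown fields get no boost
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word in allowed:
                scores[word] += weight
    return scores.most_common(top_n)
```

Restricting candidates to the controlled vocabulary keeps the suggested tags consistent across Works; the weights would need tuning against real documents.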
|+2||multiple controlled vocabularies|
|YES||NO||NO||NO||sufia: will use QA for controlled vocabularies but currently has pre-QA code|
|+2||forms use auto-suggest from controlled vocabularies||YES||YES||NO||NO||NO|
|+2||multiple values from a controlled vocabulary in the same field||YES||YES||NO||NO||NO|
|+1||hierarchical controlled vocabularies||NO||LIMITED||NO||NO||NO||OregonDigital allows for these (e.g. GeoNames Features hierarchy), but hierarchies are handled on a case-by-case basis; no generalized SKOS support.|
|+1||internally defined controlled vocabularies||NO||LIMITED (no editor)||NO||LIMITED||NO||system for defining controlled vocabularies hierarchies and values|
|+2||use external controlled vocabularies||NO||YES||NO||NO||NO||see Example External Controlled Vocabularies|
|+2||ability of user to write-in a value if not in the controlled vocabulary||YES||LIMITED||N/A||N/A||N/A|
sufia: we like to say that Sufia provides "authority suggestion" rather than authority control
Curate: Not applicable as we have no controlled vocabularies.
track versions of metadata change
Provide the ability to store versions of metadata and the UI to allow viewing or restoring of previous versions.
all fields tracked as a set with each save of metadata
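Tracking all fields as a set means each save stores a full snapshot of the metadata rather than per-field changes. A minimal Python sketch of that idea follows; the `MetadataHistory` class and its methods are hypothetical and do not reflect how Fedora or any of the listed projects store versions.

```python
import copy


class MetadataHistory:
    """Hypothetical store that snapshots all metadata fields on every save."""

    def __init__(self):
        self._versions = []  # oldest first; each entry is a full copy of the fields

    def save(self, fields):
        """Record a new version and return its 1-based version number."""
        self._versions.append(copy.deepcopy(fields))
        return len(self._versions)

    def view(self, number):
        """Return a copy of the fields as they were at the given version."""
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no such version: {number}")
        return copy.deepcopy(self._versions[number - 1])

    def restore(self, number):
        """Restore an earlier version by saving it again as the newest version."""
        return self.save(self.view(number))
```

Restoring by appending a new version, rather than discarding later ones, keeps the full history available for viewing.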
single part Works
multiple part Works
worthwhile - has this; coming soon to Sufia via Hydra Works
Hull: an object can have multiple files, but not sub-objects
link to related
Provide a general ability to link a Work to another Work. In this case, the relationship between the two works is not identified.
Curate: you can relate a work to other works in the repository; you can also add an external link.
Hull: via a metadata entry
link by metadata field
A metadata field can be identified as a relationship to a Work and can be used to select a Work that fills that relationship. (Ex. Front Cover Image)
link by relationship
Use a controlled vocabulary to extend the general link to related functionality to include use of a controlled vocabulary to identify the type of relationship (Ex. errata)
Identification of high level work types, as opposed to file format types, allows for metadata fields to be displayed specific to that type. For example, both a book and a journal article may have file format type
|various high level text types (e.g. book, page, article, thesis, etc.)||YES||Each of these controls what metadata is displayed for metadata 'specific to a Work type'|
|various high level video types (e.g. movie, presentation, webcast, video clip, etc.)||LIMITED||Hull: we identify these high-level differences but have not yet differentiated the displays|
|various high level audio types (e.g. music, podcast, audio clip, etc.)||LIMITED||Hull: as above|
|various high level image types (e.g. diagram, illustration, photograph, etc.)||LIMITED||Hull: as above|
|+2||text (e.g., txt, docx, pdf, etc.)||YES||YES||YES||YES||NO||YES|
Curate: we have 'article' and 'document' as separate work types, both intended for text, but with different metadata.
Hull: we support all these but encourage PDF wherever possible.
|+1||video (e.g., mp#, qt, avi, etc.)||YES||YES||YES||YES||YES||YES|
sufia: no streaming
Curate: no streaming or custom players
|+2||audio (e.g., mp3, aif, wav, etc.)||YES||YES||YES||YES||YES||YES|
sufia: no streaming
Curate: no streaming or custom players
|+1||images (e.g., jpg, png, gif, etc.)||YES||YES||YES||YES||NO||YES||Curate: no IIIF|
Sufia: so much depends on what is assumed here. as with OregonDigital, any files may be uploaded, described, displayed, and downloaded
OregonDigital: They could go in as a file, but no explicit support.
Theses and dissertations are a special case of text file types. If Work types are implemented, then these will not need to be treated as a separate file type to have them use different metadata.
|YES||YES||NO||YES||Curate: Theses supported in Curate; temporarily turned off at Cincinnati; theses have different metadata than article or document.|
|+2||download||YES||YES||YES||YES||NO||YES||Sufia: whether the download or browser view action is triggered is determined by the interaction between the MIME type stored and sent by the server and what your browser supports.|
|+2||view in browser||YES||YES||YES||YES|
Curate: where format is understood by browser
|+2||streaming audio||NO||LIMITED||NO||YES||NO||Handoff to external streaming server (tied to IP address)|
|+2||streaming video||NO||LIMITED||NO||YES||NO||Handoff to external streaming server (tied to IP address)|
within system repository (default)
Sufia stores uploaded files in the Fedora Repository installed as part of the Sufia-Hydra stack.
|YES||YES||YES||YES||NO||YES||UCSD DAMS: DAMS Repository with Fedora 3 REST-API emulation|
This supports the ability to store large datasets and other large files outside of the Fedora Repository. Perhaps Fedora's support of external datastreams could be leveraged for this functionality.
Sufia: we're moving to Fedora 4 quickly, partly in order to accommodate large files within the stack. There's no notion of external datastreams in Sufia.
UCSD DAMS: DAMS Repository supports multiple storage backends, including OpenStack
Curate: user can insert url for external file reference; no integration in GUI with Fedora external datastreams
track changes to an attached file
Provide the ability to store versions of an uploaded file and the UI to allow viewing or restoring of previous versions.
each file tracked separately with each file replacement vs. all files of a Work tracked as a set with each file replacement
Curate: currently versioning files but not metadata. Fedora may give us metadata versioning, but this is not exposed to depositors via Hydra.
Application of Policies
|settable on: Collection, Work, File|
policies inherit downward
most restrictive policy prevails
If a child overrides the inherited policies AND the child's policy is more restrictive, then the child's policy prevails.
If a Collection is marked public and a Work has no VISIBILITY FLAG property, it inherits the Collection's public mark.
Fail to Override Example:
If a Collection is marked private and a Work is marked public, it inherits the more restrictive Collection's private mark.
Override is Applied Example:
If a Collection is marked public and a Work is marked private, the more restrictive Work's private mark is used for that Work. The Collection remains public.
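The inheritance rules and the examples above can be expressed as a short resolution function. This is a Python sketch under stated assumptions: only the two visibility values used in the examples are modeled, and the names `RESTRICTIVENESS` and `effective_visibility` are hypothetical.

```python
# Hypothetical ordering: a higher number means a more restrictive policy.
RESTRICTIVENESS = {"public": 0, "private": 1}


def effective_visibility(chain):
    """Resolve visibility for the last item in a Collection > Work > File chain.

    chain: visibility settings from the top level down; None means the level
    sets no visibility of its own and simply inherits.
    Returns the most restrictive setting found, or None if nothing is set.
    """
    result = None
    for setting in chain:
        if setting is None:
            continue  # nothing set at this level: inherit from above
        if result is None or RESTRICTIVENESS[setting] > RESTRICTIVENESS[result]:
            result = setting  # override applies only when more restrictive
    return result
```

For instance, `effective_visibility(["private", "public"])` resolves to `"private"`, matching the fail-to-override example, while the Collection's own setting is never changed by the function.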
require login and role of viewer for the Collection/Work/File to be discoverable through search and viewed
Curate: private to me, unless I have assigned delegates (proxies) or editors, in which case my delegates and editors can discover my private works.
Hull: all items under development to a depositor are private to them.
|MODIFIER: private modifiers|
all users, who know the URL to the Collection/Work/File, can view it
Avalon: Can keep undiscoverable/hidden but available via URL to users with the right to view it.
Collection/Work/File metadata and full text is discoverable via public search, but the user cannot view the Collection/Work/File
Hull theoretically has this ability, but we have not yet had a use case.
|+1||NO||NO||NO||NO||NO||NO||can view/print one page at a time|
|+1||NO||NO||NO||YES||YES||OregonDigital: We have "reviewed" and "unreviewed" - when ingested it's unreviewed, when reviewed it's public.|
|MODIFIER: published modifiers|
Sufia: anticipate this coming in via merger from Worthwhile and/or Curate
worthwhile - does this; coming soon (1st & 2nd qtr next year) to Sufia
Hull: currently no, but coming.
|+2||NO||YES||NO||NO||NO||worthwhile - does this; perhaps coming soon (1st & 2nd qtr next year) to Sufia|
|MODIFIER: unpublished modifiers|
UCSD DAMS: collection-level suppression
Hull: this is our "hidden" status - used in the event of copyright challenges etc.
user deposits for self only
|YES||NO||NO||YES||LIMITED||NO||Avalon: it depends on how you define this...Avalon has no sense of self, you can grant any user the right to deposit.|
user deposit as a proxy, as though they are another user
Sufia: added in 4.1.0
Curate: term 'delegate' is used.
|MODIFIER: on self/proxy-deposit|
Avalon: You could set up permissions so that certain roles can deposit but not publish, but this is system-wide for all collections.
workflows (see Example Workflows)
|+1||admin user defined workflows||NO||NO||NO||NO||NO||NO|
|2 level hierarchy||YES||YES||YES||NO||YES|
sufia: uses this model (kind of, but I'm not sure I understand this row – files can be organized into collections and discovered that way, but files need not belong to collections. files are first-class objects just like collections are)
Avalon: Units and Collections
|NO||highest level is the application|
user defines 1 or more collections at any time (user must be logged in; users own collections they create)
Curate: a collection is a list and not an admin set (although at Cincinnati we need admin sets). A collection is itself collectible, and thus can be a member of a 'parent' collection, but we are not yet presenting that relationship; when collections are listed, all collections appear in a flat list.
|+2||Flexible nested hierarchy||NO||YES||YES||NO||NO||YES|
Sufia: Coming Soon with Hydra Works. Each nested level is a Collection with a sub-Collection.
OregonDigital: We support this in the backend (if it's RDF we support it), but there's no UI for nesting.
Hull uses this primarily for internal structural organisation; we can also have nested display collections but it's rather crude as yet.
|highest level of system hierarchy (e.g. Cornell)|
|second level of system hierarchy (e.g. CUL-Mann)|
|third level of system hierarchy (e.g. Teeale); fourth level of system hierarchy (e.g. Gates Documents)|
|+2||site wide single brand||YES||YES||YES||YES||YES||YES||main landing page has Cornell brand|
|+1||multiple branding||NO||YES||NO||NO||NO||NO||different brand allowed at each system level with lower levels inheriting brand from parent level if unspecified; not really sure how branding is implemented, so inheritance may be a manual process|
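If brand inheritance were automated rather than manual, the lookup described in the multiple branding row could work roughly like this. This is a hypothetical Python sketch; the `resolve_brand` function and the shape of the `parents` and `brands` mappings are assumptions, and the level names reuse the examples from the hierarchy rows above.

```python
def resolve_brand(level, parents, brands):
    """Return the brand for a level, walking up to its parents when unset.

    level: name of the level to resolve (e.g. "Teeale").
    parents: dict mapping each level to its parent level (top level maps to None).
    brands: dict mapping a level to its brand, only for levels that define one.
    """
    seen = set()
    current = level
    while current is not None and current not in seen:
        if current in brands:
            return brands[current]
        seen.add(current)  # guard against accidental cycles in the hierarchy
        current = parents.get(current)
    return None
```

With only the top level branded, every lower level resolves to the site-wide brand, which matches the single-brand row.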
Reporting / Notification
|+2||dashboard||LIMITED||NO||NO||NO||See Example Reports & Notifications|
Sufia: this is a documented need in ScholarSphere, so it could come to Sufia in the coming months
See Example Reports & Notifications
Avalon: sends an email regarding status of batches importing and imported.
sufia: export in 3 formats - endnote, zotero, mendeley (and we have a rake task to spit it all out as RDF/XML)
OregonDigital: can download metadata as ntriples from object URI
|+2||export files and metadata to external datastore||NO||NO||NO||NO||NO||OregonDigital: can export as BagIt bags|
1 Sufia - This column indicates whether Sufia implements this feature out of the box.
NOTE: See Potential Definitions of Workflows for full steps involved in each workflow.
|Desired||Workflow||Sufia||Oregon Digital||Hydra at Hull||Project 2 Name||Comments|
|create a work||YES|
Creates a Work instance with no Files
Hull: this is the first stage in creating an object. A file or files is then uploaded and associated with the object.
|upload a single file into an existing Work||Upload file; Create File instance; Associate File with existing Work|
|+1||upload a single file (no Work specified)||YES||Upload file; Create File instance and Work instance; Associate File with Work|
|+2||replace a single file||YES||Upload file; Update File instance|
|+1||batch add multiple files into an existing Work||YES||Upload files; Create File instance for each; Associate each File with existing Work|
|+1||batch add multiple files to multiple Works (no Work specified)||YES||Upload files; Create File instance for each; Create Work instance for each; Associate each File with one new Work|
|+1||batch add multiple files to one Work (no Work specified)||YES||Upload files; Create File instance for each; Create one Work instance; Associate each File with the one new Work|
|+1||add/edit metadata for a single Work||YES||YES|
|+1||batch add/edit metadata for multiple Works||LIMITED||NO|
descriptive metadata can be set in batch
Hull: no, but looking to add this very soon (weeks not months)
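The two batch upload workflows listed in the table above differ only in how Files are associated with new Works. The following Python sketch illustrates the steps; the `File` and `Work` classes and both functions are hypothetical stand-ins, not the Hydra Works models, and using the file name as the Work title is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    name: str


@dataclass
class Work:
    title: str
    files: list = field(default_factory=list)


def batch_add_to_one_work(uploaded_names, title):
    """Create one Work and attach a File instance for each uploaded file."""
    work = Work(title=title)  # create one Work instance
    for name in uploaded_names:
        work.files.append(File(name=name))  # create File, associate with the Work
    return work


def batch_add_to_multiple_works(uploaded_names):
    """Create one new Work per uploaded file, each holding that single File."""
    return [Work(title=name, files=[File(name=name)]) for name in uploaded_names]
```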
Changing Access Controls
|+1||publish a single Work||LIMITED||YES||sufia: any document marked public can be considered published|
|+1||retract a single Work||LIMITED||YES||sufia: for a public document, mark it private|
|+1||delete a single Work||YES||LIMITED||Hull: if a cataloger deletes an object it actually goes into a hidden (admin only) state. Only an admin can truly delete an object.|
|+1||batch publish multiple Works|
|+1||batch retract multiple Works|
|+1||batch delete multiple Works|
|+1||suppress a single Work||LIMITED||YES||sufia: make a public item private|
|+1||batch suppress multiple Works|
|+2||embargo a single Work||in worthwhile|
|+1||batch embargo multiple Works|
|+1||lease (temporary publish) a single Work|
|+1||batch lease multiple Works|
|+1||view metadata in search results||YES||YES|
|+1||view Work (file and metadata)||YES||YES|
|+1||export metadata||YES||NO||in 3 or 4 formats: EndNote, Zotero, Mendeley|
|+1||export files & metadata to external datastore||NO||NO|
sufia: all university logins are available
Hull: all university logins are available for viewing restricted content; only admins can add depositors at present.
|+2||assign user site-wide roles||LIMITED||sufia: admin only through code by being a member of a group|
|+1||assign users roles in a Collection|
|assign users roles for a Work|
|+1||add/edit collections / sub-collections||NO||YES|
|+1||add/edit metadata fields for a Work type||LIMITED|
e.g. set of fields for a Book, Journal Article, or Theses, etc.
sufia: one set of metadata fields configured in code
|add/edit metadata fields for a file type||e.g. set of fields for a png, mp3, or doc, etc.|
Example External Controlled Vocabularies
|Desired||Content||Sufia||Oregon Digital||Project 2 Name||Comments|
External Controlled Vocabularies
Example Reports & Notifications
|Desired||Content||Sufia||Oregon Digital||Project 2 Name||Comments|
|+2||Site wide counts|
|+1||Count of all sub-entities starting at a given level|
|+2||Count for a specific collection|
|+2||Count of Works in a particular state by site/entity/collection|
|+1||Work pending your approval||NO|
|+1||Summary of all Works pending an action by you||NO|
|+1||Summary of all Works with actions pending||NO|
Potential Definitions of Workflows
add a Work
batch add multiple Works
edit a Work
retract a Work – does this make it private (unpublished) again or is it more complex?