...
Use cases and requirements
Query service
Use cases
- if a query service is integrated into fedora, docuteam would like to use this service for the following queries:
- total number of objects in a namespace (this used to be the PID namespace in Fedora 3; an equivalent for Fedora 6 has to be defined; possible solutions: a) use top-level objects to group objects by namespace, b) use an RDF triple to assign an object to a PID namespace, c) use the Name Assigning Authority Number (NAAN) of ARKs to group objects)
- get all available file formats in a Fedora instance (we use PRONOM identifiers (PUIDs) to store this information) and the number of objects with a specific file format
- get the total space used
- get the available free space of the persistent storage (optional, as this could be solved by monitoring tools)
- get the total space used by namespace (see above on what a namespace could be)
- get all objects with a specific triple (e.g. PRONOM identifier, some alternative identifier)
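Option c) for namespaces (grouping by the NAAN of ARKs) can be illustrated with a minimal Python sketch. It assumes that every object carries an ARK somewhere in its identifier string; the function names and the sample identifiers are made up for illustration and are not part of Fedora.

```python
import re
from collections import Counter
from typing import List, Optional

# ARK syntax: "ark:", an optional "/", the NAAN, then "/" and the assigned name.
ARK_PATTERN = re.compile(r"ark:/?(?P<naan>[^/]+)/(?P<name>.+)")


def naan_of(identifier: str) -> Optional[str]:
    """Return the NAAN of the ARK found in the string, or None if there is none."""
    match = ARK_PATTERN.search(identifier)
    return match.group("naan") if match else None


def count_by_naan(identifiers: List[str]) -> Counter:
    """Count identifiers per NAAN, skipping strings that contain no ARK."""
    return Counter(n for n in map(naan_of, identifiers) if n is not None)


# Made-up sample identifiers; the NAANs are placeholders.
sample = [
    "ark:/12345/a1",
    "ark:/12345/a2",
    "https://example.org/ark:/99999/b1",
    "not-an-ark",
]
counts = count_by_naan(sample)
```

In a real deployment the counting would happen in the query service itself; the sketch only shows that the NAAN can serve as a grouping key.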
Requirements
...
- use existing standards for the query language (SPARQL comes to mind)
- should support aggregation functions (e.g. sum(), count())
- should reuse existing indexes like Solr or a triple store if present?
→ do not build another index if these tools are present anyway
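As a sketch of how the file format use cases could look with SPARQL and its aggregation functions, the following Python snippet builds two query strings. The predicate URI is a hypothetical placeholder, since the data model is not defined yet, and the helper names are ours. Sending the queries to an endpoint is left out on purpose.

```python
import re
from string import Template

# Hypothetical predicate linking an object to its PUID; the real one depends
# on the ontology that is eventually chosen.
PUID_PREDICATE = "http://example.org/terms#puid"

COUNT_BY_FORMAT = Template("""\
SELECT ?puid (COUNT(DISTINCT ?object) AS ?objects)
WHERE {
  ?object <$predicate> ?puid .
}
GROUP BY ?puid
ORDER BY DESC(?objects)
""")

OBJECTS_WITH_FORMAT = Template("""\
SELECT ?object
WHERE {
  ?object <$predicate> "$puid" .
}
""")


def count_by_format_query(predicate: str = PUID_PREDICATE) -> str:
    """Build a query listing every PUID with the number of objects using it."""
    return COUNT_BY_FORMAT.substitute(predicate=predicate)


def objects_with_format_query(puid: str, predicate: str = PUID_PREDICATE) -> str:
    """Build a query listing all objects with the given PUID."""
    # Accept only PUID-shaped values (e.g. "fmt/95", "x-fmt/111") so that
    # nothing unexpected ends up inside the query string.
    if not re.fullmatch(r"(x-)?fmt/\d+", puid):
        raise ValueError(f"not a PUID: {puid!r}")
    return OBJECTS_WITH_FORMAT.substitute(predicate=predicate, puid=puid)
```

Both queries use only standard SPARQL 1.1 features, so they should run unchanged against an existing triple store if one is present.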
Questions/Comments
- if integrated into fedora: make optional
- possibility to query persistence layer for used storage and free storage
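For a file-system based persistence layer, used and free storage could be determined roughly as in the Python sketch below. It assumes the repository data lives under a single local directory; the function name and the keys of the returned dictionary are illustrative. It does not apply to object storage, where the storage provider would have to be asked instead.

```python
import os
import shutil


def storage_report(root: str) -> dict:
    """Report volume-level and repository-level storage figures in bytes."""
    # Figures for the whole volume that contains `root`.
    usage = shutil.disk_usage(root)

    # Sum of the file sizes below `root`, i.e. the space used by the repository.
    repository_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count link targets
            try:
                repository_bytes += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it

    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "repository_bytes": repository_bytes,
    }
```

Walking the whole tree is slow for large repositories, which is one reason why leaving this to monitoring tools or to an index may be preferable.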
Questions
- will query service be part of the fedora API spec?
- does it make sense to hardcode the supported queries?
- use a real SPARQL endpoint in the background and only deliver a UI/API?
Persistence
Use cases
- For our cloud infrastructure, docuteam would like to use S3-compatible object storage (provided by Ceph, https://ceph.io/)
- Some docuteam clients might want to stick with a simpler storage model than OCFL to reduce storage requirements
Requirements
- OCFL backend: native support for S3-compatible object storage
- alternative “simple” persistence implementation (optional?)
Migration
Use cases
- With the switch to Fedora 6, docuteam wants to move to an ontology-based data model.
- Docuteam would like to leverage the functionality of a generic migration utility.
- Docuteam would like to migrate Fedora 3 XML-Datastreams into triples.
Requirements
- possibility to select/configure which Fedora 3 datastreams should be converted to Fedora 6 binaries
- possibility to extend migration utilities with custom parsers/functionality to create triples
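The two requirements above could be met with a small parser registry, sketched here in Python. This is not the interface of any existing migration utility; the registry, the decorator and the rule that datastreams without a parser stay binaries are assumptions made for illustration. The sample uses the Dublin Core datastream layout known from Fedora 3.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Parser = Callable[[str, str], List[Triple]]

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical registry: datastream ID -> custom parser that creates triples.
PARSERS: Dict[str, Parser] = {}


def register(datastream_id: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one datastream ID."""
    def decorator(func: Parser) -> Parser:
        PARSERS[datastream_id] = func
        return func
    return decorator


@register("DC")
def parse_dublin_core(subject: str, xml_text: str) -> List[Triple]:
    """Turn each non-empty dc:* child element into one triple."""
    triples = []
    for element in ET.fromstring(xml_text):
        text = (element.text or "").strip()
        if element.tag.startswith("{" + DC_NS + "}") and text:
            local_name = element.tag.split("}", 1)[1]
            triples.append((subject, DC_NS + local_name, text))
    return triples


def convert(subject: str, datastreams: Dict[str, str]) -> Tuple[List[Triple], List[str]]:
    """Return the created triples and the IDs of datastreams kept as binaries."""
    triples: List[Triple] = []
    binaries: List[str] = []
    for ds_id, xml_text in datastreams.items():
        parser = PARSERS.get(ds_id)
        if parser is None:
            binaries.append(ds_id)  # no parser configured: migrate as binary
        else:
            triples.extend(parser(subject, xml_text))
    return triples, binaries


SAMPLE_DC = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample record</dc:title>
  <dc:identifier>demo:1</dc:identifier>
</oai_dc:dc>
"""
```

Calling convert("info:fedora/demo:1", {"DC": SAMPLE_DC, "PREMIS": "<premis/>"}) would yield two triples from the DC datastream and leave PREMIS in the list of binaries, because no parser is registered for it.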
Documentation
Requirements
- Step by step installation manual for production use
- Recommendations for storage systems (e.g. WORM, Object Storage, generic NAS)
...