...
Use cases and requirements
General requirements
- we must be able to run Fedora 6 on Windows Servers
Query service
Use cases
- if a query service is integrated into fedora, docuteam would like to use this service for the following queries:
- total number of objects in a namespace (this used to be the PID namespace in Fedora 3; an equivalent for Fedora 6 has to be defined; possible solutions: a) use top-level objects to group objects by namespace, b) use an RDF triple to assign an object to a PID namespace, c) use the Name Assigning Authority Number (NAAN) of ARKs to group objects)
- get all available file formats in a Fedora instance (we use PRONOM identifiers (PUIDs) to store this information) and the number of objects with a specific file format
- get the total space used
- get available free space for the persistent storage (optional as this could be solved by monitoring tools)
- get the total space used by namespace (see above on what a namespace could be)
- get all objects with a specific Triple (e.g. PRONOM identifier, some alternative identifier)
...
- use existing standards for the query language (SPARQL and LDPath (https://marmotta.apache.org/ldpath/) come to mind)
- should support aggregation functions (e.g. sum(), count())
- should reuse existing indexes like Solr or triple store if present?
→ do not build another index, if these tools are present anyway
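The queries listed above mostly come down to filtering and aggregating over triples. The following is a minimal sketch of that idea in plain Python, not a proposal for how the query service should be implemented. The predicates `ex:hasFormat` and `ex:hasSize`, the sample ARKs, and all function names are assumptions made for illustration; the namespace is derived from the NAAN of the ARK (option c above). In SPARQL 1.1 the same results would roughly correspond to COUNT and SUM combined with GROUP BY.

```python
from collections import Counter, defaultdict

# Hypothetical predicates and sample data, for illustration only.
HAS_FORMAT = "ex:hasFormat"  # object -> PRONOM PUID
HAS_SIZE = "ex:hasSize"      # object -> size in bytes

TRIPLES = [
    ("ark:/12345/a1", HAS_FORMAT, "fmt/43"),
    ("ark:/12345/a1", HAS_SIZE, 2048),
    ("ark:/12345/a2", HAS_FORMAT, "fmt/18"),
    ("ark:/12345/a2", HAS_SIZE, 1024),
    ("ark:/67890/b1", HAS_FORMAT, "fmt/43"),
    ("ark:/67890/b1", HAS_SIZE, 512),
]


def namespace_of(identifier):
    """Return the NAAN of an ARK such as 'ark:/12345/a1'."""
    return identifier.split("/")[1]


def objects_with(triples, predicate, value):
    """All subjects that carry the given predicate/value pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == value)


def count_by_format(triples):
    """Number of objects per PUID."""
    return Counter(o for _, p, o in triples if p == HAS_FORMAT)


def objects_per_namespace(triples):
    """Number of distinct objects per namespace."""
    subjects = {s for s, _, _ in triples}
    return Counter(namespace_of(s) for s in subjects)


def bytes_per_namespace(triples):
    """Sum of object sizes per namespace."""
    totals = defaultdict(int)
    for s, p, o in triples:
        if p == HAS_SIZE:
            totals[namespace_of(s)] += o
    return dict(totals)
```

The total space used is then simply `sum(bytes_per_namespace(TRIPLES).values())`. If a triple store or Solr index is already present, the same aggregations could be delegated to it instead of being computed over a separate index.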
Questions/Comments
Currently it is rather unclear to us how this query service would be integrated into Fedora and which use cases it would support. We are skeptical that a query service would support all our use cases, and we are concerned that it would add another index for information that is already present in a triple store and/or Solr index. We therefore think such a query service should either be an optional component that can be installed alongside Fedora 6 (similar to a fixity service) or be able to use existing technologies like a triple store or a Solr/Lucene index as a backend.
However, we do see a use case for extending the existing simple web UI of Fedora 5 with a way to search for objects (similar to the search interface that was available in Fedora 3 at the endpoint /fedora/objects). As mentioned above, such a UI could leverage existing indexes.
- make optional?
- if integrated into fedora: make optional
- possibility to query persistence layer for used storage and free storage
- will query service be part of the fedora API spec?
- does it make sense to hardcode the supported queries?
- only provide a search UI but actually use a triple store?
- use a real SPARQL endpoint in the background and only deliver a UI/API?
Answers/Remarks by Andrew Woods
- Regarding "Query service", we have gotten consistent feedback that an integrated, synchronous search index should come with Fedora. The use case is for clients to create/update a Fedora resource, then immediately be able to query Fedora with the expectation that the resource be in the index. The externalized, asynchronous indices do not satisfy this use case.
- We will certainly use the queries you have enumerated in the testing of the new query service.
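The difference between a synchronous and an asynchronous index described in the remarks above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Fedora's implementation: the classes `SyncIndexedRepo` and `AsyncIndexedRepo`, their methods, and the title-based lookup are all hypothetical.

```python
from collections import deque


class SyncIndexedRepo:
    """Hypothetical repository that updates its index inside the write call."""

    def __init__(self):
        self.resources = {}
        self.index = {}

    def put(self, path, title):
        self.resources[path] = title
        self.index[title] = path  # indexed before put() returns

    def search(self, title):
        return self.index.get(title)


class AsyncIndexedRepo:
    """Hypothetical repository that queues index updates for a later consumer."""

    def __init__(self):
        self.resources = {}
        self.index = {}
        self.pending = deque()

    def put(self, path, title):
        self.resources[path] = title
        self.pending.append((title, path))  # index is not updated yet

    def process_events(self):
        # Stands in for an external indexer consuming repository events.
        while self.pending:
            title, path = self.pending.popleft()
            self.index[title] = path

    def search(self, title):
        return self.index.get(title)
```

With the synchronous variant, a `search()` issued directly after `put()` finds the new resource. With the asynchronous variant, the same search returns `None` until `process_events()` has run, which is the gap the create-then-query use case cannot tolerate.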
Persistence
Use cases
- For our cloud infrastructure, docuteam would like to use S3-compatible object storage (provided by Ceph, https://ceph.io/)
- Some docuteam clients might want to stick with a simpler storage model than OCFL to reduce storage requirements
...
- Thomas Bernhart to deliver sample data to Andrew Woods by August 16th.
- Thomas Bernhart to deliver use cases and requirements to Andrew Woods by September 6th.
...