Comment: Updated requirements for Query Service

...

Use cases and requirements

General requirements

  • we must be able to run Fedora 6 on Windows Servers

Query service

Use cases

  • if a query service is integrated into Fedora, docuteam would like to use this service for the following queries:
    • total number of objects in a namespace (this used to be the PID namespace in Fedora 3; the equivalent of a PID namespace in Fedora 6 has yet to be defined; possible solutions: a) use top-level objects to group objects by namespace, b) use an RDF triple to assign an object to a PID namespace, c) use the Name Assigning Authority Number (NAAN) of ARKs to group objects)
    • get all available file formats in a Fedora instance (we use PRONOM identifiers (PUIDs) to store this information) and the number of objects with a specific file format
    • get the total space used
    • get the available free space for the persistent storage (optional, as this could be solved by monitoring tools)
    • get the total space used by namespace (see above on what a namespace could be)
    • get all objects with a specific triple (e.g. PRONOM identifier, some alternative identifier)
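
To make the intended results of these queries concrete, here is a minimal Python sketch that evaluates most of them over an in-memory list of triples. It is not a proposal for the query service itself: the predicate names (`ex:namespace`, `ex:puid`, `ex:sizeInBytes`), the object identifiers and the sample values are all hypothetical, and it assumes each object carries exactly one triple per predicate. The free-space query is left out because it depends on the storage backend rather than on the triples.

```python
from collections import Counter, defaultdict

# Hypothetical predicate names; the real vocabulary is not defined here.
HAS_NAMESPACE = "ex:namespace"
HAS_PUID = "ex:puid"
HAS_SIZE = "ex:sizeInBytes"

# Made-up sample triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("obj1", HAS_NAMESPACE, "ns-a"),
    ("obj1", HAS_PUID, "fmt/18"),
    ("obj1", HAS_SIZE, 1000),
    ("obj2", HAS_NAMESPACE, "ns-a"),
    ("obj2", HAS_PUID, "fmt/43"),
    ("obj2", HAS_SIZE, 2500),
    ("obj3", HAS_NAMESPACE, "ns-b"),
    ("obj3", HAS_PUID, "fmt/18"),
    ("obj3", HAS_SIZE, 400),
]


def subjects_with(triples, predicate, value):
    """All objects that carry the given triple (predicate, value)."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


def count_by(triples, predicate):
    """Number of objects per value of a predicate (e.g. per namespace or PUID)."""
    return Counter(o for s, p, o in triples if p == predicate)


def total_size(triples):
    """Total space used, summed over all size triples."""
    return sum(o for s, p, o in triples if p == HAS_SIZE)


def size_by_namespace(triples):
    """Total space used per namespace."""
    namespace_of = {s: o for s, p, o in triples if p == HAS_NAMESPACE}
    totals = defaultdict(int)
    for s, p, o in triples:
        if p == HAS_SIZE and s in namespace_of:
            totals[namespace_of[s]] += o
    return dict(totals)
```

For example, `count_by(TRIPLES, HAS_PUID)` gives the available file formats together with the number of objects per format, and `size_by_namespace(TRIPLES)` gives the space used per namespace, whichever of the namespace options a) to c) ends up providing the grouping.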

...

  • use existing standards for the query language (SPARQL and LDPath (https://marmotta.apache.org/ldpath/) come to mind)
  • should support aggregation functions (e.g. sum(), count())
  • should reuse existing indexes like Solr or a triple store, if present?
    → do not build another index if these tools are present anyway

...

  • make optional?
  • will the query service be part of the Fedora API spec?
  • does it make sense to hardcode the supported queries?
  • only provide a search UI but actually use a triple store?

Answers/Remarks by Andrew Woods

  • Regarding "Query service", we have gotten consistent feedback that an integrated, synchronous search index should come with Fedora. The use case is for clients to create/update a Fedora resource, then immediately be able to query Fedora with the expectation that the resource be in the index. The externalized, asynchronous indices do not satisfy this use case.
  • We will certainly use the queries you have enumerated in the testing of the new query service.
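
The difference between a synchronous and an asynchronous index described in the first remark can be illustrated with a small Python sketch. Both classes are toy models with hypothetical names (`SyncIndexedRepo`, `AsyncIndexedRepo`, `put`, `search`, `drain`); they are not Fedora's API and only show why a query issued immediately after a write behaves differently in the two designs.

```python
from collections import deque


class SyncIndexedRepo:
    """Toy repository that updates its index inside the write call."""

    def __init__(self):
        self.resources = {}
        self.index = {}

    def put(self, resource_id, title):
        self.resources[resource_id] = title
        self.index[resource_id] = title  # indexed before put() returns

    def search(self, title):
        return sorted(r for r, t in self.index.items() if t == title)


class AsyncIndexedRepo:
    """Toy repository whose index is fed later from a queue of pending updates."""

    def __init__(self):
        self.resources = {}
        self.index = {}
        self.pending = deque()

    def put(self, resource_id, title):
        self.resources[resource_id] = title
        self.pending.append((resource_id, title))  # not yet searchable

    def drain(self):
        """Stands in for an external indexer catching up with the queue."""
        while self.pending:
            resource_id, title = self.pending.popleft()
            self.index[resource_id] = title

    def search(self, title):
        return sorted(r for r, t in self.index.items() if t == title)


sync_repo = SyncIndexedRepo()
sync_repo.put("r1", "report")
found_immediately = sync_repo.search("report")

async_repo = AsyncIndexedRepo()
async_repo.put("r1", "report")
found_before_drain = async_repo.search("report")
async_repo.drain()
found_after_drain = async_repo.search("report")
```

In the synchronous model the resource is found right after `put()` returns. In the asynchronous model the same search comes back empty until the indexer has processed the queue, which is the gap the create/update-then-query use case cannot tolerate.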

Persistence

Use cases

  • For our cloud infrastructure, docuteam would like to use S3-compatible object storage (provided by Ceph, https://ceph.io/)
  • Some docuteam clients might want to stick with a simpler storage model than OCFL to reduce storage requirements

...