Comment: Updated requirements for Query Service


Use cases and requirements

General requirements

  • we must be able to run Fedora 6 on Windows Servers

Query service

Use cases

  • if a query service is integrated into Fedora, docuteam would like to use this service for the following queries:
    • total number of objects in a namespace (this used to be the PID namespace in Fedora 3; the equivalent of a PID namespace in Fedora 6 has to be defined; possible solutions: a) use top-level objects to group objects by namespace, b) use an RDF triple to assign an object to a PID namespace, c) use the Name Assigning Authority Number (NAAN) of ARKs to group objects)
    • get all available file formats in a Fedora instance (we use PRONOM identifiers, PUIDs, to store this information) and the number of objects with a specific file format
    • get the total space used
    • get the available free space for the persistent storage (optional, as this could be solved by monitoring tools)
    • get the total space used by namespace (see above on what a namespace could be)
    • get all objects with a specific triple (e.g. PRONOM identifier, some alternative identifier)


  • use existing standards for the query language (SPARQL and LDPath come to mind)
  • should support aggregation functions (e.g. sum(), count())
  • should reuse existing indexes like Solr or triple store if present?
    → do not build another index, if these tools are present anyway
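
As a rough illustration of how the queries listed above could be expressed with a standard query language and aggregation functions, the following sketch builds SPARQL 1.1 query strings in Python. It assumes solution b) from above (an RDF triple assigns each object to a namespace). The `ex:` vocabulary (`ex:pidNamespace`, `ex:puid`, `ex:sizeInBytes`) and all function names are hypothetical and are not defined by Fedora.

```python
# Hypothetical vocabulary: none of these predicates is defined by Fedora.
PREFIX = "PREFIX ex: <http://example.org/ns#>\n"


def literal(value: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def count_objects_in_namespace(namespace: str) -> str:
    """Total number of objects assigned to a namespace."""
    return PREFIX + (
        "SELECT (COUNT(DISTINCT ?obj) AS ?total)\n"
        f"WHERE {{ ?obj ex:pidNamespace {literal(namespace)} . }}"
    )


def objects_per_file_format() -> str:
    """All file formats (PUIDs) with the number of objects per format."""
    return PREFIX + (
        "SELECT ?puid (COUNT(DISTINCT ?obj) AS ?total)\n"
        "WHERE { ?obj ex:puid ?puid . }\n"
        "GROUP BY ?puid\n"
        "ORDER BY DESC(?total)"
    )


def space_used_by_namespace(namespace: str) -> str:
    """Sum of the recorded object sizes within one namespace."""
    return PREFIX + (
        "SELECT (SUM(?size) AS ?bytes)\n"
        f"WHERE {{ ?obj ex:pidNamespace {literal(namespace)} ;\n"
        "               ex:sizeInBytes ?size . }"
    )


def objects_with_triple(predicate_iri: str, value: str) -> str:
    """All objects that carry a specific predicate/value pair."""
    return f"SELECT ?obj\nWHERE {{ ?obj <{predicate_iri}> {literal(value)} . }}"


if __name__ == "__main__":
    print(count_objects_in_namespace("demo"))
    print(objects_per_file_format())
```

Whether such queries would be sent to an integrated query service or to an existing triple store is exactly the open question discussed below; the query text itself would look the same in both cases.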


It is currently rather unclear to us how this query service would be integrated into Fedora and which use cases it would support. We are skeptical that a query service would support all our use cases, and we are concerned that it would add another index for information that is already present in a triple store and/or Solr index. We therefore think such a query service should either be an optional component that can be installed alongside Fedora 6 (similar to a fixity service) or be able to use existing technologies like a triple store or a Solr/Lucene index as a backend.

However, we do see a use case for extending the existing simple web UI of Fedora 5 with a way to search for objects (similar to the search interface that was available in Fedora 3 at the endpoint /fedora/objects). As mentioned above, such a UI could leverage existing indexes.

  • make optional?
  • if integrated into fedora: make optional
  • possibility to query persistence layer for used storage and free storage
  • will query service be part of the fedora API spec?
  • does it make sense to hardcode the supported queries?
  • only provide a search UI but actually use a triple store?
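
The point about querying the persistence layer for used and free storage can be sketched as follows, assuming the persistence layer is a plain directory tree on a local or mounted file system. The function names and the storage root are hypothetical; object storage such as S3 would need a different approach.

```python
import os
import shutil


def used_bytes(root: str) -> int:
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def free_bytes(root: str) -> int:
    """Free space on the file system that holds root."""
    return shutil.disk_usage(root).free
```

Calling `used_bytes` on the subdirectory of a single namespace would give the space used by that namespace, provided namespaces map to directories, which is itself an assumption.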

Answers/Remarks by Andrew Woods

  • Regarding "Query service", we have gotten consistent feedback that an integrated, synchronous search index should come with Fedora. The use case is for clients to create/update a Fedora resource, then immediately be able to query Fedora with the expectation that the resource be in the index. The externalized, asynchronous indices do not satisfy this use case.
  • We will certainly use the queries you have enumerated in the testing of the new query service.
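
The difference between a synchronous and an asynchronous index described above can be illustrated with a toy in-memory model. This is only a sketch of the read-after-write behaviour, not how Fedora implements indexing; the class `ToyRepository` and its methods are hypothetical.

```python
from collections import deque


class ToyRepository:
    """Toy model: resources plus an index keyed by (predicate, value)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.resources = {}
        self.index = {}         # (predicate, value) -> set of resource ids
        self.pending = deque()  # resource ids waiting for an external indexer

    def put(self, resource_id, triples):
        """Create a resource; updates and deletes are ignored for brevity."""
        self.resources[resource_id] = list(triples)
        if self.synchronous:
            self._index(resource_id)  # indexed before put() returns
        else:
            self.pending.append(resource_id)  # indexed some time later

    def _index(self, resource_id):
        for predicate, value in self.resources[resource_id]:
            self.index.setdefault((predicate, value), set()).add(resource_id)

    def run_indexer(self):
        """Stands in for an external, message-driven indexer catching up."""
        while self.pending:
            self._index(self.pending.popleft())

    def query(self, predicate, value):
        return sorted(self.index.get((predicate, value), set()))


if __name__ == "__main__":
    sync_repo = ToyRepository(synchronous=True)
    sync_repo.put("obj1", [("puid", "fmt/95")])
    print(sync_repo.query("puid", "fmt/95"))   # resource is found immediately

    async_repo = ToyRepository(synchronous=False)
    async_repo.put("obj1", [("puid", "fmt/95")])
    print(async_repo.query("puid", "fmt/95"))  # empty until the indexer runs
    async_repo.run_indexer()
    print(async_repo.query("puid", "fmt/95"))
```

In the asynchronous case a client that queries right after `put` may not see its own resource, which is the gap the integrated index is meant to close.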


Use cases

  • For our cloud infrastructure, docuteam would like to use S3-compatible object storage (provided by Ceph)
  • Some docuteam clients might want to stick with a simpler storage model than OCFL to reduce storage requirements
