Title (Goal): Enhance the DSpace Curation System to Support a Flexible Query Tool
Primary Actor: System | Human | External System
Story (A paragraph or two describing what happens):

The existing curation system provides a flexible framework for iterating over the repository hierarchy and for queuing resource-intensive tasks. Many valuable features could be built on this framework, but it currently has some significant limitations.

  • The curation system does not accept user input or parameters

  • Tasks that are placed in a queue have no ability to display results to an end user

  • Interactive curation tasks currently only display text output (no hyperlinks or interactivity)

  • Curation task output is not persisted

Ideally, a query system could be built on top of the curation system if the curation system were expanded in the following ways.

  • Enable the curation system to accept parameters (JSON or XML)

  • Enable the curation system to persist task results whether run interactively or from the task queue

  • Enable the curation system to output HTML
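
The three enhancements above could fit together roughly as follows. This is a minimal, language-neutral sketch in Python and not DSpace code (DSpace itself is written in Java). The names `run_task`, `render_html`, the `min_title_length` parameter, the result file name, and the `/handle/...` link format are all assumptions made for illustration.

```python
import html
import json
import tempfile
from pathlib import Path


def run_task(items, params_json, results_dir):
    """Hypothetical task: flag items whose title is shorter than a minimum length."""
    params = json.loads(params_json)  # parameters arrive as a JSON string
    min_len = int(params.get("min_title_length", 1))
    findings = [
        {"handle": item["handle"], "title": item["title"]}
        for item in items
        if len(item["title"]) < min_len
    ]
    # Persist the result so it can be read later, whether the task ran
    # interactively or from a queue. The file name is an assumption.
    out = Path(results_dir) / "short-titles.json"
    out.write_text(json.dumps({"params": params, "findings": findings}))
    return out


def render_html(result_path):
    """Render a persisted result as an HTML list with one hyperlink per item."""
    result = json.loads(Path(result_path).read_text())
    rows = "".join(
        '<li><a href="/handle/{h}">{t}</a></li>'.format(
            h=html.escape(f["handle"], quote=True),
            t=html.escape(f["title"]),
        )
        for f in result["findings"]
    )
    return "<ul>" + rows + "</ul>"


if __name__ == "__main__":
    sample = [
        {"handle": "123/1", "title": "A"},
        {"handle": "123/2", "title": "A long title"},
    ]
    with tempfile.TemporaryDirectory() as results_dir:
        path = run_task(sample, '{"min_title_length": 5}', results_dir)
        print(render_html(path))
```

Separating the run step from the render step means a queued task and an interactive task can share the same persisted result, and the HTML report can be regenerated without rerunning the task.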


  1. It would also be great to have some shared "community" curation tasks. I don't have the time to learn how to build these or find them...

  2. This seems like more of an "enhancement / feature request" to existing functionality rather than a Use Case. Though, I honestly agree that these are all limitations of the Curation Tasks / System.

    But, in order to better understand the user/administrator needs here, I'm curious about what sort of things you'd like to do with Curation Tasks which are currently not possible?  Are there examples you can think of that speak to the limitations you've listed above?

    The other reason I ask is that, based on the Use Cases, it might be worth considering whether the current Curation System is actually the best solution, or if the Curation System itself may need to be either replaced or revamped to better align with common Use Cases related to "iterating over the repository hierarchy".

  3. The language listed above is definitely enhancement focused, but I believe that there are one or more real use cases here.  Consider the following potential use cases.

    1. As a collection administrator of a collection with complex metadata requirements, I can initiate a curation task that will produce a detailed report identifying specific errors within each metadata field of each item in my collection so that I can apply all metadata corrections in a single editing session.  Details: In order to address the issues that are reported, a detailed report (HTML) will need to be accessible from the curation task.
    2. As a repository administrator, I can initiate a curation task against the entire repository that will identify items that do not conform to repository policies so that I can audit the quality of items in the repository.  Details: This is likely a performance-intensive task that will need to be placed on a curation queue and run outside of the application server.  The results of the queued task will need to be persisted into a usable report.
    3. As a repository developer, I can design flexible curation tasks that utilize user parameters (date range, string match, regular expression match) to allow a collection manager to identify specific items of interest.
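
The third use case above (user parameters for date range, string match, and regular expression match) could be sketched roughly as follows. This is an illustration in Python, not DSpace code. The parameter names `date_from`, `date_to`, `contains`, and `regex`, and the item fields `title` and `date_issued`, are hypothetical.

```python
import re
from datetime import date


def matches(item, params):
    """Return True when an item satisfies every filter present in params."""
    issued = date.fromisoformat(item["date_issued"])
    # Date range: both bounds are optional and inclusive.
    if "date_from" in params and issued < date.fromisoformat(params["date_from"]):
        return False
    if "date_to" in params and issued > date.fromisoformat(params["date_to"]):
        return False
    # String match: case-insensitive substring of the title.
    if "contains" in params and params["contains"].lower() not in item["title"].lower():
        return False
    # Regular expression match against the title.
    if "regex" in params and not re.search(params["regex"], item["title"]):
        return False
    return True


def select_items(items, params):
    """Return the handles of the items of interest, in input order."""
    return [item["handle"] for item in items if matches(item, params)]
```

Because absent parameters are simply skipped, a collection manager can combine any subset of the filters, and an empty parameter set selects every item.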


    1. Thanks for those examples, Terry. They definitely help to flesh this out into a use case.

      As a sidenote, I cannot help but notice that the examples you give are also various administrative "reports" (on content, metadata, etc).

      This almost makes me wonder if this use case as a whole is somewhat related to the "Admin UI - Collection Admin can construct a Quality Control report" use case.

      In other words, both seem to be asking for a better way to run Administrative reports / queries across metadata/items/content in general.  So, it might be worth us (developers / tech folks) starting to think about whether Curation Tasks should do this, or if there's a better way to run such Administrative reports in general. (I don't know the answer here, and don't expect you to, just "brainstorming" that both of these use cases seem loosely related, and may be describing something "new" that we might want to build into DSpace to make it easier to analyze/manage content in general)

      1. I agree. I have been looking at the same reporting problem from two different directions (the REST API and the Curation system).

        Here are some thoughts that might identify when a curation task would be a preferred mechanism for a reporting function.  

        • The curation system is hierarchy aware.  Once a task is implemented, it is easy to enable it for an item, collection, community, or repository.
        • Curation tasks can be queued or they can run immediately.  Process-intensive reporting operations can be invoked via the curation system.
        • The curation system has already been configured to be accessible based on user roles.
        • The curation system (I believe) can be invoked using the credentials of the user who initiated the task.
        • Beyond reporting, for tasks that will perform modifications to objects, the curation system is likely to be a good interface.
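
The "hierarchy aware" point in the list above can be illustrated with a small sketch: a check written once for items can be invoked on an item, collection, or community. This is Python for illustration only, not the DSpace curation API. The dictionary-based tree and the names `iter_items`, `run_at`, and `has_title` are assumptions.

```python
def iter_items(node):
    """Yield every item at or below a node of the repository hierarchy."""
    if node["type"] == "item":
        yield node
    else:
        # Communities and collections are containers; descend into them.
        for child in node.get("children", []):
            yield from iter_items(child)


def run_at(node, task):
    """Apply a per-item task to whatever level it was invoked on."""
    return {item["handle"]: task(item) for item in iter_items(node)}


def has_title(item):
    """Hypothetical per-item check: the title is present and not blank."""
    return bool(item.get("title", "").strip())


if __name__ == "__main__":
    item1 = {"type": "item", "handle": "123/1", "title": "A"}
    item2 = {"type": "item", "handle": "123/2", "title": " "}
    collection = {"type": "collection", "children": [item1, item2]}
    community = {"type": "community", "children": [collection]}
    print(run_at(item1, has_title))
    print(run_at(community, has_title))
```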