However, there has been interest in providing at least some level of built-in search functionality to address basic discovery scenarios.  To guide planning and development, please provide concrete use cases your repository applications have for search that are not well-served by external search options.

List Repository URIs Based on Last Modification Time

Kevin Ford - 2017 Feb 6

There should be a way to request a list of repository resources based on last modification time.  The use case is as follows:

One overcast Chicago day, connectivity was lost between the server hosting Fedora and its embedded ActiveMQ broker and the server hosting Karaf/Camel subscribed to the broker’s ‘fedora’ topic.  The weather, though a strangely inserted detail in the preceding sentence, had no bearing on the loss in connectivity.  It just happened.

When connectivity was restored, Karaf/Camel of course re-subscribed to the ‘fedora’ topic, but only new messages were received.  Any messages published to the broker from the time connectivity was lost to the time connectivity was restored were not retrieved and processed by Karaf/Camel.  As a result, resources updated or modified during this period were not propagated to a search index (Solr) and a triplestore.  More generally, relying on a JMS topic will invariably result in missed messages, whatever the cause of a given outage [1].

Although it was possible to pinpoint the precise time network connectivity was lost, the only way to rectify the situation at present is to reindex the entire repository, because Fedora cannot be queried for a list of resources created, modified, or deleted since a specific point in time.  To ensure that the 2,000-5,000 resources created, modified, or deleted during the network loss were accurately reflected in the search index and triplestore, it was necessary to reindex 480,000 resources.

While it is possible to configure Fedora’s embedded ActiveMQ to use a queue instead of a topic, and to further expand the infrastructure to include a distributed broker [2], it seems reasonable that a repository be able to provide a list of URIs of created, modified, and/or deleted resources from a specific point in time.
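As a rough illustration of the requested behavior (not an existing Fedora API), the sketch below filters an in-memory map of resource URIs to last-modified timestamps and returns the URIs changed at or after a given time.  The function name uris_modified_since, the sample URIs, and the dates are all hypothetical.  Deletions are omitted here; a real implementation would need some record of deleted resources in order to report them.

```python
from datetime import datetime, timezone


def uris_modified_since(resources, since):
    """Return URIs last modified at or after `since`, oldest change first.

    `resources` maps a resource URI to its last-modified datetime.
    """
    changed = [(ts, uri) for uri, ts in resources.items() if ts >= since]
    return [uri for ts, uri in sorted(changed)]


# Hypothetical sample data: URI -> last-modified timestamp (UTC).
resources = {
    "http://localhost:8080/rest/a": datetime(2017, 2, 1, 9, 0, tzinfo=timezone.utc),
    "http://localhost:8080/rest/b": datetime(2017, 2, 3, 14, 30, tzinfo=timezone.utc),
    "http://localhost:8080/rest/c": datetime(2017, 2, 4, 8, 15, tzinfo=timezone.utc),
}

# Assumed time at which connectivity was lost.
outage_start = datetime(2017, 2, 3, 12, 0, tzinfo=timezone.utc)

for uri in uris_modified_since(resources, outage_start):
    print(uri)
```

With a list like this, only the resources touched during the outage would need to be reindexed, rather than the whole repository.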

Such a feature could also assist with auditing the contents of a repository when compared against the documents indexed by Solr and mirrored in a triplestore (assuming all resources have been propagated to those applications).  Were there Fedora messages that were not communicated and/or processed by Camel in the last 7 days and, if so, how many and what were they?
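The audit described above amounts to a set comparison.  The following is a minimal sketch, assuming the list of recently modified repository URIs and the list of URIs known to the index (or triplestore) have already been obtained by some means; the function name audit_index and the sample URIs are hypothetical.

```python
def audit_index(repository_uris, indexed_uris):
    """Compare repository URIs against indexed URIs.

    Returns a dict with the URIs present in the repository but absent from
    the index, and the URIs present in the index but absent from the
    repository (for example, deletions that were never propagated).
    """
    repo = set(repository_uris)
    index = set(indexed_uris)
    return {
        "missing_from_index": sorted(repo - index),
        "stale_in_index": sorted(index - repo),
    }


# Hypothetical sample data.
modified_last_7_days = [
    "http://localhost:8080/rest/a",
    "http://localhost:8080/rest/b",
    "http://localhost:8080/rest/c",
]
in_solr = [
    "http://localhost:8080/rest/a",
    "http://localhost:8080/rest/c",
    "http://localhost:8080/rest/d",
]

report = audit_index(modified_last_7_days, in_solr)
print(len(report["missing_from_index"]), "resource(s) missing from the index")
print(len(report["stale_in_index"]), "stale resource(s) in the index")
```

The counts answer "how many," and the sorted lists answer "what were they."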

[1] https://jira.duraspace.org/browse/FCREPO-2005

[2] https://wiki.duraspace.org/display/FEDORA4x/Setup+Camel+Message+Integrations