Title (Goal) Support an ESIP Best Practices compliant OpenSearch API
Primary Actor Developer
Scope

Internationally, the primary federated data discovery interface in the Earth Sciences is two-step OpenSearch. Ideally, any instance of Fedora holding such data would support discovery and access accordingly, so that users of any existing or developing OpenSearch-based search engine can automatically find and query those data (along with the data in the non-Fedora repositories those engines already support, e.g., NASA, NOAA, ESRI geonetwork, CEOS, GEOSS, etc.).

Level 
Author Ruth Duerr

Story:

The user story for this support follows:  As a user of a particular Earth Science focused search interface or data access tool (e.g., GEOSS broker, CWIC interface, etc.), I want all relevant Earth Science data to show up no matter where it is located. 

The standard for this in the Earth Sciences is the two-step OpenSearch request as defined by ESIP Best Practices (see specification list below). In short, a client begins by submitting a query for Collections meeting a given set of criteria; the query is built from a Collection-level OpenSearch Description Document (OSDD). After the user has assessed the resulting Collections, the client submits additional queries for Data Items that meet those or additional criteria; these queries are built from an Item-level OSDD for each Collection. The results returned may include one or more web services that provide additional access support for a Data Item (e.g., OGC W*S, OPeNDAP, etc.). It should be noted that Collection and Data Item are content model objects in the Data Conservancy Data Model.
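As an illustration of how a client turns an OSDD into a query, the following is a minimal sketch that fills in an OpenSearch 1.1 URL template. The placeholder names (`searchTerms`, `geo:box`, `time:start`, `time:end`, `count`) come from the OpenSearch core, Geo, and Time specifications listed below; the host name, paths, and the `fill_template` helper are hypothetical and only stand in for whatever a real Collection-level or Item-level OSDD would advertise.

```python
import re
from urllib.parse import quote

# Hypothetical templates of the kind found in an OSDD <Url> element.
COLLECTION_TEMPLATE = (
    "https://repo.example.org/opensearch/collections?"
    "q={searchTerms}&bbox={geo:box?}&start={time:start?}"
    "&end={time:end?}&count={count?}"
)
ITEM_TEMPLATE = (
    "https://repo.example.org/opensearch/items?"
    "collection=arctic-sea-ice&bbox={geo:box?}"
    "&start={time:start?}&end={time:end?}"
)


def fill_template(template: str, params: dict) -> str:
    """Substitute {name} and {name?} placeholders in a URL template."""

    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in params:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(params[name]), safe="")
        if optional:
            # Optional parameters without a value become empty strings.
            return ""
        raise KeyError(f"required parameter missing: {name}")

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


# Step 1: query for Collections.
collection_query = fill_template(
    COLLECTION_TEMPLATE,
    {"searchTerms": "sea ice", "geo:box": "-180,60,180,90"},
)

# Step 2: query for Data Items within one Collection chosen by the user.
item_query = fill_template(
    ITEM_TEMPLATE,
    {"geo:box": "-180,60,180,90", "time:start": "2014-01-01"},
)
```

In practice the second template would be read from the Item-level OSDD referenced in the Collection-level results rather than hard-coded as it is here.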

A list of specifications and best practices for such an API follows:

http://www.opensearch.org/Specifications/OpenSearch/1.1

http://www.opensearch.org/Specifications/OpenSearch/Extensions/Geo/1.0/Draft_2

http://www.opensearch.org/Specifications/OpenSearch/Extensions/Time/1.0/Draft_1

http://www.opensearch.org/Specifications/OpenSearch/Extensions/Parameter/1.0

http://wiki.esipfed.org/images/9/97/Combined_Open_Search_Best_Practices_v0.4.pdf

http://www.opensearch.org/Specifications/OpenSearch/Extensions/Relevance/1.0/Draft_1

http://www.opengeospatial.org/standards/opensearchgeo

http://ceos.org/wp-content/uploads/2014/12/CEOSOpenSearchBestPracticeDocument-PublicComment.pdf

Fortunately, a validator for this type of API is available at http://testbed.echo.nasa.gov/cwic-smart/validations

It should be noted that if all specifications are followed and the OSDDs are publicly available, Collections and Data Items housed within any publicly accessible Fedora instance will be discoverable and accessible through any number of existing user interfaces, as well as whatever interfaces are developed in the future.

7 Comments

  1. This isn't an extension of a Fedora repository, it is an extension of a search index updated from a Fedora repository. That doesn't mean that it's not appropriate to talk about it in the context of repository extensions, but that any reasonable implementation of this functionality isn't going to be built over a repository using the eventual repository extension standard, but over some search index.

  2. Well, actually most complex object-based searches and retrievals will have to have some way to use an index of some sort...  But yes, it is true that the index will need to be tied into the framework so that the objects can actually be accessed, as well as so that the indices can be built!

  3. There's already a recommended construction for building indexes available. I don't understand what that has to do with this API extension framework.

  4. Ruth Duerr I'm trying to express this use case in a way that can articulate the role of the extension architecture itself vs. what external services/indexes provide, but I don't know the mentioned standards all that well.

     A. Soroka is right that there are already recommended constructions for building indexes from repository content, but I'm wondering about the "OGC W*S, OPeNDAP" part.  Do you envision, say, providing an OPeNDAP extension which exposes the content of certain objects in the repository according to the OPeNDAP protocol?  In other words, an external search service (which may be OpenSearch compliant) is populated with objects from the repository that have an OPeNDAP representation mediated via the extension architecture, and exposed at a URI such as /path/to/object/geo:OPeNDAP.  So when the object /path/to/object is created or updated, the OpenSearch service indexes the content exposed at its OPeNDAP URI.

    Is this along the lines of what you are envisioning?

    1. Yes, though actually OPeNDAP and Webification really make things look like they are in ftp or web directory trees, even when they aren't....

  5. These were some of my thoughts on this use case.  I haven't read the specifications, but one of the thought experiments in my head is: how would the repository support, say, spatial search?  And is the API Extensions architecture the place to enable it?

    Questions or Comments

    1. Are open search endpoints exposed at the collection level?  Or do we need to support exposing OpenSearch at a higher level, e.g. “repository-wide”?  Will the OpenSearch endpoint be bound to an object? Does Fedora have the concept of a “root” object, and can we bind services to it?

    2. What would an API Extension need to do in order to advertise and respond to spatial queries?

    Roles

    API Extension Architecture

    1. Expose OpenSearch endpoints on collections containing data amenable to earth science search (e.g. /path/to/object/os:query).  This includes accepting and processing standard OpenSearch query parameters.

    2. Specialized indexing routines and management for supporting earth science OpenSearch queries (e.g. an ingest hook, invoked when ingest events are emitted from Fedora).  Index management isn’t something that is user-facing in any way, so will the API Extension architecture support the packaging, deployment, and configuration of such a module?
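As a thought experiment on what responding to a spatial query might involve, here is a minimal sketch, assuming the extension receives the `geo:box` parameter from the OpenSearch Geo extension (a comma-separated `west,south,east,north` string in decimal degrees). The in-memory `INDEX`, its paths, and the function names are hypothetical; a real implementation would presumably delegate to an external spatial index kept up to date from Fedora events.

```python
from typing import NamedTuple


class Box(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def parse_geo_box(value: str) -> Box:
    """Parse a geo:box value of the form 'west,south,east,north'."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError("geo:box needs four comma-separated numbers")
    box = Box(*parts)
    if not (-90 <= box.south <= box.north <= 90):
        raise ValueError("invalid latitude range")
    return box


def intersects(a: Box, b: Box) -> bool:
    """True if two boxes overlap. Antimeridian-crossing boxes are not handled."""
    return (
        a.west <= b.east
        and b.west <= a.east
        and a.south <= b.north
        and b.south <= a.north
    )


# Hypothetical index: repository path -> spatial extent of the collection.
INDEX = {
    "/collections/arctic-sea-ice": Box(-180, 60, 180, 90),
    "/collections/amazon-rainfall": Box(-75, -15, -50, 5),
}


def spatial_search(geo_box: str) -> list:
    """Return the paths of indexed collections whose extent overlaps the query box."""
    query = parse_geo_box(geo_box)
    return sorted(
        path for path, extent in INDEX.items() if intersects(query, extent)
    )
```

A request such as `/path/to/object/os:query?bbox=-10,50,30,70` would then reduce to a call like `spatial_search("-10,50,30,70")`, with the matching paths rendered into the response feed.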

    Fedora

    1. Answers requests for resources as normal.

    2. Emits events associated with object lifecycles (e.g. ‘ingest’, ‘tombstone’, etc).

    Developer

    1. Responsible for implementing OpenSearch capabilities in an API Extension.

    2. Responsible for implementing the indexing needs of earth science-related OpenSearch.

    1. I am wondering about the indexes from repository content part - since the indexes I am talking about are for complex objects like a whole data collection or a data item comprised of 5 videos that together make a whole.

      I note that OpenSearch has a few standard query parameters (keywords, time, and space); but also the ability to support additional parameters as defined by the developer/user as well...

      Granule/item level search endpoints are exposed at the collection level (along with the relevant OpenSearch Description Document which describes all of the parameters available from this particular OpenSearch API)

      Collection level search endpoints are exposed at the repository (discipline?) level

      OpenSearch queries return an Atom feed containing metadata about the things that match the query.  That metadata typically includes API endpoints for services that act on whatever was returned.  For example:

      At the Collection level the feed would return:

      • Metadata about the collection (title, description, spatial and temporal boundaries, citation, author, etc.) often bundled as the machine readable contents of a collection landing page
      • API endpoint for the entire metadata record for the collection
      • API endpoint for a granule level OpenSearch Description Document (if the Collection supports finer level searches) or API endpoint to access the content of the collection otherwise (which might be OPeNDAP or OGC services or just plain FTP or whatever)

      At the item/granule level the feed would return:

      • Metadata about the item - whatever would show up on the item's landing page
      • API endpoints for whatever services can be applied to that item (e.g., streaming video player) or to simply return that item
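To make the feed contents described above a little more concrete, here is a minimal sketch of how a client might pull the endpoints out of a Collection-level Atom response using only the Python standard library. The sample feed, its URLs, and the `entry_links` helper are made up for illustration; the `rel` values a real service returns should be taken from the ESIP Best Practices rather than from this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical Collection-level response with one matching entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Collections matching "sea ice"</title>
  <entry>
    <title>Arctic Sea Ice Extent</title>
    <id>https://repo.example.org/collections/arctic-sea-ice</id>
    <summary>Daily sea ice extent for the Arctic.</summary>
    <link rel="alternate" type="application/xml"
          href="https://repo.example.org/collections/arctic-sea-ice/metadata"/>
    <link rel="search" type="application/opensearchdescription+xml"
          href="https://repo.example.org/collections/arctic-sea-ice/osdd.xml"/>
  </entry>
</feed>
"""


def entry_links(feed_xml: str) -> list:
    """Return each entry's title plus a mapping of link rel -> href."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        links = {
            # Atom treats a missing rel attribute as "alternate".
            link.get("rel", "alternate"): link.get("href")
            for link in entry.findall(f"{ATOM}link")
        }
        results.append(
            {"title": entry.findtext(f"{ATOM}title"), "links": links}
        )
    return results


collections = entry_links(SAMPLE_FEED)
# The "search" link points at the granule-level OSDD used for step two.
granule_osdd = collections[0]["links"]["search"]
```

An item/granule-level feed could be handled the same way, with the links pointing at the services that can act on each item instead of at a further OSDD.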