
Problem

In order to consider Fedora's REST API complete (FCREPO-543), we need to have functional equivalents to API-M's relationship management methods.

This includes the ability to:

  • add and remove relationships incrementally
  • get a subset of relationships

Approaches

We started by talking about a couple of general approaches during the November 3rd Committer Meeting (notes and audio here).

  1. Generic endpoint for updating / querying all relationships in the repository.  This would allow one to update relationships without explicitly specifying which object or datastream they reside in.  There was some concern that this might cause confusion because it sets up the expectation that arbitrary RDF is accepted, when in reality it would only accept changes that are valid in RELS-EXT and RELS-INT datastreams (e.g., those whose subject is a Fedora object and whose predicate is not a reserved predicate).
  2. Object-specific endpoints for updating relationships on a per-datastream basis.  This approach would require that applications using the API are aware of which objects and datastreams the relationships are asserted in.  While discussing this approach, we noted that this pattern could be followed in a general way to make partial datastream updates possible for other types of datastreams as well.

Strawman Proposal

During the call, we talked most about approach #2, and started to define what it would entail:

  1. Define a generic method (or set of methods) for applying partial changes to datastreams.
  2. Implement it for RELS-EXT and RELS-INT (RDF-based) so that additions and deletions to those datastreams can be made via SPARQL update.

Example:

  • HTTP Verb: POST
  • URL: /objects/$pid/fedora-system:SomeBuiltInSDef/updateDatastream?dsID=$dsID&type=sparql-update
  • Body: A SPARQL/Update document (UTF-8?) to apply to the datastream
  • Successful response:
    • Code: 200
    • Location header: URL to datastream version
    • Content-type header: text/plain; charset=utf-8
    • Body: URL to datastream version
  • Failed response (bad request):
    • Code: 400
    • Content-type header: text/plain; charset=utf-8
    • Body: (one-line message)
      • Unrecognized update type: $type
      • No such object: $pid
      • No such datastream in $pid: $dsID
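
The example above can be sketched in a few lines of Python. This is only an illustration of the strawman, not an implementation: the in-memory REPOSITORY, the function names, the order in which the error cases are checked, and the form of the URL returned on success are all assumptions, and the triple in EXAMPLE_BODY is made up.

```python
from urllib.parse import quote, urlencode

# Hypothetical stand-in for the repository: pid -> set of datastream IDs.
REPOSITORY = {"demo:1": {"DC", "RELS-EXT"}}

# Update types the (hypothetical) endpoint understands.
SUPPORTED_TYPES = {"sparql-update"}

# Illustrative request body; the relationship asserted here is made up.
EXAMPLE_BODY = """\
INSERT DATA {
  <info:fedora/demo:1>
    <info:fedora/fedora-system:def/relations-external#isMemberOf>
    <info:fedora/demo:collection> .
}
"""


def build_update_url(pid, ds_id, update_type="sparql-update"):
    """Build the strawman URL for a partial datastream update."""
    path = "/objects/%s/fedora-system:SomeBuiltInSDef/updateDatastream" % quote(pid, safe=":")
    return path + "?" + urlencode({"dsID": ds_id, "type": update_type})


def respond(pid, ds_id, update_type):
    """Return (status code, one-line body) following the response list above.

    The order of the checks is an assumption; the proposal does not define it.
    """
    if update_type not in SUPPORTED_TYPES:
        return 400, "Unrecognized update type: %s" % update_type
    if pid not in REPOSITORY:
        return 400, "No such object: %s" % pid
    if ds_id not in REPOSITORY[pid]:
        return 400, "No such datastream in %s: %s" % (pid, ds_id)
    # Placeholder for "URL to datastream version"; the real form is undecided.
    return 200, "/objects/%s/datastreams/%s" % (pid, ds_id)


if __name__ == "__main__":
    print("POST", build_update_url("demo:1", "RELS-EXT"))
    print(EXAMPLE_BODY)
    print(respond("demo:1", "RELS-EXT", "sparql-update"))
    print(respond("demo:1", "RELS-INT", "sparql-update"))
```

In a real implementation the successful branch would also apply the update to the datastream and set the Location header to the same URL that is returned in the body.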

Notes:

  • Adding or modifying the entire content of a datastream would still be possible via a POST or PUT, respectively, to the datastream URL as it is today.
  • The syntax for invoking disseminations has not yet been decided for the REST API; the URL above with fedora-system:SomeBuiltInSDef assumes an obvious possible syntax.

Outstanding Issues:

  1. Does SPARQL Update have a MIME type?  If so, it might make sense to use that as the value of the type parameter.
  2. The strawman does not yet include a way to get relationships.  Perhaps this could also be a built-in function of SomeBuiltInSDef.
    1. Q: Would it require one to specify which datastream (e.g. RELS-EXT) the relationships reside in?
    2. Q: What form of query would it accept?  A simple S,P,O style query, or SPARQL?
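
To make the "simple S,P,O style query" option concrete, here is a minimal sketch in Python. It assumes relationships are available as (subject, predicate, object) tuples and treats an omitted argument as a wildcard. The function name, the sample triples, and the in-memory list are hypothetical and are not part of the proposal.

```python
# Namespace of the Fedora relationship predicates used in the sample data.
FEDORA_REL = "info:fedora/fedora-system:def/relations-external#"

# Made-up sample relationships, as (subject, predicate, object) tuples.
TRIPLES = [
    ("info:fedora/demo:1", FEDORA_REL + "isMemberOf", "info:fedora/demo:collection"),
    ("info:fedora/demo:1", FEDORA_REL + "isPartOf", "info:fedora/demo:2"),
    ("info:fedora/demo:3", FEDORA_REL + "isMemberOf", "info:fedora/demo:collection"),
]


def find_relationships(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    def matches(value, pattern):
        return pattern is None or value == pattern

    return [
        t for t in triples
        if matches(t[0], subject) and matches(t[1], predicate) and matches(t[2], obj)
    ]


if __name__ == "__main__":
    # All relationships asserted about demo:1.
    for triple in find_relationships(TRIPLES, subject="info:fedora/demo:1"):
        print(triple)
```

A pattern query like this covers the "get a subset of relationships" requirement without a query parser; SPARQL would be needed only for joins across several triples.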

Thoughts?

Please feel free to comment on this proposal here or on the dev list.


8 Comments

  1. My comments regarding subgraphs, etc. from the call were intended to be something along the lines of:

    http://fedora-commons.org/confluence/display/~barmintor/REST+API+for+Relationships+as+Subgraphs

    This doesn't address the writeable-sdef's issue, nor the triplestore-is-only-a-cache-and-might-not-even-exist issue, but I think it's interesting to try and think of how some of the core datastreams work if they're all RDF serializations.

  2. RE: Strawman:

    I'm sure we hashed this out, but couldn't we use an extension of the existing url format, with the addition of:

    1. the POST verb to accommodate updates
    2. a type (updatetype?) parameter (as indicated above) to allow for SPARQL Update documents for partial updates

    I know that smacks of default disseminator (and would require a special writeable default disseminator for RDF datastreams), but a strength of the existing format /objects/{pid}/datastreams/{dsID} is that the URL is a resource, and the HTTP verb is the operation.

    GET /objects/{pid}/datastreams/{dsID} : returns 200 and the datastream on success
    PUT /objects/{pid}/datastreams/{dsID} : returns 201 and datastream location on success
    POST /objects/{pid}/datastreams/{dsID} : returns 204 (or 205? Kind of depends on whether POST-redirect is followed) and datastream location

    DELETE /objects/{pid}/datastreams/{dsID} : return 204 on success

    1. Yeah, as I was writing this up it just felt weird to have the partial mods going to an entirely different kind of URL than full adds/mods.  It seems like if we go in the direction of the default disseminator for core API, it should be done uniformly.  Otherwise, I think users of the API are going to be left shaking their heads.

      Defining it via the existing URL format is easier:

      • ..to implement
      • ..to remember the syntax
      • ..to explain in docs without jumping into the topic of disseminators

      Note on status codes: I started by putting 201 as the success code, thinking that was the logical choice, then noticed that the existing REST API uses 200, so changed it back to that for consistency.  Even if 204 or 205 were used, it seems like a Location: header would be good in any case.

  3. In considering SPARQL UPDATE, we should take note that at the moment it's a W3C Member Submission (not a recommendation) - http://www.w3.org/Submission/SPARQL-Update/ - and should probably also consider the 1.1 proposal - http://www.w3.org/TR/sparql11-update/ (W3C Working Draft).

    I suggest we implement only a core subset at this stage, ideally the parts that are unlikely to change.

    1. We probably wouldn't need much, since so much is inferred from the endpoint and HTTP verb.  If the request is a PUT or DELETE to an endpoint at /objects/{pid}/DC, the operation and the subject of all the triples is assumed already (likewise RELS-EXT; RELS-INT requires subjects to be specified but they're still constrained by {pid}).  POST requires an operation to be specified, since a modification might be a partial add or a partial delete.

      We could skirt the issue entirely by requiring POST requests to have parameters/data for operation (insert or delete), graph language (SPARQL, RDF, N3, etc.) and the triple content, but that would require 2 versionable operations to change the objects of a set of triples.  Given the existing constraints of the REST API, is the thing that SPARQL Update really adds the ability to POST a partial delete succinctly?

      1. I think the interesting thing that Sparql Update gives is an opportunity to post a set of deletes and adds in an atomic operation.

        OTOH, just posting an RDF chunk with an add/delete indicator (either in the endpoint name or as a parameter) is dead simple to implement + understand and based on standard formats with registered media types.

    2. +1 on a subset.  Hopefully we can borrow a parser someone else has written if we go with sparql update.  Thanks for the 1.1 WD link; I knew there was a more recent version than the one I was looking at.  I don't see a media-type defined but asked PaulG about it.

  4. API-M currently only contains methods for manipulating relationships held in RELS-EXT - might want to consider extending that to RELS-INT as well to support the proposed REST API.