This is a proof of concept description of one portion of the API Extension (API-X) Framework: the service discovery and binding component (SD&B). This document makes reference to other components in API-X, not in order to define how those components work, but simply to illustrate possible ways in which the SD&B component might interact with the larger API-X framework.

Authorization/Authentication of API-X and any registered service is beyond the scope of this document, though API-X should support these in some manner.

Background

Clients interacting with the API Extension framework will need a mechanism to discover the services that apply to a given repository resource. Likewise, services themselves will need a mechanism by which they can register or bind themselves to the API-X framework.

A principal role of the SD&B component is to support an architecture that recognizes that services come and go, sometimes unexpectedly. To that end, it should be possible to decouple the lifecycle of a particular service instance from the lifecycle (i.e. deployment) of the API-X framework, including the SD&B component. Furthermore, it should be possible to deploy this component in a distributed fashion across multiple machines, both to support high availability and high levels of concurrency. It should also be possible for services to be deployed on an arbitrary number of external hosts using any language or framework. With that structure, network partitions and service failure should not affect the overall operation of this component nor the overall operation of API-X.

In many ways, the SD&B component can be thought of as a management interface, distinct from individual service endpoints. While its role is not that of operating on specific repository resources, it can be viewed as a broker between clients, repository resources and external services.

The high level objectives of such a management interface are to support the following:

  • Service Discovery (i.e. client interaction):
    • list all available services
    • list all services that apply to a given Fedora object
    • list all services that apply to a given rdf:type of Fedora object
    • list service status (availability/non-availability)
    • provide some level of description of services (e.g. as RDF)
    • use REST semantics
  • Service Binding (i.e. service interaction):
    • Services should be able to register themselves with, and deregister themselves from, API-X
    • It should be possible for individual services to be available at N hosts (e.g. for high availability)
    • If a particular service instance fails or is removed, API-X should know about that (optional)
  • Deployment
    • the SD&B component should be capable of being deployed in a fully distributed environment, across multiple hosts, and such deployment should be entirely transparent to clients.
    • it should be possible for the SD&B interface to be deployed on separate hosts from the services themselves.

Reverse Proxy

A related concept to SD&B is that of a reverse proxy. Its design details are out of scope for this document, but a possible outline is sketched here to provide more context for the SD&B component. At a high level, a client using the API-X proxy could interact with a Fedora repository as if there were no proxy at all. The proxy may choose to add headers such as the following (e.g. for the resource /rest/resource):

Link: <http://localhost:8080/rest/resource/svc:list>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:validate>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:ldcompact>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:ldpath>; rel="service"

These headers would be generated using the SD&B interface. Then, when a client interacts with a service, e.g. at /rest/resource/svc:validate, the proxy mechanism will pass the request directly to an instance of that service, using the context of /rest/resource. In this way, clients should have no need to interact directly with the SD&B component.
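
One way a proxy might derive such headers from registry data can be sketched in Python (the function name and record shape here are illustrative assumptions, not part of the API-X design):

```python
# Sketch: derive "service" Link headers for a repository resource from
# registry data. The record shape mirrors the /apix/registry responses in
# this document; the function name and URLs are illustrative only.

def service_link_headers(resource_uri, services):
    """Build one Link header line per registered service identifier."""
    return [
        'Link: <%s/svc:%s>; rel="service"' % (resource_uri, svc["identifier"])
        for svc in services
    ]

registry = [{"identifier": "list"}, {"identifier": "validate"}]

for header in service_link_headers("http://localhost:8080/rest/resource", registry):
    print(header)
# Link: <http://localhost:8080/rest/resource/svc:list>; rel="service"
# Link: <http://localhost:8080/rest/resource/svc:validate>; rel="service"
```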

Still, there are cases where this higher level interface is not sufficient. For example, some services may require a raw TCP socket or some other non-HTTP-based interaction with a service. In order to also support those cases, it is recommended that the service discovery interface (or some portion thereof) be exposed directly to clients so that clients can connect and interact directly with external services (unmediated by the API-X proxy mechanism). These are described in the section on "Client Endpoints".

Endpoints

There are two categories of endpoints: those used by clients and those used by services. In these examples, all data exchange uses JSON-LD. These examples refer to a JSON-LD context such as the following:

apix.jsonld
{
  "@context": {
    "id" : "@id",
    "type" : "@type",

    "apix" : "http://fedora.info/definitions/v4/apix/",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms" : "http://purl.org/dc/terms/",
    "fedora" : "http://fedora.info/definitions/v4/repository#",

    "Binding" : {"@id" : "apix:Binding", "@type" : "@id"},
    "Registry" : {"@id" : "apix:Registry", "@type" : "@id"},
    "Service" : {"@id" : "apix:Service", "@type" : "@id"},
    "ZooKeeperBinding" : {"@id" : "apix:ZooKeeperBinding", "@type" : "@id"},

    "hasEndpoint" : {"@id" : "apix:hasEndpoint", "@type" : "@id"},
    "hasParentZnode" : {"@id" : "apix:hasParentZnode"},
    "hasService" : {"@id" : "apix:hasService"},
    "hasZooKeeperEnsemble" : {"@id" : "apix:hasZooKeeperEnsemble", "@type": "@id"},
    "supportsType" : {"@id" : "apix:supportsType", "@type": "@id"},
    "seeAlso" : {"@id" : "rdfs:seeAlso", "@type" : "@id"},
    "label" : {"@id" : "rdfs:label"},
    "comment" : {"@id" : "rdfs:comment"},
    "identifier" : {"@id" : "dcterms:identifier"}
  }
}

This context file presumes an API-X ontology, which is not defined here.

Client Endpoints

All client endpoints use HTTP REST semantics.

Discovering Services

Request:

GET /apix/registry

Request Parameters:

These optional parameters filter the service list to include only those services that (a) apply to the provided Fedora resource, if given, and (b) apply to all of the provided rdf:type URIs. When multiple rdf:type URIs are given, a boolean AND is assumed.

id - a particular Fedora resource
type - a comma-delimited list of rdf:type URIs
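
The AND semantics over multiple type values might be implemented as sketched below (the function name and sample records are hypothetical, not part of the specification):

```python
# Sketch of the filtering implied by the "type" parameter: a service matches
# only if it supports every requested rdf:type (boolean AND), e.g.
#   GET /apix/registry?type=fedora:Resource,fedora:Binary

def filter_services(services, types):
    """Return services whose supportsType covers all requested types."""
    requested = set(types)
    return [s for s in services if requested <= set(s["supportsType"])]

services = [
    {"identifier": "foo", "supportsType": ["fedora:Resource"]},
    {"identifier": "bar", "supportsType": ["fedora:Resource", "fedora:Binary"]},
]

print([s["identifier"] for s in filter_services(services, ["fedora:Resource", "fedora:Binary"])])
# ['bar']
```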

Response:

Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"
{
  "id" : "http://apix-host/apix/registry",
  "type" : "Registry",
  "hasService" : [
    {
      "type" : "Service",
      "label" : "a foo webservice",
      "seeAlso" : "http://example.org/foo",
      "identifier" : "foo",
      "supportsType" : ["fedora:Resource"],
      "hasEndpoint" : ["http://host-1/foo/rest", "http://host-2/foo/rest"]
    },
    {
      "type" : "Service",
      "label" : "a bar webservice",
      "seeAlso" : "http://example.org/bar",
      "identifier" : "bar",
      "supportsType" : ["fedora:Binary"],
      "hasEndpoint" : ["http://host-3/bar/rest", "http://host-4/bar/rest"]
    }
  ]
}
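
Since hasEndpoint may list several bound instances, a client consuming this response might pick one endpoint per request. A minimal sketch (the strategy and names here are illustrative; API-X does not mandate any particular one):

```python
import random

# Sketch: choose one bound endpoint for a service from a parsed registry
# response. Random choice is only one possible client-side strategy.

def pick_endpoint(service, rng=random):
    endpoints = service.get("hasEndpoint", [])
    if not endpoints:
        raise RuntimeError("no bound instances for %s" % service.get("identifier"))
    return rng.choice(endpoints)

foo = {
    "identifier": "foo",
    "hasEndpoint": ["http://host-1/foo/rest", "http://host-2/foo/rest"],
}

print(pick_endpoint(foo))  # one of the two bound endpoints
```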

In a similar way, information about a particular service can be retrieved:

GET /apix/registry/foo

Response:

Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"
{
  "id" : "http://apix-host/apix/registry/foo",
  "type" : "Service",
  "label" : "a foo webservice",
  "seeAlso" : "http://example.org/foo",
  "identifier" : "foo",
  "supportsType" : ["fedora:Resource"],
  "hasEndpoint" : ["http://host-1/foo/rest", "http://host-2/foo/rest"]
}

Service Endpoints

Registering Services

Services can be registered by interacting with the service registry. This endpoint registers only the existence of a service; it makes no guarantees about any running instances of that service. A service must be registered before any instances of it can be bound.

PUT /apix/registry/foo

Content-Type: application/ld+json
{
  "@context" : "http://fedora.info/definitions/v4/apix.jsonld",
  "id" : "http://apix-host/apix/registry/foo",
  "type" : "Service",
  "label" : "a foo webservice",
  "seeAlso" : "http://example.org/foo",
  "identifier" : "foo",
  "supportsType" : ["fedora:Resource"]
}

Note: the hasEndpoint element is not included here, but is part of the /bind interface, described below.

In a similar way, services can be de-registered. Any service instances bound to that service will be unbound, but this operation does not make guarantees about shutting down any instances of the service (which may be running on separate machines).

DELETE /apix/registry/foo

Service Binding

Before a client can interact with a particular service, that service must first be registered. In addition, one or more instances of that service must be bound to the API-X registry. Service binding can happen over HTTP or over another protocol defined by the implementation. These examples will use the ZooKeeper protocol for dynamic service binding, but other implementations could use a different binding protocol.

Manual binding

Some services may not be able to use dynamic service binding, e.g. a PHP web-application. For these, a manual binding interface is available. This example binds a particular service instance to the already-registered foo service.

POST /apix/bind/foo

Content-Type: text/plain
http://host-1/foo/rest

The response will contain a unique URI identifying this service binding. That URI can be used to unbind the service instance at a later point.

201 Created

http://apix-host/apix/bind/foo/some-id

Manually unbind a service instance:

DELETE /apix/bind/foo/some-id

Dynamic Binding

Depending on the implementation, it may be possible to dynamically bind/unbind services. For instance, with ZooKeeper, a service may communicate directly with a ZooKeeper ensemble. The dynamic binding protocol is described at this endpoint:

GET /apix/bind/foo

Response:

Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"

{
  "id" : "http://apix-host/apix/bind/foo",
  "type" : ["Binding", "ZooKeeperBinding"],
  "hasZooKeeperEnsemble" : ["host-1:2181", "host-2:2181", "host-3:2181"],
  "hasParentZnode" : "/service/foo"
}

At this point (interacting directly with ZooKeeper), it would be the responsibility of the client to create an ephemeral, sequential znode under /service/foo, storing the value of the service's endpoint. For example:

create("/service/foo/instance-", "http://host-1/foo/rest", null, EPHEMERAL_SEQUENTIAL)
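
To illustrate why ephemeral, sequential znodes fit this binding model, here is a toy in-memory stand-in (not a real ZooKeeper client; the class and method names are invented for illustration):

```python
# Toy stand-in for ZooKeeper's EPHEMERAL_SEQUENTIAL semantics. A real
# service instance would use an actual ZooKeeper client library instead.

class ToyEnsemble:
    def __init__(self):
        self.znodes = {}  # path -> (value, owning session)
        self.seq = 0

    def create_ephemeral_sequential(self, prefix, value, session):
        # ZooKeeper appends a monotonically increasing counter to the
        # prefix, yielding a unique path per bound instance.
        path = "%s%010d" % (prefix, self.seq)
        self.seq += 1
        self.znodes[path] = (value, session)
        return path

    def close_session(self, session):
        # Ephemeral znodes vanish when their owning session ends, so a
        # crashed instance's binding disappears without manual cleanup.
        self.znodes = {p: v for p, v in self.znodes.items() if v[1] != session}

zk = ToyEnsemble()
path = zk.create_ephemeral_sequential(
    "/service/foo/instance-", "http://host-1/foo/rest", session="svc-1")
print(path)                # /service/foo/instance-0000000000
zk.close_session("svc-1")  # instance crashes or restarts
print(zk.znodes)           # {}
```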

Service Availability

In these examples, when clients request a list of available services, that list will contain hasEndpoint values corresponding to the service instances that have been bound to API-X. For manually-bound services, those endpoints will continue to be included until they are manually unbound. For dynamically-bound services, any interruption in the availability of the service (restarts, network partitions, host failure, etc.) will cause the corresponding hasEndpoint value to disappear.

Distributed Shared State

The API-X architecture should support a distributed deployment model. As such, in a distributed context, the shared state of the service registry must be managed carefully. ZooKeeper is one obvious choice for this, as it avoids creating a single point of failure. If a single point of failure is not a concern, a shared database would accomplish the same thing. There are three types of shared data that each node of the API-X discovery service will need to have access to:

  • Basic configuration information about the cluster (list of nodes, etc)
  • Descriptions of each service (see the /apix/registry endpoint above)
  • For each registered service, a list of each active service instance and the corresponding HTTP endpoint

Otherwise, no additional shared state should be maintained by the API-X SD&B component.
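
Concretely, each node's view of that shared state might resemble the following sketch (all field names here are hypothetical, not drawn from the API-X specification):

```python
# Illustrative shape of the state each SD&B node needs; a node answering
# GET /apix/registry/foo would merge the description with live instances.

shared_state = {
    "cluster": {"nodes": ["apix-host-1", "apix-host-2"]},
    "services": {
        "foo": {
            "description": {
                "label": "a foo webservice",
                "supportsType": ["fedora:Resource"],
            },
            "instances": ["http://host-1/foo/rest", "http://host-2/foo/rest"],
        }
    },
}

svc = shared_state["services"]["foo"]
response = dict(svc["description"], hasEndpoint=svc["instances"])
print(response["hasEndpoint"])
# ['http://host-1/foo/rest', 'http://host-2/foo/rest']
```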


12 Comments

  1. A lot of this is nicely-thought-out, but it seems like running over the same ground over which SSWAP has already gone, and with less flexibility and functionality. Might it be better to do some wholesale adoption here?

    1. They look to be different in their goals to me, but perhaps I don't fully understand what SSWAP is from browsing the literature. SSWAP looks like it provides a certain kind of service description (and is compared to WSDL and its ilk), data input/output description and discovery based on query of these descriptions, and is oriented to describing "semantic services"

      The service discovery and binding component described here is mostly a registry of "what is where", and does not attempt to describe the nature of the services (does not describe what they do, their inputs/outputs, or how to interact with them).  

      So maybe SSWAP (and/or WSDL, or whatever standard is relevant) has a role as a sort of black box service description that individual services may wish to publish if they choose to do so? i.e. somebody building an infrastructure based on SSWAP services would look for such descriptions, and know what to do with them?


      1. No, I understand SSWAP to be very much interested in "where the services actually are". Without that info, SSWAP could not be doing what it clearly does do, which is to enable people to actually run pipelines of discovered services.

        1. Digging deeper, the best description of SSWAP I could find is this publication (is there something else we should be looking at?)

          http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-309

          So with regard to service discovery in SSWAP:

          "A provider defines its resource by putting the SSWAP-compliant, OWL DL resource description graph (an RDG) on the web...in practice...we run a semantic discovery server... that accepts HTTP POSTs from resources to inform the discovery server of their presence."

          Fair enough, also note:

          Because resource description graphs sit on the web like any other document, there is no active registration process with the discovery server, just like there is no active registration process with Google. This alleviates many of the security issues associated with de-registration or changing service definitions associated with active registration models, though it also means that the discovery server's knowledge of resources may be out of date with what is currently live on the web

          This kind of pattern is probably worth discussing.  It contrasts with Aaron's proposal (e.g. in the presence of explicit support for unbinding/de-registering services), and we should probably understand the security issues they are trying to avoid in SSWAP.  However:

          Upon retrieving the RDG, the discovery server dereferences terms up to three levels of indirection in an attempt to broaden its knowledge of concepts (ontology terms) used by the resource description graph (RDG)

          I think this starts to get us into territory outside of the scope of API-X, at least as it is presently conceived through stakeholders' requirements. Reading further through the document reveals SSWAP's notion of service invocation:

          By SSWAP convention, POSTing a graph to a resource is interpreted as a request to invoke the service. Once a client has a service's URI (for example, from a discovery server query response graph or its own listings), the client can POST the service's RDG back to the service with input data typed as the sswap:Subject (Figure 6). The client always knows the service's interface because the RDG is a logical description of the service's transformation available to anyone with a simple HTTP GET on the same URI used for invocation

          This is what leads me to believe that SSWAP is primarily focused on semantic service description, and on wiring together services that act according to the described model(s). As such, it just feels like it's serving a different, specialized purpose than this SD&B proposal. Could you elaborate a little more on where the two are covering the same ground, and suggest some functionality that SSWAP provides that ought to be in the scope of an SD&B component of API-X?


          1. You should be looking at the SSWAP website. The overlaps include:

            • Service description
            • Service discovery (matching types with services that produce and consume them)
            • Feeding graphs into services

            Certainly API-X wants to feed bitstreams into services, too, but that seems like a natural extension, not a reason to start over. The dereferencing thing is an impl detail. You could just not do that, if you liked.

            1. I'm not sure we're entirely at the point where it's clear that the role of API-X is to feed graphs into services, or reasoning about 'types' the services produce and consume, but let's do our due diligence and see where it goes. There's a general notion of 'this service is in some way relevant/available to repository resources of this type', but I don't think we've made it entirely clear what that implies (as hinted at by Daniel Davis), or if it's sufficient. Maybe moving forward with some of the other Proof of Concept ideas will make our technical needs a little more concrete, then we can reconcile with the notion of SSWAP and see how it aligns?

              1. If API-X isn't planning to feed graphs into services, then you have decided that no API-X endpoint can be used over a Fedora RDF resource, which sounds like a bit of a strange choice...

                I think that doing scratch implementations first is not really a great way to plan for alignment, but in the end, the folks contributing time will have to decide on the plan for investment, and I'm not contributing. (smile)

          2. We are still conceptualizing. SD&B plus service infrastructure management has been around for a long time and has many production implementations. The topic goes far beyond Fedora and API-X. Supporting smaller, more dynamic services and the cloud is breathing new life into the subject (distinguished by the term microservices). But this subject is more a variant than a fundamentally new thing.

            I expect largely to use OSGi, SMX, Camel and JAX-RS services. Docker, ZooKeeper or something like it. Ansible, Puppet or Chef. And all the usual networking suspects. Each of these already has admin interfaces. Then there is the question of a services registry, which is getting new attention, particularly in how it integrates with and overlaps other tooling found in a service-oriented infrastructure. The first question is whether a service infrastructure including Fedora needs something special in this area. Note, lots of REST-oriented service infrastructures work just using documentation on a website combined with a good router and reverse proxy.

            We can go for a microservices registry without thinking about semantic tech and be inspired by the new work (especially Cloud-inspired) that is more REST-oriented than UDDI and ebXML (yes, they are around and still heavily used). There is stuff like WADL, Swagger et al.

            But Fedora's special sauce is semantic tech especially the LDP in combination with a repository.

            • We can choose an existing framework/approach, use its abstractions and be criticized about getting locked to it.
            • We can try to define an abstraction that is only about working with raw services but tries to be neutral about implementation.  It can express information about services using RDF and/or OWL modeling the services by themselves.  We could add service management information in this too but then we need define the mapping to common infrastructure tooling.
            • We can try to define an abstraction that encompasses services, the above but also ties data and behavior, and does some of the service connections (pipelines at least).

            I am not taking a position. And I am glossing over things big time. I am just looking for inspiration at this point, starting with Aaron Coburn's proposal and other work that informs the design. Aaron Birkland has good points about SSWAP as possible overkill. I think a novel approach is appealing for API-X stuff because of the potential for applying semantic technologies to repository service architectures; I think it's a great next step. But API-X has lots of places where I just want to grab stuff off the shelf to get going rather than define a new abstraction (in the end I really want interesting tooling). Even if SSWAP does not, right now, do everything we want, or does not do things exactly the way we want, it is worth looking at its approach to using semantic technology. We will also want to ask questions about how far we can go the other way to assemble a tooling framework as simply as possible with the fewest new parts, then look for new abstractions.

            For now I am assuming that this work is aiming for a minimal but innovative SD&B that can grow into great ways of handling data and behavior together, and can - to some extent - support multiple microservice infrastructures through abstractions, and expose the implementations directly when needed. If the aim is lower, let us refine the requirements more.

            It's fun to get back into this!

          3. Looks like a better list of publications is here.

  2. I am just a few hours into sswap but it seems that its overlap with the SD&B description is such that it needs to be understood to move forward. One notion from the past is that there is power in tying content and behavior together. sswap seems to be addressing this. There is also a need to consider the mechanics of provisioning services in an infrastructure to make this work reliably. I am not certain, at my level of understanding, of the degree to which sswap already does this, but, like the web architecture, it seems to just assume the service is there and working (and has no concern for those nasty little issues about making things reliable). It certainly seems to be able to convey the information needed to perform infrastructure management functions (using a model of the infrastructure).

    As a builder of infrastructure, I would like both the divine and the pragmatic. It seems that an exercise following sswap for content, and sswap for service deployment, would be useful. While SD&B may not be considered part of service deployment, I would like to use it for both purposes, or at least see how it fits together. This goes to the definition of a Client.

    It's going to be hard to discuss this because the material is so dense. But that is also why I think this is a place where there is great possibility.

  3. My cursory reading of SSWAP is that it provides much more than 1) what is required for a proof-of-concept and 2) what may be required to satisfy the requirements of API-X. 

    That said, the goals of SSWAP are pretty compelling.  From their paper:

    • deploy a common syntax 

      "that is, allow clients and providers to engage each other under shared syntactical rules. Currently, the GET query strings to major web-based biological information resources such as Entrez http://www.ncbi.nlm.nih.gov/Entrez, Gramene http://www.gramene.org, and LIS http://www.comparative-legumes.org all have differing and idiosyncratic syntaxes, thereby making interoperability consist of one-off scripts that are inherently non-scalable;"

    • develop a shared semantic
      "that is, allow machine-discernable meaning so clients can request the same conceptual object or service from different providers. For example, many providers offer equivalent DNA sequences or sequence comparison algorithms, yet scripts cannot compare and contrast the offerings without low-throughput, case-by-case customization. An infrastructure for semantic negotiation is needed; especially one that is cognizant of the sociological influences of achieving a shared semantic;"

    • implement a discovery server
      "that is, allow clients to find providers based on the semantics of their data and services. Specifically we introduce the capability of semantic searching as defined below. As built upon a common syntax and semantic amenable to a formal logic, this is the necessary condition for scalable integration."

    In their discussion, they draw some conclusions including the pros and cons of their approach.  Perhaps one course of action for the SD&B PoC would be to consider the SSWAP approach in reviewing and implementing this approach.  Specifically, we can identify what our SD&B specification addresses and what it does not in the context of SSWAP (e.g. this SD&B proposal does not specify an IDL; it does provide capabilities for discovering a service based on the type of objects it operates on).

    1. That approach sounds very good to me. It is certainly true that SSWAP is massive overkill for a PoC. My remark above (in reply to Aaron Birkland) expressed my concern that going as far as PoC implementation takes engagement with SSWAP (whatever form that ends up taking) out of what I believe is the proper initial forum: SD&B scoping.