This is a proof-of-concept description of only one portion of the API Extension (API-X) Framework: the service discovery and binding component (SD&B). This document refers to other components in API-X, not to define how those components work, but simply to show possible ways in which the SD&B component might interact with the larger API-X framework.
Authentication and authorization for API-X and for any registered service are beyond the scope of this document, though API-X should support them in some manner.
Background
Clients interacting with the API Extension framework will need a mechanism to discover the services that apply to a given repository resource. Likewise, services themselves will need a mechanism by which they can register or bind themselves to the API-X framework.
A principal role of the SD&B component is to support an architecture that recognizes that services come and go, sometimes unexpectedly. To that end, it should be possible to decouple the lifecycle of a particular service instance from the lifecycle (i.e. deployment) of the API-X framework, including the SD&B component. Furthermore, it should be possible to deploy this component in a distributed fashion across multiple machines, both to support high availability and high levels of concurrency. It should also be possible for services to be deployed on an arbitrary number of external hosts using any language or framework. With that structure, network partitions and service failure should not affect the overall operation of this component nor the overall operation of API-X.
In many ways, the SD&B component can be thought of as a management interface, distinct from individual service endpoints. While its role is not that of operating on specific repository resources, it can be viewed as a broker between clients, repository resources and external services.
The high level objectives of such a management interface are to support the following:
- Service Discovery (i.e. client interaction):
  - list all available services
  - list all services that apply to a given Fedora object
  - list all services that apply to a given rdf:type of Fedora object
  - list service status (availability/non-availability)
  - provide some level of description of services (e.g. as RDF)
  - use REST semantics
- Service Binding (i.e. service interaction):
  - Services should be able to register and deregister themselves from API-X
  - It should be possible for individual services to be available at N hosts (e.g. for high availability)
  - If a particular service instance fails or is removed, API-X should know about that (optional)
- Deployment:
  - the SD&B component should be capable of being deployed in a fully distributed environment, across multiple hosts, and such deployment should be entirely transparent to clients.
  - it should be possible for the SD&B interface to be deployed on separate hosts from the services themselves.
Reverse Proxy
A related concept to SD&B is that of a reverse proxy. The design details of that are out of scope for this document, but a possible outline is described in order to provide more context for the SD&B component. At a high level, a client using the API-X proxy could interact with a Fedora repository as if there were no proxy at all. The proxy may choose to add headers such as the following (e.g. for the resource /rest/resource):
Link: <http://localhost:8080/rest/resource/svc:list>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:validate>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:ldcompact>; rel="service"
Link: <http://localhost:8080/rest/resource/svc:ldpath>; rel="service"
These headers would be generated using the SD&B interface. Then, when a client interacts with a service, e.g. at /rest/resource/svc:validate, the proxy mechanism will pass the request directly to an instance of that service, using the context of /rest/resource. In this way, clients should have no need to interact directly with the SD&B component.
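To make the proxy outline more concrete, here is a minimal sketch of its two halves: building rel="service" Link headers from a list of service identifiers obtained from the SD&B registry, and splitting an incoming path such as /rest/resource/svc:validate into the resource context and the service identifier. The helper names (service_link_headers, route, ProxyRoute) are hypothetical, and how the resource context is actually handed to the service instance is left open by this proposal.

```python
from __future__ import annotations

from dataclasses import dataclass

# Marker separating the resource path from the service name, following the
# /rest/resource/svc:validate examples above (an assumption, not a fixed API).
SVC_MARKER = "/svc:"


def service_link_headers(base_url: str, resource_path: str, service_ids: list[str]) -> list[str]:
    """Build one rel="service" Link header value per service that applies to the resource."""
    resource = base_url.rstrip("/") + resource_path
    return [f'<{resource}/svc:{sid}>; rel="service"' for sid in service_ids]


@dataclass
class ProxyRoute:
    resource_path: str  # context passed along to the service, e.g. /rest/resource
    service_id: str     # registry identifier, e.g. validate


def route(request_path: str) -> ProxyRoute | None:
    """Split a proxied path into (resource, service); None means pass the request to Fedora."""
    head, sep, tail = request_path.rpartition(SVC_MARKER)
    if not sep or not head or not tail or "/" in tail:
        return None
    return ProxyRoute(resource_path=head, service_id=tail)
```

For example, service_link_headers("http://localhost:8080", "/rest/resource", ["list", "validate"]) yields the first two Link values shown above, and route("/rest/resource/svc:validate") identifies the validate service with /rest/resource as its context, while a plain /rest/resource request falls through to the repository.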
Still, there are cases where this higher-level interface is not sufficient. For example, some services may require a raw TCP socket or some other non-HTTP interaction. In order to also support those cases, it is recommended that the service discovery interface (or some portion thereof) be exposed directly to clients so that clients can connect and interact directly with external services (unmediated by the API-X proxy mechanism). These are described in the section on "Client Endpoints".
Endpoints
There are two categories of endpoints: those used by clients and those used by services. In these examples, all data exchange uses JSON-LD. These examples refer to a JSON-LD context such as the following:
{
  "@context": {
    "id": "@id",
    "type": "@type",
    "apix": "http://fedora.info/definitions/v4/apix/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
    "fedora": "http://fedora.info/definitions/v4/repository#",
    "Binding": {"@id": "apix:Binding", "@type": "@id"},
    "Registry": {"@id": "apix:Registry", "@type": "@id"},
    "Service": {"@id": "apix:Service", "@type": "@id"},
    "ZooKeeperBinding": {"@id": "apix:ZooKeeperBinding", "@type": "@id"},
    "hasEndpoint": {"@id": "apix:hasEndpoint", "@type": "@id"},
    "hasParentZnode": {"@id": "apix:hasParentZnode"},
    "hasService": {"@id": "apix:hasService"},
    "hasZooKeeperEnsemble": {"@id": "apix:hasZooKeeperEnsemble", "@type": "@id"},
    "supportsType": {"@id": "apix:supportsType", "@type": "@id"},
    "seeAlso": {"@id": "rdfs:seeAlso", "@type": "@id"},
    "label": {"@id": "rdfs:label"},
    "comment": {"@id": "rdfs:comment"},
    "identifier": {"@id": "dcterms:identifier"}
  }
}
This context implies the existence of an API-X ontology, which is not specified here.
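To illustrate what the context does, the sketch below hand-expands a few of the compact terms into full IRIs, treating terms declared with "@type": "@id" as IRI references (so a value like fedora:Resource becomes a full Fedora IRI) and the rest as literals. This is a deliberately simplified, hypothetical helper covering only a subset of the terms (the id/type aliases are skipped), not the JSON-LD expansion algorithm; a real client would use a JSON-LD processor instead.

```python
from __future__ import annotations

# Prefixes copied from the API-X context above.
PREFIXES = {
    "apix": "http://fedora.info/definitions/v4/apix/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
    "fedora": "http://fedora.info/definitions/v4/repository#",
}

# Subset of term definitions: term -> (compact IRI, value is an IRI reference?)
TERMS = {
    "label": ("rdfs:label", False),
    "identifier": ("dcterms:identifier", False),
    "seeAlso": ("rdfs:seeAlso", True),
    "supportsType": ("apix:supportsType", True),
    "hasEndpoint": ("apix:hasEndpoint", True),
}


def expand_iri(value: str) -> str:
    """Expand prefix:suffix using PREFIXES; leave absolute IRIs such as http://... unchanged."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in PREFIXES and not suffix.startswith("//"):
        return PREFIXES[prefix] + suffix
    return value


def expand(doc: dict) -> dict:
    """Expand known compact keys and values; unknown keys are dropped in this sketch."""
    out = {}
    for key, value in doc.items():
        if key not in TERMS:
            continue
        iri, is_id = TERMS[key]
        values = value if isinstance(value, list) else [value]
        out[expand_iri(iri)] = [
            {"@id": expand_iri(v)} if is_id else {"@value": v} for v in values
        ]
    return out
```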
Client Endpoints
All client endpoints use HTTP REST semantics.
Discovering Services
Request:
GET /apix/registry
Request Parameters:
These optional parameters filter the service list to include only those services that (a) can be applied to the provided Fedora resource, if one is given, and (b) can be applied to the provided rdf:type URIs, if any are given. In the case of multiple rdf:type URIs, a boolean AND operator is assumed.
- id: a particular Fedora resource
- type: a comma-delimited list of rdf:type URIs
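A minimal client-side sketch of these parameters: building the query URL and applying the type filter with AND semantics. It assumes one reading of the AND rule, namely that a service matches only if its supportsType list contains every requested type; it does no subclass reasoning and does not resolve an id to the resource's types. The function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def registry_query_url(base: str, resource_id: str | None = None,
                       types: list[str] | None = None) -> str:
    """Build GET /apix/registry with the optional id and type parameters."""
    params = {}
    if resource_id:
        params["id"] = resource_id
    if types:
        params["type"] = ",".join(types)  # comma-delimited rdf:type URIs
    query = urlencode(params)
    return f"{base}/apix/registry" + (f"?{query}" if query else "")


def filter_by_types(services: list[dict], types: list[str]) -> list[dict]:
    """AND semantics (assumed): keep a service only if it supports every requested type."""
    wanted = set(types)
    return [s for s in services if wanted <= set(s.get("supportsType", []))]
```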
Response:
Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"

{
  "id": "http://apix-host/apix/registry",
  "type": "Registry",
  "hasService": [
    {
      "type": "Service",
      "label": "a foo webservice",
      "seeAlso": "http://example.org/foo",
      "identifier": "foo",
      "supportsType": ["fedora:Resource"],
      "hasEndpoint": ["http://host-1/foo/rest", "http://host-2/foo/rest"]
    },
    {
      "type": "Service",
      "label": "a bar webservice",
      "seeAlso": "http://example.org/bar",
      "identifier": "bar",
      "supportsType": ["fedora:Binary"],
      "hasEndpoint": ["http://host-3/bar/rest", "http://host-4/bar/rest"]
    }
  ]
}
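For clients that connect to services directly, a sketch of consuming this response: index the hasEndpoint values by service identifier and pick one instance at random as a naive form of load balancing. The random choice is an assumption for illustration; the proposal does not say how clients should choose among the N instances.

```python
from __future__ import annotations

import json
import random

# Trimmed copy of the sample registry response above.
RESPONSE = """
{
  "id": "http://apix-host/apix/registry",
  "type": "Registry",
  "hasService": [
    {"type": "Service", "identifier": "foo", "supportsType": ["fedora:Resource"],
     "hasEndpoint": ["http://host-1/foo/rest", "http://host-2/foo/rest"]},
    {"type": "Service", "identifier": "bar", "supportsType": ["fedora:Binary"],
     "hasEndpoint": ["http://host-3/bar/rest", "http://host-4/bar/rest"]}
  ]
}
"""


def endpoints_by_service(registry: dict) -> dict[str, list[str]]:
    """Map each service identifier to its currently bound endpoints."""
    return {s["identifier"]: list(s.get("hasEndpoint", []))
            for s in registry.get("hasService", [])}


def pick_endpoint(endpoints: list[str], rng: random.Random | None = None) -> str | None:
    """Choose one instance; None if the service is registered but nothing is bound."""
    if not endpoints:
        return None
    return (rng or random).choice(endpoints)
```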
In a similar way, information about a particular service can be retrieved:
GET /apix/registry/foo
Response:
Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"

{
  "id": "http://apix-host/apix/registry/foo",
  "type": "Service",
  "label": "a foo webservice",
  "seeAlso": "http://example.org/foo",
  "identifier": "foo",
  "supportsType": ["fedora:Resource"],
  "hasEndpoint": ["http://host-1/foo/rest", "http://host-2/foo/rest"]
}
Service Endpoints
Registering Services
Services can be registered by interacting with the service registry. This endpoint only registers the existence of a service; it makes no guarantees about any running instances of that service. A service must be registered before any instances of it can be bound.
PUT /apix/registry/foo
Content-Type: application/ld+json

{
  "@context": "http://fedora.info/definitions/v4/apix.jsonld",
  "id": "http://apix-host/apix/registry/foo",
  "type": "Service",
  "label": "a foo webservice",
  "seeAlso": "http://example.org/foo",
  "identifier": "foo",
  "supportsType": ["fedora:Resource"]
}
Note: the hasEndpoint element is not included here; it is part of the /bind interface, described below.
In a similar way, services can be de-registered. Any service instances bound to that service will be unbound, but this operation does not make guarantees about shutting down any instances of the service (which may be running on separate machines).
DELETE /apix/registry/foo
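The registration and de-registration calls above could be issued from Python with the standard library. This sketch only constructs the requests (send them with urllib.request.urlopen); the API-X base URL is the example host from this page, and the helper names are hypothetical. It strips any hasEndpoint value, since endpoints belong to the /bind interface.

```python
import json
import urllib.request

APIX = "http://apix-host"  # example host from this page, not a real deployment


def registration_request(service: dict) -> urllib.request.Request:
    """PUT /apix/registry/{identifier} with a JSON-LD service description."""
    identifier = service["identifier"]
    body = dict(service)
    body.setdefault("@context", "http://fedora.info/definitions/v4/apix.jsonld")
    body.setdefault("id", f"{APIX}/apix/registry/{identifier}")
    body.pop("hasEndpoint", None)  # endpoints are bound via /apix/bind, not here
    return urllib.request.Request(
        f"{APIX}/apix/registry/{identifier}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="PUT",
    )


def deregistration_request(identifier: str) -> urllib.request.Request:
    """DELETE /apix/registry/{identifier}; running instances are not shut down."""
    return urllib.request.Request(f"{APIX}/apix/registry/{identifier}", method="DELETE")
```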
Service Binding
Before a client can interact with a particular service, that service must first be registered. In addition, one or more instances of that service must be bound to the API-X registry. Service binding can happen over HTTP or over another protocol defined by the implementation. These examples will use the ZooKeeper protocol for dynamic service binding, but other implementations could use a different binding protocol.
Manual binding
Some services may not be able to use dynamic service binding, e.g. a PHP web application. For these, a manual binding interface is available. This example binds a particular service instance to the already-registered foo service.
POST /apix/bind/foo
Content-Type: text/plain

http://host-1/foo/rest
The response will contain a URI that uniquely identifies this service binding. That URI can be used to unbind the service instance later.
201 Created
http://apix-host/apix/bind/foo/some-id
Manually unbind a service instance:
DELETE /apix/bind/foo/some-id
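The registration and manual binding rules described so far can be summarized as a small in-memory model: binding requires a registered service, each binding gets its own URI, unbinding removes one endpoint, and de-registering a service drops all of its bindings without touching the running instances. The class, its method names and the binding-N identifiers are all hypothetical; a real implementation would keep this state in the shared store discussed under "Distributed Shared State".

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    base: str = "http://apix-host/apix"
    services: dict[str, dict] = field(default_factory=dict)
    # service identifier -> {binding id: endpoint}
    bindings: dict[str, dict[str, str]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def register(self, identifier: str, description: dict) -> None:
        self.services[identifier] = description
        self.bindings.setdefault(identifier, {})

    def deregister(self, identifier: str) -> None:
        # Unbinds every instance, but does not stop the instances themselves.
        self.services.pop(identifier, None)
        self.bindings.pop(identifier, None)

    def bind(self, identifier: str, endpoint: str) -> str:
        if identifier not in self.services:
            raise KeyError(f"service {identifier!r} is not registered")
        binding_id = f"binding-{next(self._ids)}"
        self.bindings[identifier][binding_id] = endpoint
        return f"{self.base}/bind/{identifier}/{binding_id}"

    def unbind(self, binding_uri: str) -> None:
        identifier, binding_id = binding_uri.rsplit("/", 2)[-2:]
        self.bindings.get(identifier, {}).pop(binding_id, None)

    def endpoints(self, identifier: str) -> list[str]:
        return list(self.bindings.get(identifier, {}).values())
```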
Dynamic Binding
Depending on the implementation, it may be possible to dynamically bind and unbind services. For instance, with ZooKeeper, a service may communicate directly with a ZooKeeper ensemble. The dynamic binding protocol is described at this endpoint:
GET /apix/bind/foo
Response:
Content-Type: application/json
Link: <http://fedora.info/definitions/v4/apix.jsonld>; rel="describedby"; type="application/ld+json"

{
  "id": "http://apix-host/apix/bind/foo",
  "type": ["Binding", "ZooKeeperBinding"],
  "hasZooKeeperEnsemble": ["host-1:2181", "host-2:2181", "host-3:2181"],
  "hasParentZnode": "/service/foo"
}
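At this point (interacting directly with ZooKeeper), it is the responsibility of the service instance, acting as a ZooKeeper client, to create an ephemeral, sequential znode under /service/foo that stores the service's endpoint. For example, using the ZooKeeper Java API, where zk is an already-connected org.apache.zookeeper.ZooKeeper handle (the open ACL is for illustration only):
zk.create("/service/foo/instance-", "http://host-1/foo/rest".getBytes(StandardCharsets.UTF_8), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);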
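A Python service could do the same thing. The sketch below turns the ZooKeeperBinding description returned by GET /apix/bind/foo into a connection string and znode prefix, then (in bind_instance) creates the ephemeral, sequential znode. It assumes the third-party kazoo client library, imported lazily so the parsing helper works without it; the "instance-" prefix mirrors the example above, and the function names are hypothetical.

```python
from __future__ import annotations


def zookeeper_params(binding: dict) -> tuple[str, str]:
    """Turn a ZooKeeperBinding description into (connection string, znode prefix)."""
    types = binding.get("type", [])
    if "ZooKeeperBinding" not in (types if isinstance(types, list) else [types]):
        raise ValueError("not a ZooKeeper binding")
    hosts = ",".join(binding["hasZooKeeperEnsemble"])
    prefix = binding["hasParentZnode"].rstrip("/") + "/instance-"
    return hosts, prefix


def bind_instance(binding: dict, endpoint: str):
    """Create an ephemeral, sequential znode holding this instance's endpoint.

    Requires the kazoo package (assumption). The znode, and therefore the
    binding, disappears when the returned client's session ends.
    """
    from kazoo.client import KazooClient  # third-party dependency

    hosts, prefix = zookeeper_params(binding)
    zk = KazooClient(hosts=hosts)
    zk.start()
    path = zk.create(prefix, endpoint.encode("utf-8"),
                     ephemeral=True, sequence=True, makepath=True)
    return zk, path  # keep zk open for as long as the instance should stay bound
```

Because the znode is ephemeral, the service must keep its session alive; closing the client (zk.stop()) is a clean way to unbind.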
Service Availability
In these examples, when clients request a list of available services, that list will contain hasEndpoint values corresponding to the service instances that have been bound to API-X. For manually-bound services, those endpoints will continue to be included until they are manually unbound. For dynamically-bound services, any interruption in the availability of the service (restarts, network partitions, host failure, etc.) will cause the corresponding hasEndpoint value to disappear once the service's ZooKeeper session ends.
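The difference between the two kinds of binding can be sketched as a toy model of how one API-X node might compute hasEndpoint: manual endpoints persist until explicitly unbound, while dynamic endpoints are tied to a session, as ephemeral znodes are, and vanish when that session is lost. The class and the session identifiers are hypothetical; in real ZooKeeper, a crashed service's nodes are removed only after its session timeout elapses.

```python
from __future__ import annotations

from collections import defaultdict


class AvailabilityView:
    """Toy model of how hasEndpoint values could be computed for each service."""

    def __init__(self) -> None:
        self.manual: dict[str, set[str]] = defaultdict(set)
        # (service id, session id) -> endpoints, mimicking ephemeral znodes
        self.ephemeral: dict[tuple[str, str], set[str]] = defaultdict(set)

    def bind_manual(self, service: str, endpoint: str) -> None:
        self.manual[service].add(endpoint)

    def unbind_manual(self, service: str, endpoint: str) -> None:
        self.manual[service].discard(endpoint)

    def bind_dynamic(self, service: str, session: str, endpoint: str) -> None:
        self.ephemeral[(service, session)].add(endpoint)

    def session_lost(self, session: str) -> None:
        # Restart, partition or host failure: the session's ephemeral nodes go away.
        for key in [k for k in self.ephemeral if k[1] == session]:
            del self.ephemeral[key]

    def has_endpoint(self, service: str) -> list[str]:
        live = set(self.manual.get(service, set()))
        for (svc, _session), endpoints in self.ephemeral.items():
            if svc == service:
                live |= endpoints
        return sorted(live)
```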
Distributed Shared State
The API-X architecture should support a distributed deployment model, so the shared state of the service registry must be managed carefully. ZooKeeper is one obvious choice for this, as it avoids creating a single point of failure; if that is not a concern, a shared database would accomplish the same thing. Each node of the API-X discovery service will need access to the following shared data:
- Basic configuration information about the cluster (list of nodes, etc.)
- Descriptions of each service (see the /apix/registry endpoint above)
- For each registered service, a list of each active service instance and the corresponding HTTP endpoint
Otherwise, no additional shared state should be maintained by the API-X SD&B component.
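As a sketch of how little state this is, the three kinds of shared data can be modeled as one structure from which the /apix/registry document is derived on demand, so that nothing else needs to be shared. The dataclasses are hypothetical; in a deployment this data would live in ZooKeeper or a shared database rather than in process memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    description: dict  # as registered via PUT /apix/registry/{id}
    instances: dict[str, str] = field(default_factory=dict)  # instance id -> HTTP endpoint


@dataclass
class SharedState:
    nodes: list[str] = field(default_factory=list)  # basic cluster configuration
    services: dict[str, ServiceEntry] = field(default_factory=dict)

    def registry_document(self, base: str) -> dict:
        """Derive the GET /apix/registry response from the shared state alone."""
        return {
            "id": f"{base}/apix/registry",
            "type": "Registry",
            "hasService": [
                {**entry.description, "hasEndpoint": sorted(entry.instances.values())}
                for entry in self.services.values()
            ],
        }
```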
12 Comments
A. Soroka
A lot of this is nicely-thought-out, but it seems like running over the same ground over which SSWAP has already gone, and with less flexibility and functionality. Might it be better to do some wholesale adoption here?
Aaron Birkland
They look to be different in their goals to me, but perhaps I don't fully understand what SSWAP is from browsing the literature. SSWAP looks like it provides a certain kind of service description (and is compared to WSDL and its ilk), data input/output description, and discovery based on queries of these descriptions, and is oriented to describing "semantic services".
The service discovery and binding component described here is mostly a registry of "what is where", and does not attempt to describe the nature of the services (does not describe what they do, their inputs/outputs, or how to interact with them).
So maybe SSWAP (and/or WSDL, or whatever standard is relevant) has a role as a sort of black-box service description that individual services may wish to publish if they choose to do so? i.e. somebody building an infrastructure based on SSWAP services would look for such descriptions, and know what to do with them?
A. Soroka
No, I understand SSWAP to be very much interested in "where the services actually are". Without that info, SSWAP could not be doing what it clearly does do, which is to enable people to actually run pipelines of discovered services.
Aaron Birkland
Digging deeper, the best description of SSWAP I could find is this publication (is there something else we should be looking at?)
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-309
So with regard to service discovery in SSWAP:
Fair enough, also note:
This kind of pattern is probably worth discussing. It contrasts with Aaron's proposal (e.g. in the presence of explicit support for unbinding/de-registering services), and we should probably understand the security issues they are trying to avoid in SSWAP. However:
I think this starts to get us into territory outside of the scope of API-X, at least as it is presently conceived through stakeholders' requirements. Reading further through the document reveals SSWAP's notion of service invocation:
This is what leads me to believe that SSWAP is primarily focused on semantic service description, and on wiring together services that act according to the described model(s). As such, it just feels like it's serving a different, specialized purpose than this SD&B proposal. Could you elaborate a little more on where the two cover the same ground, and suggest some functionality that SSWAP provides that ought to be in the scope of an SD&B component of API-X?
A. Soroka
You should be looking at the SSWAP website. The overlaps include:
Certainly API-X wants to feed bitstreams into services, too, but that seems like a natural extension, not a reason to start over. The dereferencing thing is an impl detail. You could just not do that, if you liked.
Aaron Birkland
I'm not sure we're entirely at the point where it's clear that the role of API-X is to feed graphs into services, or to reason about 'types' the services produce and consume, but let's do our due diligence and see where it goes. There's a general notion of 'this service is in some way relevant/available to repository resources of this type', but I don't think we've made it entirely clear what that implies (as hinted at by Daniel Davis), or if it's sufficient. Maybe moving forward with some of the other Proof of Concept ideas will make our technical needs a little more concrete, then we can reconcile with the notion of SSWAP and see how it aligns?
A. Soroka
If API-X isn't planning to feed graphs into services, then you have decided that no API-X endpoint can be used over a Fedora RDF resource, which sounds like a bit of a strange choice...
I think that doing scratch implementations first is not really a great way to plan for alignment, but in the end, the folks contributing time will have to decide on the plan for investment, and I'm not contributing.
Daniel Davis
We are still conceptualizing. SD&B plus service infrastructure management has been around for a long time having many production implementations. The topic goes far beyond Fedora and API-X. Supporting smaller, more dynamic services and the cloud is breathing new life into the subject (distinguished by the term microservices). But this subject is more a variant than a fundamentally new thing.
I expect largely to use OSGi, SMX, Camel and JAX-RS services. Docker, ZooKeeper or something like it. Ansible, Puppet or Chef. And all the usual networking suspects. Each of these already has admin interfaces. Then there is the question of a services registry, which is getting new attention, particularly how it integrates with and overlaps other tooling found in a service-oriented infrastructure. The first question is whether a service infrastructure including Fedora needs something special in this area. Note, lots of REST-oriented service infrastructures work just using documentation on a website combined with a good router and reverse proxy.
We can go for a microservices registry without thinking about semantic tech and be inspired by the new work (especially Cloud-inspired) that is more REST-oriented than UDDI and ebXML (yes, they are around and still heavily used). There is stuff like WADL, Swagger et al.
But Fedora's special sauce is semantic tech especially the LDP in combination with a repository.
I am not taking a position. And I am glossing over things big time. I am just looking for inspiration at this point, starting with Aaron Coburn's proposal and other work that informs the design. Aaron Birkland has good points about SSWAP as possible overkill. I think a novel approach is appealing for API-X because of the potential for applying semantic technologies to repository service architectures; I think it's a great next step. But API-X has lots of places where I just want to grab stuff off the shelf to get going rather than define a new abstraction (in the end I really want interesting tooling). Even if SSWAP does not, right now, do everything we want, or does not do things exactly the way we want, it is worth looking at its approach to using semantic technology. We will also want to ask how far we can go the other way, assembling a tooling framework as simply as possible with the fewest new parts, and then look for new abstractions.
For now I am assuming that this work is aiming for a minimal but innovative SD&B that can grow into great ways of handling data and behavior together, can (to some extent) support multiple microservice infrastructures through abstractions, and can expose the implementations directly when needed. If the aim is lower, let us refine the requirements more.
It's fun to get back into this!
A. Soroka
Looks like a better list of publications is here.
Daniel Davis
I am just a few hours into sswap, but it seems that its overlap with the SD&B description is such that it needs to be understood to move forward. One notion from the past is that there is power in tying content and behavior together. sswap seems to be addressing this. There is also a need to consider the mechanics of provisioning services in an infrastructure to make this work reliably. I am not certain, at my level of understanding, to what degree sswap already does this but, like the web architecture, it seems to just assume the service is there and working (and has no concern for those nasty little issues about making things reliable). It certainly seems to be able to convey the information needed to perform infrastructure management functions (using a model of the infrastructure).
As a builder of infrastructure, I would like both the divine and the pragmatic. It seems that an exercise following sswap for content, and sswap for service deployment, would be useful. While SD&B may not be considered part of service deployment, I would like to use it for both purposes, or at least see how it fits together. This goes to the definition of a Client.
It's going to be hard to discuss this because the material is so dense. But that is also why I think this is a place where there is great possibility.
Elliot Metsger
My cursory reading of SSWAP is that it provides much more than 1) what is required for a proof-of-concept and 2) what may be required to satisfy the requirements of API-X.
That said, the goals of SSWAP are pretty compelling. From their paper:
In their discussion, they draw some conclusions including the pros and cons of their approach. Perhaps one course of action for the SD&B PoC would be to consider the SSWAP approach in reviewing and implementing this approach. Specifically, we can identify what our SD&B specification addresses and what it does not in the context of SSWAP (e.g. this SD&B proposal does not specify an IDL; it does provide capabilities for discovering a service based on the type of objects it operates on).
A. Soroka
That approach sounds very good to me. It is certainly true that SSWAP is massive overkill for a PoC. My remark above (in reply to Aaron Birkland) expressed my concern that going as far as PoC implementation takes engagement with SSWAP (whatever form that ends up taking) out of what I believe is the proper initial forum: SD&B scoping.