| Title (Goal) | Amherst - JSON-LD compaction service |
| --- | --- |
| Primary Actor | Developer |
| Scope | Component |
| Level | |
| Author | Unknown User (acoburn) |
| Story (A paragraph or two describing what happens) | In order to improve front-end (read) performance, it would be useful to store Fedora resources as JSON in a key-value store (Riak, MongoDB, CouchDB, etc.). That way, the objects can be delivered more efficiently to a web-based framework without needing to access Fedora at all. Fedora already generates JSON-LD in expanded form, but for application-specific use (applications that don't necessarily understand RDF), a compact form would be preferred. This would simply involve applying a context file to the expanded JSON-LD form (a sketch of this compaction step appears below). |
A sample implementation is available here: https://github.com/acoburn/repository-extension-services/, in particular the acrepo-jsonld-service and acrepo-jsonld-cache modules.
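To illustrate the compaction step itself, here is a minimal sketch that applies a context to an expanded JSON-LD document using the jsonld-java library; the resource URI, title predicate, and context are illustrative values, not details taken from the acrepo modules:

```java
import java.util.Map;

import com.github.jsonldjava.core.JsonLdOptions;
import com.github.jsonldjava.core.JsonLdProcessor;
import com.github.jsonldjava.utils.JsonUtils;

public class CompactionExample {

    public static void main(final String[] args) throws Exception {
        // Expanded JSON-LD, as Fedora would produce it (resource and title are made up).
        final Object expanded = JsonUtils.fromString(
            "[{\"@id\": \"http://localhost:8080/rest/object1\","
          + " \"http://purl.org/dc/terms/title\": [{\"@value\": \"An example title\"}]}]");

        // An application-specific context mapping full predicate IRIs to short keys.
        final Object context = JsonUtils.fromString(
            "{\"@context\": {\"title\": \"http://purl.org/dc/terms/title\"}}");

        // Apply the context to the expanded document to produce the compact form.
        final Map<String, Object> compacted =
            JsonLdProcessor.compact(expanded, context, new JsonLdOptions());

        System.out.println(JsonUtils.toPrettyString(compacted));
    }
}
```

Compacting that document yields roughly `{"@context": {"title": "http://purl.org/dc/terms/title"}, "@id": "http://localhost:8080/rest/object1", "title": "An example title"}`, which is the kind of representation a front-end application can consume without any RDF tooling.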
Web Resource interaction
This service would expose an HTTP endpoint to convert fedora resources into a compact JSON-LD representation.
Deployment or Implementation notes
This service would be deployed separately from Fedora, possibly on a different machine. I envision it being implemented as a combination of OSGi services and Camel routes, written in Java and Blueprint XML, that can be deployed in any OSGi container. The implementation would require access to Fedora's HTTP API.
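As a rough sketch of that approach (not the acrepo implementation itself), a Camel route along these lines could expose the HTTP endpoint and call Fedora's HTTP API; the Jetty endpoint, Fedora URL, fixed resource path, and context are assumptions made for illustration:

```java
import org.apache.camel.builder.RouteBuilder;

import com.github.jsonldjava.core.JsonLdOptions;
import com.github.jsonldjava.core.JsonLdProcessor;
import com.github.jsonldjava.utils.JsonUtils;

public class JsonLdCompactionRoute extends RouteBuilder {

    // Placeholder context; a deployed service would load an application-specific context file.
    private static final String CONTEXT =
        "{\"@context\": {\"title\": \"http://purl.org/dc/terms/title\"}}";

    @Override
    public void configure() throws Exception {
        // Expose a local HTTP endpoint via camel-jetty (host, port, and path are illustrative).
        from("jetty:http://0.0.0.0:9090/jsonld")
            // Request the resource from Fedora's HTTP API as JSON-LD
            // (a fixed resource path is used here purely for illustration).
            .setHeader("Accept", constant("application/ld+json"))
            .to("http://localhost:8080/rest/object1?bridgeEndpoint=true")
            // Apply the context to the expanded document to produce the compact form.
            .process(exchange -> {
                final Object expanded = JsonUtils.fromString(
                        exchange.getIn().getBody(String.class));
                final Object context = JsonUtils.fromString(CONTEXT);
                exchange.getIn().setBody(JsonUtils.toString(
                        JsonLdProcessor.compact(expanded, context, new JsonLdOptions())));
            });
    }
}
```

Packaged as an OSGi bundle with a Blueprint descriptor, a route of this shape could run in any OSGi container (e.g. Karaf) alongside the rest of the repository extension services.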
API-X Value Proposition
The primary use of this service in the context of API-X would be to allow for service discovery.
5 Comments
Aaron Birkland
Ah, so does this encompass two potential use cases: (1) an on-demand service that compacts a Fedora resource's expanded JSON-LD against an application context, and (2) a cache of the compacted JSON-LD in a key-value store so that reads can bypass Fedora entirely?
If (2), then would this be in addition to (and behind) a caching proxy such as squid?
Unknown User (acoburn)
Yes, this does encompass both cases. See implementations here: https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-jsonld-cache and https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-jsonld-service
Related to your question about (2), one could use a caching proxy as part of this, but I see it as unnecessary. In our current (fedora3) repository, we use Riak as a type of cache. Riak (like other such systems) has several advantages over a simple caching proxy (squid, varnish, etc.), including the ability to shard and replicate data over an arbitrary number of back ends (providing both higher throughput and better fault tolerance), and support for Map-Reduce operations over arbitrary sets of data in the cluster, which a simple proxy cache cannot do.
In my experience, the read performance of Riak is so good that an additional proxy is really unnecessary.
Aaron Birkland
Fascinating that performance is so good! Your implementation (from what I understand, just quickly looking through the code) could be deployed on an arbitrary karaf instance in someone's backend infrastructure, with the caching service available via requests to http://${some.host}:${some.port}/jsonld. Maybe you have several of these services running on different hosts. How would you envision API-X making cached representations of objects available to the public? Would it be through filtering incoming GET requests to the repository and polling the caching service (as speculated above in my initial comment) so that it happens transparently? Through providing additional representations of the object, at their own URIs, backed by the cache? Both?
Unknown User (acoburn)
Performance is excellent, and if you need it to handle higher throughput, you just add more backend nodes. Typically, with Riak, you have an arbitrary number of nodes (it's masterless and can scale up or down easily), and you set up one or more reverse proxies (e.g. haproxy) pointing at that cluster, so your service points to that single location (I've never needed more than one instance of haproxy running). So yes, you can have one or more instances of karaf running, each pointing to its own local instance of haproxy (which points to the riak cluster). To start, I don't imagine needing more than a single instance of karaf for this, but this architecture is embarrassingly easy to scale, even with a single instance of fedora.
For API-X, I'd have incoming requests pull the data directly from Riak. If that fails (404 or otherwise), the request would fall back to fetching the resource directly from fedora. But yes, I believe your earlier speculation about how it works is correct. (I also store thumbnails and other small binary objects there, since throughput performance is so much better than fedora3 – that does change with fedora4, but I will probably still cache small binaries like this.)
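A simplified sketch of that read path, with a hypothetical KeyValueCache interface standing in for the Riak client (the class names, URLs, and fallback logic are illustrative, not taken from the acrepo modules):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CachedResourceReader {

    /** Hypothetical key-value interface; in practice this would wrap the Riak client. */
    public interface KeyValueCache {
        String get(String key);            // returns null on a cache miss
        void put(String key, String value);
    }

    private final KeyValueCache cache;
    private final String fedoraBaseUrl;    // e.g. "http://localhost:8080/rest"

    public CachedResourceReader(final KeyValueCache cache, final String fedoraBaseUrl) {
        this.cache = cache;
        this.fedoraBaseUrl = fedoraBaseUrl;
    }

    /** Serve from the cache when possible; otherwise fall back to Fedora and repopulate. */
    public String read(final String path) throws IOException {
        final String cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        final String body = fetchFromFedora(path);
        cache.put(path, body);
        return body;
    }

    private String fetchFromFedora(final String path) throws IOException {
        final HttpURLConnection conn =
                (HttpURLConnection) new URL(fedoraBaseUrl + path).openConnection();
        conn.setRequestProperty("Accept", "application/ld+json");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```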
A. Soroka
This seems like a special case of a more general idea: "Use sophisticated caching (equipped with minimizing abilities) in front of Fedora." I'm not sure in what way it "extends the Fedora API"? There are no new functions here...