Overview

The University of Iowa worked primarily on the design and implementation of a support infrastructure layer for an eventual ecosystem of Linked Open Data (LOD) servers and systems. Because currently deployed LOD resources are still somewhat immature (e.g., SPARQL endpoints that are offline), we decided to deploy our own services for the various LOD resources. This allowed us to

The work done regarding this last point includes tuning the rank order of results, the specificity of what constituted a match to a user query, and what data were returned. In particular, we were able to inject an additional triple indicating a particular entity's rank in the results, something not present in the underlying triplestore.
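To illustrate what injecting a rank triple can look like, here is a minimal sketch in Python. It is not the project's implementation: the predicate URI `http://example.org/vocab/rank`, the entity URIs, and the function name `rank_triples` are all hypothetical, chosen only to show one extra N-Triples statement per entity carrying its position in the result list.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical predicate; the predicate actually used by the services is not stated here.
RANK_PREDICATE = "http://example.org/vocab/rank"


def rank_triples(ranked_uris):
    """Yield one N-Triples line per entity giving its 1-based position in the results."""
    for position, uri in enumerate(ranked_uris, start=1):
        yield f'<{uri}> <{RANK_PREDICATE}> "{position}"^^<{XSD_INTEGER}> .'


# Hypothetical entity URIs, already sorted by relevance.
results = ["http://example.org/entity/a", "http://example.org/entity/b"]
lines = list(rank_triples(results))
for line in lines:
    print(line)
```

Because the rank depends on the query, a statement like this can only be generated at response time, which is consistent with it being absent from the underlying triplestore.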

For examples of integration of these services into other elements of the project, please see Architecture for Authority Lookup.

Authorities

Our deployment process became regularized to the extent that a number of authority sources were included:

All of these services, including versions supporting human interaction with the results, are available at http://services.ld4l.org/ld4l_services/index.jsp. Direct human exploration of the various triplestores using SPARQL is available at http://services.ld4l.org/fuseki/.

Request Parameterization

To make new services easier to create and easier for application developers to understand, we standardized the parameters accepted by the various services as much as possible:

Hence the query http://services.ld4l.org/ld4l_services/getty_batch.jsp?query=Picasso&maxRecords=10&entity=Person returns the triples relevant to 10 entities of class Person (i.e., from the Getty ULAN authority) in which the word Picasso appears. Note that the actual number of triples returned can vary widely because coverage differs between entities, even within a single authority source.
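A client can assemble such a request from the three parameters shown in the query above. The following Python sketch only builds the URL and does not contact the service; the helper name `build_lookup_url` is an illustrative assumption, and it assumes other services accept the same parameter names.

```python
from urllib.parse import urlencode

BASE = "http://services.ld4l.org/ld4l_services/"


def build_lookup_url(service, query, max_records, entity):
    """Return a lookup URL for one service page, for example "getty_batch.jsp"."""
    params = {"query": query, "maxRecords": max_records, "entity": entity}
    # urlencode escapes the values, so multi-word queries are safe to pass in.
    return BASE + service + "?" + urlencode(params)


url = build_lookup_url("getty_batch.jsp", "Picasso", 10, "Person")
print(url)
```

Keeping the parameter names identical across services means a helper like this needs only the service page name to target a different authority.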

Technology Stack

The overall architecture was implemented entirely with open source tools:

Processing Flow

Server Configuration

Two equivalent configurations were deployed, each with full copies of the LOD on the disk array. An Apache virtual host configuration was used both to manage the services.ld4l.org domain configuration and to configure the two machines using Apache's balancer feature, identifying the first machine as the primary service provider and the second machine as a "hot spare." Each instance of the ld4l_services application accesses the data using the virtual host name, providing redundancy both in the application and in query processing. Adding further BalancerMember entries is trivial and provides a significant ability to scale overall capacity.

<VirtualHost *:80>
    ServerName services.ld4l.org
    ServerAdmin david-eichmann@uiowa.edu
    DocumentRoot "/Library/WebServer/LD4L-Documents"
    <Proxy "balancer://fuseki">
        BalancerMember "http://localhost:3030"
        BalancerMember "http://deep-thought.slis.uiowa.edu:3030" status=+H
        ProxySet lbmethod=byrequests
    </Proxy>
    <Proxy "balancer://tomcat">
        BalancerMember "http://localhost:8080"
        BalancerMember "http://deep-thought.slis.uiowa.edu:8080" status=+H
        ProxySet lbmethod=byrequests
    </Proxy>
    RewriteEngine On
    RewriteRule ^/fuseki$ fuseki/ [R]
    ProxyPass "/fuseki" "balancer://fuseki" stickysession=JSESSIONID
    ProxyPassReverse "/fuseki" "balancer://fuseki"
    ProxyPassMatch "^/.*" "balancer://tomcat" stickysession=JSESSIONID
    ProxyPassReverse "/" "balancer://tomcat"
    <Directory "/Library/WebServer/LD4L-Documents">
        Options FollowSymLinks Multiviews
        MultiviewsMatch Any
        AllowOverride None
        Require all granted
    </Directory>
    ErrorLog "/private/var/log/apache2/ld4l-error_log"
    CustomLog "/private/var/log/apache2/ld4l-access_log" combined
</VirtualHost>
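The effect of marking the second machine with status=+H can be sketched as a small selection routine. This Python model is a deliberate simplification, not Apache's actual algorithm: the `Member` class and `choose_member` function are hypothetical, and the real balancer also spreads requests across regular members (lbmethod=byrequests) and handles retries and error states.

```python
from dataclasses import dataclass


@dataclass
class Member:
    url: str
    hot_standby: bool = False  # corresponds to status=+H in the configuration
    available: bool = True


def choose_member(members):
    """Pick an available regular member; use a standby only if none is usable."""
    regular = [m for m in members if m.available and not m.hot_standby]
    if regular:
        return regular[0]
    standby = [m for m in members if m.available and m.hot_standby]
    return standby[0] if standby else None


primary = Member("http://localhost:8080")
spare = Member("http://deep-thought.slis.uiowa.edu:8080", hot_standby=True)
members = [primary, spare]

first = choose_member(members)   # primary handles traffic while it is up
primary.available = False
second = choose_member(members)  # the hot spare takes over on failure
print(first.url, second.url)
```

In this model, adding capacity corresponds to appending further regular members to the list, which mirrors adding BalancerMember lines to the Proxy sections.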

A Complete List of GitHub Repositories Related to the Project