FIXME - Iowa work to support performant lookups when coupled with Samvera (aka Hydra) Community Linked Data Support
Overview
The University of Iowa is primarily working on the design and implementation of a support infrastructure layer for an eventual ecosystem of Linked Open Data (LOD) servers and systems. Given the somewhat immature nature of currently deployed LOD resources (e.g., offline SPARQL endpoints), the project decided that it was advisable to deploy our own services for the various LOD resources. This allowed us to
- make reasonable assumptions regarding resource availability,
- control performance characteristics, since we controlled the hardware, and
- control the nature of the data returned to queries.
The work on this last point included tuning the rank order of results, the specificity of what constituted a match to a user query, and what data were returned. In particular, we were able to inject an additional triple indicating a particular entity's rank in the results, something not present in the underlying triplestore.
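The rank injection described above can be sketched as follows. This is a minimal illustration in Python, not the project's JSP implementation: the predicate URI `http://example.org/vocab/rank`, the function name `add_rank_triples`, and the sample entity URIs are all assumptions made for the example, since the page does not name the predicate actually used.

```python
from typing import Iterable, List

# Hypothetical predicate; the actual predicate is not given on this page.
RANK_PREDICATE = "http://example.org/vocab/rank"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def add_rank_triples(ranked_uris: Iterable[str]) -> List[str]:
    """Return one N-Triples line per entity stating its 1-based rank.

    `ranked_uris` is assumed to be ordered best match first, for example
    as produced by a full text search over the authority data.
    """
    triples = []
    for rank, uri in enumerate(ranked_uris, start=1):
        # Typed literal so that clients can sort on the rank numerically.
        triples.append(
            f'<{uri}> <{RANK_PREDICATE}> "{rank}"^^<{XSD_INTEGER}> .'
        )
    return triples


if __name__ == "__main__":
    # Placeholder URIs standing in for search hits.
    hits = [
        "http://example.org/entity/a",
        "http://example.org/entity/b",
    ]
    for line in add_rank_triples(hits):
        print(line)
```

Because the rank is computed per query, a triple of this kind would be added to the response only and never written back to the triplestore.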
Authorities
Our deployment process became regular enough that we were able to include a number of authority sources:
- Agrovoc (agricultural concepts)
- DBpedia (general knowledge)
- FAST (general subject headings from OCLC, derived from LoC subject headings)
- GeoNames (places in the real world)
- Getty (content relating to artistic works)
  - AAT (concepts)
  - ULAN (persons and organizations)
  - TGN (places)
- Library of Congress
  - Genre
  - Name
  - Subject
- MeSH (NLM medical subject headings)
- VIAF (authority cross-walks)
All of these services are available at http://services.ld4l.org/ld4l_services/index.jsp.
Technology Stack
The overall architecture was implemented entirely with open source tools:
- Apache HTTPD - the standard v. 2.4 web server deployed with macOS
- ld4l_services - this is a JavaServer Pages (JSP) application (available at https://github.com/eichmann/ld4l_services) heavily reliant on two JSP tag libraries:
  - LuceneTagLib - a wrapper for executing Lucene full text searches and accessing the results of those searches (available at https://github.com/eichmann/LuceneTagLib)
  - SPARQLTagLib - a wrapper supporting SPARQL queries from a JSP page in the same manner used to access relational databases using the SQL standard tag library (available at https://github.com/eichmann/SPARQLTagLib)
- Apache Tomcat application container - we are specifically using version 9.0.0.M9, although pretty much any version of Tomcat would work, as we're not using any particular features of this version.
- Apache Jena Fuseki - the SPARQL endpoint, version 2.4.0
- Java SE Runtime Environment - version 1.8.0
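To illustrate how a client might talk to the Fuseki endpoint in this stack, here is a minimal Python sketch that issues a SELECT query using the standard SPARQL protocol (an HTTP GET with a `query` parameter). The dataset name `example`, the `/sparql` service path, and both function names are assumptions; the page does not list the datasets or service names actually configured.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, dataset: str, sparql: str) -> str:
    """Build a SPARQL protocol GET URL for a Fuseki-style dataset endpoint."""
    # Assumes the query service is exposed as <base>/<dataset>/sparql.
    endpoint = f"{base_url.rstrip('/')}/{dataset}/sparql"
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})


def run_select(base_url: str, dataset: str, sparql: str, timeout: float = 10.0):
    """Run a SELECT query and return the list of result bindings."""
    request = urllib.request.Request(
        build_query_url(base_url, dataset, sparql),
        # Ask for the standard SPARQL JSON results format.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # "example" is a placeholder dataset name, not one from this deployment.
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
    print(build_query_url("http://localhost:3030", "example", query))
```

Only `build_query_url` runs without a live server; `run_select` needs a reachable endpoint and a dataset that exists.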
Server Configuration
- Mac Pro (late 2013), 3 GHz, 8 cores, 64 GB memory, macOS High Sierra (v. 10.13.6)
- Promise Pegasus2 disk array, 8 x 4 TB RAID 5, Thunderbolt 2 connection to the Mac Pro
Two equivalent configurations were deployed, each with a full copy of the LOD data on its disk array. An Apache virtual host configuration was used both to manage the services.ld4l.org domain and to configure the two machines, using Apache's balancer feature to designate the first machine as the primary service provider and the second machine as a "hot spare." Each instance of the ld4l_services application accesses the data using the virtual host name, providing redundancy both in the application and in query processing. Adding further BalancerMember entries is trivial and provides a significant ability to scale overall capacity.
<VirtualHost *:80>
    ServerName services.ld4l.org
    ServerAdmin david-eichmann@uiowa.edu
    DocumentRoot "/Library/WebServer/LD4L-Documents"

    <Proxy "balancer://fuseki">
        BalancerMember "http://localhost:3030"
        BalancerMember "http://deep-thought.slis.uiowa.edu:3030" status=+H
        ProxySet lbmethod=byrequests
    </Proxy>

    <Proxy "balancer://tomcat">
        BalancerMember "http://localhost:8080"
        BalancerMember "http://deep-thought.slis.uiowa.edu:8080" status=+H
        ProxySet lbmethod=byrequests
    </Proxy>

    RewriteEngine On
    RewriteRule ^/fuseki$ fuseki/ [R]

    ProxyPass "/fuseki" "balancer://fuseki" stickysession=JSESSIONID
    ProxyPassReverse "/fuseki" "balancer://fuseki"
    ProxyPassMatch "^/.*" "balancer://tomcat" stickysession=JSESSIONID
    ProxyPassReverse "/" "balancer://tomcat"

    <Directory "/Library/WebServer/LD4L-Documents">
        Options FollowSymLinks Multiviews
        MultiviewsMatch Any
        AllowOverride None
        Require all granted
    </Directory>

    ErrorLog "/private/var/log/apache2/ld4l-error_log"
    CustomLog "/private/var/log/apache2/ld4l-access_log" combined
</VirtualHost>
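The effect of `status=+H` in the configuration above can be modeled with a short Python sketch: requests go to the regular members, and a hot standby is used only when no regular member is usable. This is a deliberately simplified model, not Apache's implementation; the `Member` and `Balancer` classes are hypothetical, and plain round-robin stands in for the request-counting `byrequests` method.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Optional


@dataclass
class Member:
    url: str
    hot_standby: bool = False  # models status=+H
    available: bool = True     # models whether the worker is usable


class Balancer:
    """Simplified model: regular members first, standbys only as fallback."""

    def __init__(self, members: Iterable[Member]):
        self.members = list(members)
        self._counter = count()

    def pick(self) -> Optional[Member]:
        active = [m for m in self.members if m.available and not m.hot_standby]
        standby = [m for m in self.members if m.available and m.hot_standby]
        pool = active or standby  # standbys are used only if no active member
        if not pool:
            return None
        # Round-robin over the pool (a stand-in for byrequests).
        return pool[next(self._counter) % len(pool)]


if __name__ == "__main__":
    primary = Member("http://localhost:3030")
    spare = Member("http://deep-thought.slis.uiowa.edu:3030", hot_standby=True)
    balancer = Balancer([primary, spare])
    print(balancer.pick().url)   # primary while it is available
    primary.available = False
    print(balancer.pick().url)   # falls back to the hot spare
```

In this model, adding capacity amounts to appending another non-standby `Member`, which mirrors adding a BalancerMember line without `status=+H`.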