...

The following proposed features are not in priority or temporal order.

Key to symbols

(tick) or (thumbs up): the work for this is basically done, even if an issue has not yet been created to track final completion

(thumbs up) we have this work underway – check the related JIRA issue to track progress

(question) we think we understand this but are not sure it will make it into the release – if there's a JIRA issue linked, please go vote on it and comment to indicate why it should be either bumped up or deferred

...

There are a number of possible routes to performance improvement for VIVO, and we seek input from the community on what the primary pain points are. Some performance issues are related to the installation and configuration of VIVO, and we are working on improving documentation, notably on MySQL configuration, tuning, and troubleshooting. Page caching, however, has emerged as the primary performance-related improvement for 1.6.

(tick) Page caching

...

There are also other ways to address performance that are arguably more effective in the long run:

  • (thumbs up)(tick) As mentioned above, improved server, Apache, Tomcat, and database configuration and tuning
  • (warning) (not part of the 1.6 release – more requirements needed) If we can identify key areas where some form of intermediate results are being repeatedly requested from the database, implementing Memcached could be another strategy. However, it may be more effective to provide MySQL more memory since it can use its own strategies for query caching
  • (tick) Tim Worrall has been looking at our page templates for instances where we could avoid issuing SPARQL queries for the same data repeatedly in the course of generating a single page, and has also been optimizing SPARQL queries that come to his attention
  • (warning) (not part of the 1.6 release – independent investigation) There is also some indication that bugs in Jena's SDB implementation make queries against anything other than a single graph or the union of all graphs much less efficient, at least for MySQL. This is hard to verify, and we have mostly been approaching it by exploring the use of other triple stores via the RDF API added with the VIVO 1.5x releases.
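
The template work described in the list above amounts to remembering query results for the duration of one page request. A minimal sketch of that idea follows, in Python for brevity. It is an illustration only and not VIVO's implementation: `PageQueryCache`, `fake_store`, and the sample query are hypothetical names introduced here.

```python
from typing import Callable, Dict, List


class PageQueryCache:
    """Memoizes query results for the lifetime of a single page request."""

    def __init__(self, run_query: Callable[[str], List[dict]]) -> None:
        # run_query is a hypothetical function that sends SPARQL to the triple store.
        self._run_query = run_query
        self._results: Dict[str, List[dict]] = {}
        self.store_hits = 0  # number of queries that actually reached the store

    def query(self, sparql: str) -> List[dict]:
        # The exact query text is the cache key; only identical queries are shared.
        if sparql not in self._results:
            self.store_hits += 1
            self._results[sparql] = self._run_query(sparql)
        return self._results[sparql]


def fake_store(sparql: str) -> List[dict]:
    # Stand-in for a real SPARQL endpoint (assumption for this sketch).
    return [{"label": "Example Person"}]


# One cache object per page request; discard it when the page is rendered.
cache = PageQueryCache(fake_store)
label_query = "SELECT ?label WHERE { ?s rdfs:label ?label }"
first = cache.query(label_query)
second = cache.query(label_query)  # served from memory, no second store call
```

Because the cache lives only as long as one request, it cannot serve stale data after an edit, which is the main risk with longer-lived caches such as Memcached.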

...

  • Brian Caruso has proposed adding a unit test for Solr that would build an index from a standard set of VIVO RDF, start Solr, and run standard searches. This would help prevent breaking existing functionality when addressing issues that have come up such as support for diacritics, stop words, and capital letters in the middle of names
    • (question) A unit test has been developed for another related project at Cornell and we hope to be able to port this to VIVO, but perhaps not for 1.6
    • JIRA issue: VIVO-102 (DuraSpace JIRA)
  • (warning) (not for 1.6) Developing repeatable tests of loading one or more large datasets into VIVO. The challenge here is that performance is highly installation-dependent. The most urgent problem at Cornell has been the intermittent loss of communication between the VIVO web server and the VIVO database server, which results in some threads of activity simply hanging and never returning. As with many errors that are hard to reproduce, we have developed workarounds that divide large jobs into chunks of data that experience has shown can be removed or added without causing hiccups.
    • Joe M. has submitted a paper to the Conference on a data ingest method using a standard set of data.  This could conceivably be extended to serve as a set of tests, but is presently more geared toward helping people new to VIVO understand data ingest than testing VIVO under load
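
The chunking workaround mentioned above can be sketched as follows. This is a hedged illustration in Python, not the scripts used at Cornell: `chunked`, `load_in_chunks`, the chunk size of 4, and the sample triples are all assumptions, and `load_chunk` stands in for whatever call actually adds or removes data.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_chunks(
    triples: Sequence[str],
    size: int,
    load_chunk: Callable[[Sequence[str]], None],
) -> int:
    """Send triples to load_chunk one chunk at a time; return how many were sent."""
    loaded = 0
    for chunk in chunked(triples, size):
        load_chunk(chunk)  # hypothetical call that writes one chunk to the store
        loaded += len(chunk)
    return loaded


# Example run: 10 made-up triples, collected in a list instead of a real store.
received = []
triples = [f"<urn:s{i}> <urn:p> <urn:o{i}> ." for i in range(10)]
total = load_in_chunks(triples, 4, received.append)
```

A workable chunk size has to be found by experience on each installation, as the text notes. A hung connection then costs at most one chunk of work rather than the whole job.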

Adapting VIVO to the Integrated Semantic Framework ontology

(thumbs up) 

JIRA issue: VIVO-112 (DuraSpace JIRA)

...