...

There are a number of possible routes to performance improvement for VIVO, and we seek input from the community on what the primary pain points are. Some performance issues are related to installation and configuration of VIVO, and we are working on improving documentation, notably on MySQL configuration, tuning, and troubleshooting, but page caching has emerged as the primary performance-related improvement for 1.6.
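
As a rough illustration of the kind of MySQL tuning that documentation will cover, a my.cnf fragment might look like the sketch below. The parameter names are standard MySQL/InnoDB options, but the values shown are placeholders that depend entirely on the memory available to your database server.

    # my.cnf fragment -- illustrative values only; size these to your hardware
    [mysqld]
    # Give InnoDB a large share of RAM; the triple store tables live here
    innodb_buffer_pool_size = 2G
    # Let MySQL cache results of repeated identical queries (MySQL 5.x)
    query_cache_type = 1
    query_cache_size = 64M
    # Raise packet and connection limits for large RDF ingests
    max_allowed_packet = 64M
    max_connections = 200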

Page caching

As more data is added to VIVO, some profiles get very large, most commonly when a person accumulates hundreds of publications that must each be included in rendering the person's profile page. While we continue to look at ways to improve query speeds, if you need your VIVO to display all pages with more or less equivalent, sub-second rendering times, some form of page caching at the Apache level using a tool such as Squid is necessary. Apache is very fast at serving static HTML pages, and Squid saves a copy of every page rendered in a cache, from which the page can be served by Apache rather than generated once again by VIVO. The good news:
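
A minimal sketch of a Squid accelerator (reverse proxy) configuration in front of Apache follows; the host name and ports are placeholders, and the refresh_pattern values would need tuning per site.

    # squid.conf sketch -- hypothetical host name and ports
    # Listen on port 80 in accelerator mode, answering for the VIVO site
    http_port 80 accel defaultsite=vivo.example.edu
    # Forward cache misses to the Apache/Tomcat origin server
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=vivo
    # Consider cached pages fresh for up to an hour before revalidating
    refresh_pattern . 0 20% 60
    # Accept requests only for our own site
    acl our_site dstdomain vivo.example.edu
    http_access allow our_site
    http_access deny all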

...

  • We need to verify that the field holding the date and time of last update in Solr does indeed reflect all the changes that we think should be reflected. For example, if a 4th co-author on one of 150 publications in VIVO changes his or her middle name, should this trigger a re-index of every other co-author's entire VIVO page content? How far does the impact of an editing change get transmitted, and is that too far (bogging VIVO down) or not far enough (failing to reflect changes that do matter)?
  • Pages in VIVO may have many different incarnations depending on whether a user is logged in or not, and if logged in, on their level of editing privileges. The caching solution being implemented for 1.6 will disable use of the cached pages whenever a user is logged in and has privileges to edit, even on pages where they have no editing rights (see the sketch after this list).
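
In Squid terms, that logged-in bypass could be expressed in a couple of lines like the sketch below. This assumes Tomcat's default JSESSIONID session cookie, which is an assumption on our part rather than anything mandated by VIVO.

    # squid.conf sketch -- assumes Tomcat's default JSESSIONID session cookie
    # Treat any request carrying a session cookie as logged-in traffic
    acl logged_in req_header Cookie JSESSIONID
    # Neither serve such requests from the cache nor store their replies
    cache deny logged_in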

Griffith University has implemented page caching and Arve Solland gave a talk on this and other aspects of the Griffith Research Hub at the 2012 VIVO conference.

...

There are also other ways to address performance that, it could be argued, are more effective in the long run:

  • As mentioned above, improved server, Apache, Tomcat, and database configuration and tuning
  • If we can identify key areas where some form of intermediate results is being repeatedly requested from the database, implementing Memcached could be another strategy. However, it may be more effective to give MySQL more memory, since it can use its own strategies for query caching
  • Tim Worrall has been looking at our page templates for instances where we could avoid issuing SPARQL queries for the same data repeatedly in the course of generating a single page (a sketch of this idea follows this list), and has also been optimizing SPARQL queries that come to his attention
  • There is also some indication that it would help to work around bugs in Jena's SDB implementation that make queries against anything other than a single graph, or the union of all graphs, much less efficient, at least with MySQL. This is hard to verify, and we have mostly been approaching it by exploring the use of other triple stores via the RDF API added with the VIVO 1.5.x releases.
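
The per-page query reuse idea in the third bullet can be illustrated with a small memoizing wrapper. This is a sketch only, not VIVO's actual code: the QueryRunner interface and its method are hypothetical stand-ins for whatever component executes SPARQL against the triple store.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Sketch of a per-request SPARQL result cache. One instance is created
     * per page request and discarded afterward, so no invalidation logic
     * is needed. QueryRunner is a hypothetical stand-in for the component
     * that actually runs SPARQL against the triple store.
     */
    public class RequestScopedQueryCache {

        public interface QueryRunner {
            String runToJson(String sparql); // hypothetical result serialization
        }

        private final QueryRunner runner;
        private final Map<String, String> resultsByQuery = new HashMap<String, String>();

        public RequestScopedQueryCache(QueryRunner runner) {
            this.runner = runner;
        }

        // Run each distinct query once per request; later calls reuse the result.
        public String getResults(String sparql) {
            String cached = resultsByQuery.get(sparql);
            if (cached == null) {
                cached = runner.runToJson(sparql);
                resultsByQuery.put(sparql, cached);
            }
            return cached;
        }
    }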

Installation and Testing

  • Brian Caruso has proposed adding a unit test for Solr that would build an index from a standard set of VIVO RDF, start Solr, and run standard searches. This would help prevent re-introducing problems when addressing issues that have come up, such as lack of support for diacritics, stop words, and capital letters in the middle of names (a sketch of the shape such a test might take follows this list)
    • A unit test has been developed for another related project at Cornell and we hope to be able to port this to VIVO, but perhaps not for 1.6
  • Developing repeatable tests of loading one or more large datasets into VIVO. The challenge here is that performance is highly installation dependent.  The most urgent problem at Cornell has been the intermittent loss of communication between the VIVO web server and the VIVO database server, which results in some threads of activity simply hanging and never returning.  As with many errors that are hard to reproduce, we have developed workarounds that divide large jobs into chunks of data that experience has shown can be removed or added without causing hiccups.  Stay tuned.
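
The shape the proposed Solr test might take is sketched below using SolrJ's embedded server. It is an illustration, not the proposed test itself: the solr home path, core name, and field name are hypothetical, the embedded-server setup differs across Solr versions, and whether the assertions pass depends on the analyzers configured in the schema.

    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.junit.Assert;
    import org.junit.Test;

    public class SearchIndexSmokeTest {

        @Test
        public void diacriticsAndMidNameCapitalsAreSearchable() throws Exception {
            // Hypothetical solr home and core name; a real test would point at
            // the same schema and analyzers that ship with VIVO.
            EmbeddedSolrServer solr =
                    new EmbeddedSolrServer(Paths.get("src/test/resources/solr"), "vivocore");
            try {
                // Index one record of the kind built from a standard set of VIVO RDF
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "person1");
                doc.addField("nameText", "Jürgen McAllister");
                solr.add(doc);
                solr.commit();

                // A query without the umlaut should still match, as should one
                // that ignores the capital letter in the middle of the surname.
                for (String q : new String[] {"nameText:Jurgen", "nameText:mcallister"}) {
                    long hits = solr.query(new SolrQuery(q)).getResults().getNumFound();
                    Assert.assertEquals("query: " + q, 1, hits);
                }
            } finally {
                solr.close();
            }
        }
    }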

Site and Page Management

  • Make the About page and Home page HTML content editable through the admin interface – this relates to display model changes
  • (Largely complete) Offering improved options for content on the home page, including a set of SPARQL queries to highlight research areas, international focus, or most recent publications
  • (Complete) Offering additional individual page template options
  • (Complete) Offering the ability to embed SPARQL query results in individual pages on a per-class basis – for example, to show all research areas represented in an academic department
  • (Complete) Cornell has developed new individual page templates that include screen-captured versions of related websites for people and organizations, so that in addition to the link to the website we show either a small or large thumbnail of the page. This is done through a commercial image capture service that other sites may not want to use, so the capture service will have to be configurable. Another service might not provide the same API or resultant image size, however. In any case, the new individual page templates will have to be optional, since sites may have done a lot of customization work.
    • Could put the service-specific aspects in a sub-template that gets imported, and could by default not attempt to capture and cache images at all
    • There are free services out there, but they may not be there in 6 months

...

When 2 URIs are declared to be the same in VIVO, all the statements about both will be displayed for either (e.g., two separate URIs each labeled Israel). Improvements are needed, however:

  • there is no way via the current UI to add or remove owl:sameAs assertions – they have to be added as tiny RDF files via the add/remove RDF command (an example of such a file follows this list)
  • sameAs in the subject position
    • It will adversely affect performance to require VIVO to detect that two or more URIs for an individual have been declared to be sameAs each other and then retrieve and blend all the data for each URI for rendering on a single page
      • This becomes more complicated when the 2nd or higher URI is not in the local VIVO
    • It may be a first step to simply show a link to the equivalent URI, with some form of "see also" label
  • sameAs in the object position
    • when another VIVO individual is linked to one or the other of these Israels, the application is not yet smart enough to show only one object property statement pointing to a single instance of Israel, and it looks to users as if both the country and the relationship are duplicated
  • Colorado has a use case to assert sameAs relationships between people's profiles in their university-wide VIVO and the separate implementation at the Laboratory for Atmospheric and Space Physics, where additional information about research projects, equipment, and facilities will be stored behind a firewall. They would like to pull data from the CU VIVO into the LASP VIVO dynamically, and pull any publicly visible data from LASP VIVO to supplement the CU VIVO content about a person.
  • Colorado also has a need to pull data from the Harvard Profiles system used by the University of Colorado Medical School in Denver to the CU VIVO without replicating more than is necessary. This is a similar use case to the connections needed between VIVO on the Cornell Ithaca campus and the Weill Cornell VIVO.
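
As noted in the first bullet above, the RDF file needed to assert a sameAs link can be tiny; a sketch follows, with both individual URIs as hypothetical placeholders.

    # sameAs.n3 sketch -- both individual URIs are hypothetical placeholders
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    <http://vivo.example.edu/individual/n1234>
        owl:sameAs <http://vivo.other.edu/individual/n5678> .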

...