...

Performance

There are a number of possible routes to performance improvement for VIVO, and we seek input from the community on what the primary pain points are. Some performance issues are related to installation and configuration of VIVO, and we are working on improving documentation, notably on MySQL configuration, tuning, and troubleshooting. Page caching, however, has emerged as the primary performance-related improvement for 1.6.

...

  • As mentioned above, improved server, Apache, Tomcat, and database configuration and tuning
  • If we can identify key areas where some form of intermediate results is repeatedly requested from the database, implementing Memcached could be another strategy. However, it may be more effective to give MySQL more memory, since it can apply its own query-caching strategies
  • Tim Worrall has been looking at our page templates for instances where we could avoid issuing SPARQL queries for the same data repeatedly in the course of generating a single page, and has also been optimizing SPARQL queries that come to his attention
  • There is also some indication of bugs in Jena's SDB implementation that make queries other than against a single graph or the union of all graphs much less efficient, at least on MySQL. This is hard to verify, and we have mostly been approaching it by exploring the use of other triple stores via the RDF API added with the VIVO 1.5.x releases (see the sketch after this list).
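
As a rough illustration of the distinction, the sketch below uses an in-memory Jena dataset (standing in for SDB) and a made-up graph URI. A query scoped to a single named graph can be pushed down to tightly constrained SQL; patterns over an unbound GRAPH ?g variable are the query shape reported to perform poorly:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.DatasetFactory;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSetFormatter;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.vocabulary.RDFS;

    public class GraphScopeDemo {
        public static void main(String[] args) {
            // In-memory stand-in for the SDB-backed dataset; graph URI is made up.
            Dataset dataset = DatasetFactory.createMem();
            Model people = ModelFactory.createDefaultModel();
            people.createResource("http://example.org/individual/n1")
                  .addProperty(RDFS.label, "Jane Scholar");
            dataset.addNamedModel("http://example.org/graph/people", people);

            // Scoped to one named graph: the store can restrict its work to that
            // graph. Queries over an unbound GRAPH ?g are the slow shape on SDB.
            String scoped =
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                "SELECT ?s ?label WHERE { " +
                "  GRAPH <http://example.org/graph/people> { ?s rdfs:label ?label } }";

            QueryExecution qe = QueryExecutionFactory.create(scoped, dataset);
            try {
                ResultSetFormatter.out(qe.execSelect());
            } finally {
                qe.close();
            }
        }
    }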

Installation and Testing

  • Brian Caruso has proposed adding a unit test for Solr that would build an index from a standard set of VIVO RDF, start Solr, and run standard searches. This would help prevent breaking existing functionality when addressing issues that have come up, such as support for diacritics, stop words, and capital letters in the middle of names (see the sketch after this list)
    • A unit test has been developed for another related project at Cornell and we hope to be able to port this to VIVO, but perhaps not for 1.6
  • Developing repeatable tests of loading one or more large datasets into VIVO. The challenge here is that performance is highly installation-dependent. The most urgent problem at Cornell has been intermittent loss of communication between the VIVO web server and the VIVO database server, which results in some threads of activity simply hanging and never returning. As with many errors that are hard to reproduce, we have developed workarounds that divide large jobs into chunks of data that experience has shown can be removed or added without causing hiccups. Stay tuned.
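
A minimal sketch of what one such test might look like, assuming SolrJ 4.x, a test core already running at a local URL, and made-up field names (the proposed test would instead build its index from the standard VIVO RDF before querying):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class SolrDiacriticsTest {

        // A test core is assumed to be running here.
        private static final String TEST_CORE = "http://localhost:8983/solr/vivocore";

        @Test
        public void accentedNameMatchesUnaccentedQuery() throws Exception {
            SolrServer solr = new HttpSolrServer(TEST_CORE);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("URI", "http://example.org/individual/n123"); // field names assumed
            doc.addField("nameRaw", "José Martínez");
            solr.add(doc);
            solr.commit();

            // With accent folding configured in the schema, a query typed without
            // diacritics should still find the document.
            SolrQuery query = new SolrQuery("nameRaw:\"Jose Martinez\"");
            assertEquals(1, solr.query(query).getResults().getNumFound());
        }
    }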

Site and Page Management

  • Make the About page and Home page HTML content editable through the admin interface – this relates to display model changes
  • (Largely complete) Offering improved options for content on the home page, including a set of SPARQL queries to highlight research areas, international focus, or most recent publications
  • (Complete) Offering additional individual page template options
  • (Complete) Offering the ability to embed SPARQL query results in individual pages on a per-class basis – for example, to show all research areas represented in an academic department (see the sketch after this list)
  • (Complete) Cornell is working on new individual page templates that include screen-captured versions of related websites for people and organizations, so that in addition to the link to a website we show a small or large thumbnail of the page. This is done through a commercial image-capture service that other sites may not want to use, so the feature will have to become configurable; another service might not provide the same API or resulting image size. In any case, the new individual page templates will have to be optional, since sites may have done a lot of customization work.
    • The service-specific aspects could be put in a sub-template that gets imported, and could by default not attempt to capture and cache images at all
    • There are free services out there, but they may not still exist in six months
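
As a sketch of the kind of query such a per-class page might embed, the following walks from a department through its positions to people and their research areas. The endpoint URL and department URI are placeholders, and the property names follow the VIVO core ontology:

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;

    public class DepartmentResearchAreas {
        public static void main(String[] args) {
            // Endpoint URL and department URI are placeholders.
            String endpoint = "http://localhost:8080/vivo/sparql";
            String query =
                "PREFIX vivo: <http://vivoweb.org/ontology/core#> " +
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
                "SELECT DISTINCT ?areaLabel WHERE { " +
                "  <http://example.org/individual/deptN42> vivo:organizationForPosition ?pos . " +
                "  ?pos vivo:positionForPerson ?person . " +
                "  ?person vivo:hasResearchArea ?area . " +
                "  ?area rdfs:label ?areaLabel . " +
                "} ORDER BY ?areaLabel";

            QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query);
            try {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.getLiteral("areaLabel").getString());
                }
            } finally {
                qe.close();
            }
        }
    }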

Support for sameAs statements

When two URIs are declared to be the same in VIVO, all the statements about both will be displayed for either (e.g., two different URIs each labeled "Israel"). Improvements are needed, however:

...

  • Implementing a web service interface (with authentication) to the VIVO RDF API, to allow the Harvester and other tools to add/remove data from VIVO and trigger appropriate search indexing and recomputing of inferences (see the sketch after this list).
  • This would also enable round-trip editing of VIVO content from Drupal or another tool external to VIVO via the SPARQL update capability of the RDF API
  • Put and delete of data via LOD requests – this has been suggested but we're not sure a specification even exists for an LOD "put" request – please add references here if you're aware of discussion or documentation.
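
A hypothetical client for such a web service might look like the following. The endpoint path, parameter names, and authentication scheme are all assumptions, since the interface is still being designed:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class SparqlUpdateClient {
        public static void main(String[] args) throws Exception {
            // Endpoint path and parameter names are hypothetical.
            URL url = new URL("http://localhost:8080/vivo/api/sparqlUpdate");
            String update =
                "INSERT DATA { GRAPH <http://example.org/graph/ingest> { " +
                "<http://example.org/individual/n99> " +
                "<http://www.w3.org/2000/01/rdf-schema#label> \"New Person\" } }";
            String body = "email=" + URLEncoder.encode("admin@example.org", "UTF-8")
                        + "&password=" + URLEncoder.encode("secret", "UTF-8")
                        + "&update=" + URLEncoder.encode(update, "UTF-8");

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            OutputStream out = conn.getOutputStream();
            out.write(body.getBytes("UTF-8"));
            out.close();

            System.out.println("HTTP " + conn.getResponseCode());
        }
    }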

Editing

  • See comments on VIVOIMPL-15 with respect to improving the permissions scheme for editing and making its functions more transparent to users. The best way forward here would be to transfer what are referred to as editing policies, now hard-wired in code, to a set of RDF statements conforming to an editing policy ontology and editable from the Site Admin menu (see the sketch after this list). This was the approach taken for the user model, and is proposed as an improved way of managing the display model (primarily for managing menu pages) and the application configuration ontology.
  • Improve editing of data held in context nodes from the organization, event, or other related entity, principally via relationships like authorships or positions or via roles realized in processes or events – most custom forms support entry and editing only from the person. This requires no new functionality but will involve implementing additional custom forms
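
As a sketch of what policy-as-RDF could look like (every term in the policy vocabulary below is hypothetical), a permission, the action it grants, and the roles it applies to become ordinary statements that can be edited like any other RDF rather than compiled into code:

    import java.io.StringReader;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class PolicyModelSketch {
        public static void main(String[] args) {
            // The auth: vocabulary here is made up for illustration only.
            String ttl =
                "@prefix auth: <http://example.org/editingPolicy#> . " +
                "auth:EditResearchArea a auth:Permission ; " +
                "    auth:forProperty <http://vivoweb.org/ontology/core#hasResearchArea> ; " +
                "    auth:grantsAction auth:EditObjectStatement ; " +
                "    auth:appliesToRole auth:SelfEditorRole . ";

            Model policy = ModelFactory.createDefaultModel();
            policy.read(new StringReader(ttl), null, "TURTLE");
            policy.write(System.out, "TURTLE"); // editable, inspectable, reloadable
        }
    }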

...

  • Integrating Mummi Thorisson's Ruby-based CrossRef lookup tool for searching and loading publications into VIVO, now on GitHub along with OAuth work for retrieving information from a VIVO profile in another application
  • Improving and documenting the Harvester scoring and matching functions

Internationalization

Also referred to and documented as Multiple Language Support in VIVO.

  • Moving text strings from controllers and templates to Java resource bundles so that other languages can be substituted for English (see the sketch after this list)
  • Internationalization for ontology labels – important because much of the text on a VIVO page comes directly from the ontology
  • Improving the VIVO editing interface(s) to support specification of language tags. VIVO 1.5 will respect a user's browser language preference setting and filter labels and data property text strings to only display values matching that language setting whenever versions in multiple languages are available – but there has not yet been a way to specify language tags on text strings.
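
A minimal sketch of both halves, with an illustrative bundle name and key: interface strings come from locale-specific resource bundles, while data strings carry language tags on the literals themselves, which is what the browser-preference filtering operates on:

    import java.util.Locale;
    import java.util.ResourceBundle;
    import com.hp.hpl.jena.rdf.model.Literal;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class I18nDemo {
        public static void main(String[] args) {
            // 1. Interface strings: with all.properties and all_es.properties on
            //    the classpath, the same key yields the text for the requested
            //    locale, falling back to the default when no translation exists.
            //    (Bundle name and key are illustrative.)
            ResourceBundle strings = ResourceBundle.getBundle("all", new Locale("es"));
            System.out.println(strings.getString("search_button"));

            // 2. Data strings: a language tag on the literal lets VIVO filter
            //    labels to match the browser's language preference.
            Model m = ModelFactory.createDefaultModel();
            Literal esLabel = m.createLiteral("Universidad de Ejemplo", "es");
            System.out.println(esLabel.getLexicalForm() + " @" + esLabel.getLanguage());
        }
    }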

Provenance

Adding better support for named graphs in the UI (the application already handles named graphs internally and through the Ingest Tools menu).

...

  • Allowing the addition of statements about any named graph, such as its source and date of last update (see the sketch after this list)
  • Making this information visible in the UI (e.g., on mousing over any statement) to inform users of the source and date of any statement, at least for data imported from systems of record
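
A minimal Jena sketch of the idea, with a made-up graph URI and a dedicated metadata graph as assumptions: statements about a named graph are just triples whose subject is the graph's URI, so the UI can look them up when rendering any statement from that graph:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.DatasetFactory;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Resource;
    import com.hp.hpl.jena.vocabulary.DCTerms;

    public class GraphProvenanceSketch {
        public static void main(String[] args) {
            // Graph URIs and the choice of a separate metadata graph are assumptions.
            String hrGraph = "http://example.org/graph/hr-feed";

            Dataset ds = DatasetFactory.createMem();
            ds.addNamedModel(hrGraph, ModelFactory.createDefaultModel());

            Model meta = ModelFactory.createDefaultModel();
            Resource graph = meta.createResource(hrGraph);
            graph.addProperty(DCTerms.source, meta.createResource("http://hr.example.org/"));
            graph.addProperty(DCTerms.modified, "2013-05-01");
            ds.addNamedModel("http://example.org/graph/graphMetadata", meta);

            meta.write(System.out, "TURTLE");
        }
    }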

Visualization

  • Improved caching of visualization data – a student project at Indiana University has investigated and traced the issue to multiple concurrent threads trying to create the cache of data for the same type of visualization at the same time (see the sketch after this list)
    • The symptom has instead been reported as a problem with scalability (e.g., for a Map of Science from the 32,000-plus publications in the University of Florida VIVO, or at UPenn)
    • Fixing the concurrency issue may solve the problem and will at least make it easier to determine whether further work on caching is necessary – if so, a solution that caches intermediate data rather than the final resulting page or image is likely to make sense
  • HTML5 (phasing out Flash) – not likely to be addressed in 1.6
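
One standard fix for the concurrency problem is sketched below (class and method names are illustrative): the first thread to request a given visualization's cache builds it, and concurrent requests block on the same task rather than each starting their own rebuild:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.FutureTask;

    public class VisualizationCache {

        // One task per visualization type.
        private final ConcurrentMap<String, FutureTask<String>> cache =
                new ConcurrentHashMap<String, FutureTask<String>>();

        public String get(final String visualizationType) throws Exception {
            FutureTask<String> task = cache.get(visualizationType);
            if (task == null) {
                FutureTask<String> created = new FutureTask<String>(new Callable<String>() {
                    public String call() {
                        return buildData(visualizationType); // the expensive queries
                    }
                });
                task = cache.putIfAbsent(visualizationType, created);
                if (task == null) {   // this thread won the race
                    task = created;
                    task.run();
                }
            }
            return task.get();        // everyone waits on the single computation
        }

        private String buildData(String type) {
            return "cached data for " + type; // stand-in for the real aggregation
        }
    }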

Data Query and Reporting

  • Limiting SPARQL queries by named graph, either via inclusion or exclusion (see the sketch after this list)
    • This is reportedly supported by the Virtuoso triple store, and would help assure that private or semi-private data in a VIVO is not exposed via a SPARQL endpoint
    • If this functionality depends on the underlying triple store chosen for VIVO, it's not something that can easily be managed in VIVO itself
  • There are other possible routes for extracting data from VIVO, including linked data requests – if private data is included in a VIVO, all query and export paths would also have to be locked down. Linked data requests respect the visibility-level settings set on properties to govern public display, but separate, more restrictive controls may be required for linked data.
  • Enhancing the internal VIVO SPARQL interface to support add and delete functions, not just select and construct queries – see "Web Service for the RDF API" above
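
In the absence of store-level support, one portable way to exclude a graph is in the query itself, though note this is exactly the unbound-GRAPH query shape flagged as inefficient on SDB under Performance above, which is why a store-level mechanism (e.g., in Virtuoso) would be preferable. The graph URI is hypothetical:

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.query.DatasetFactory;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.ResultSetFormatter;

    public class GraphRestrictedQuery {
        public static void main(String[] args) {
            // Everything except triples in the private graph; graph URI is made up.
            String query =
                "SELECT ?s ?p ?o WHERE { " +
                "  GRAPH ?g { ?s ?p ?o } " +
                "  FILTER (?g != <http://example.org/graph/private>) }";

            Dataset ds = DatasetFactory.createMem(); // stand-in for the VIVO store
            QueryExecution qe = QueryExecutionFactory.create(query, ds);
            try {
                ResultSetFormatter.out(qe.execSelect());
            } finally {
                qe.close();
            }
        }
    }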

...

  • Improving the execution speed and formatting of the existing Digital Vita CV tool as implemented in the UF VIVO, perhaps changing it to email the generated CV as a rich text or PDF document asynchronously (see the sketch after this list)
    • The latter is the more promising approach – if people don't have to wait but can have the document emailed to them, the perception of slow performance is largely moot
    • That said, it may be possible to improve the queries. Florida and Stony Brook are known to be using this functionality, so they should be involved in prioritizing any changes
  • Developing a UI for selecting publications or grants and adding required narrative elements, based on the specification developed in the Digital Vita VIVO mini-grant
    • This has been on "the list" for two years, but makes most sense as a separate application
    • Many people have voiced the opinion that getting the data out in a form that is editable in Word or other common document-editing tools is much more important than managing the selection or ordering of content from within VIVO
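
A sketch of the asynchronous approach, with stand-ins for the actual CV generation and mail delivery (which might use JavaMail): the request returns immediately, so slow generation never blocks a page.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncCvService {

        // A single background worker queue for CV requests.
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        public void requestCv(final String personUri, final String email) {
            worker.submit(new Runnable() {
                public void run() {
                    byte[] pdf = generateCv(personUri);            // the slow part
                    sendMail(email, "Your CV is attached", pdf);   // e.g., via JavaMail
                }
            });
        }

        private byte[] generateCv(String personUri) { return new byte[0]; } // stand-in
        private void sendMail(String to, String subject, byte[] attachment) { } // stand-in
    }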

Search

...

Indexing improvements

  • Provide a way to re-index by graph or for a list of URIs, to allow partial re-indexing following data ingest as opposed to requiring a complete re-index (see the sketch after this list)
    • The same applies to re-inferencing, which is typically more time consuming
  • Improving the efficiency and hence speed of search indexing in general
  • Improved default boosting parameters for people, organizations, and other common priority items
    • Here the question immediately becomes "improved according to what criteria?"
    • This is a prime area for a special interest group of librarians or other content experts willing to document current settings and recommend improvements, including documenting use cases and developing sample data that could be part of the Solr unit tests listed above under "Installation and Testing"
  • An improved configuration tool for specifying parameters to VIVO's search indexing and query parsing
    • Question – are any of these run-time parameters, or are they all parameters that must be baked in at build time, requiring re-generation of the index?
    • Relates to the concerted Solr exploration effort suggested below
  • A concerted effort to explore what search improvements Apache Solr can support, with recommendations on which to consider implementing and in what order
  • Implementation of additional facets on a per-classgroup basis – appropriate facets beyond rdf:type, varying based on the nature of the properties typically present in search results of a given type, such as people, organizations, publications, research resources, or events
    • Huda Khan has been implementing the ability to configure additional search facets for the Datastar project; some improvements may make it into 1.6
  • Note the search unit test proposed above under Installation and Testing
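
A sketch of the URI-list case, assuming SolrJ, a made-up core URL, and a stand-in for VIVO's real search document builder: only the documents for the changed individuals are dropped and rebuilt, rather than the whole index.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class PartialReindex {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/vivocore");

            // URIs touched by an ingest; in practice this list might come from
            // the changed graph rather than being hard-coded.
            List<String> changed = Arrays.asList(
                    "http://example.org/individual/n1",
                    "http://example.org/individual/n2");

            for (String uri : changed) {
                solr.deleteById(uri);         // drop the stale document
                solr.add(buildDocument(uri)); // rebuild it from the current triples
            }
            solr.commit();
        }

        // Stand-in for VIVO's real search document builder.
        private static SolrInputDocument buildDocument(String uri) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("URI", uri); // field name assumed
            return doc;
        }
    }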

...