Date

Call-in Information

Time: 11:00 am, Eastern Daylight Time (New York, GMT-04:00)

To join the online meeting:

Slack

Attendees

(star) indicates the note-taker

  1. Benjamin Gross 
  2. Andrew Woods
  3. Brian Lowe 
  4. Ralph O'Flinn 
  5. Huda Khan (star)
  6. Michel Héon
  7. Alexander (Sacha) Jerabek 
  8. Mike Conlon
  9. Don Elsborg

Agenda

  1. Announcements
    1. VIVO Triple Store Roadmap Proposal - communication with community regarding moving forward with the plan
    2. Symplectic's Harvester... now open source, what next?
  2. Performance testing different VIVO / triple store configuration
    1. VIVO-1743
    2. Fuseki as alternate triple store
    3. Ingest performance results, loading/inferencing OpenVIVO data (4.4M triples)
      1. VIVO TDB - total time: 9m 22s
      2. RAW TDB – total time: 43s, 45s, 43s (3 replications)
      3. RAW TDB2 – total time: 31s, 32s, 31s (3 replications). tdb2.tdbloader warned about 3 IRIs with port numbers under 80 in the data.
      4. VIVO SDB - total time: 1h 44m 33s
      5. Fuseki (backed by TDB - local machine) - total time: 1h 31m 36s
      6. Fuseki (backed by TDB - remote machine) - total time: 1h 31m 33s
    4. Read performance tests - Connect with VIVO Scholars?
  3. 2020 - VIVO Sprints
    1. VIVO-i18n - Canadian French Initiative
    2. Doodle closing on Friday, Feb 28
  4. Pruning legacy Vitro/VIVO GitHub branches
    In an effort to reduce the number of abandoned or outdated branches in the VIVO and Vitro GitHub repositories, we will be taking a multi-phased approach to pruning branches:
    
    - All branches that predate the VIVO 1.6 release will be removed (i.e. all branches from 2013 and before)
    - Of the remaining branches, those with no commits ahead of the 'master' branch will be removed
    - Of the remaining branches, detailed review of branch content will determine whether the branch should be removed/retained
    - Release maintenance branches will be retained
    
    If you have an interest in ensuring that any of the branches in Vitro or VIVO are retained, please let that be known.
  5. 1.11.1 maintenance release - security patch
  6. Vitro pull-requests
    1. [Jira ticket link unavailable] - looks good
    2. [Jira ticket link unavailable]

Tickets

  1. Status of In-Review tickets


Notes 

Draft notes in Google-Doc

Announcements

  1. Symplectic's Harvester... now open source, what next?
    1. Ralph’s email response to Tom and Violeta’s queries on the develop email list
    2. “Yes back at the Symplectic NA Conference I spoke with Jonathan about opening the code and giving it a home. Over the next year they did some code cleanup and documentation and were able to announce at the Digital Science NA Conference in 2019 that they were going to post it on GitHub. Now that is just an archive and they are not supporting it. I have forked it over to my GitHub repo to continue to support it, but I have also forked from there to https://github.com/vivo-community/Vivo_Harvester_V2. I want the VIVO Community to know about this connector from Elements to VIVO so they can use it and even help in its support.”
    3. Will be adding a new page under Apps and Tools Catalog for the Symplectic Harvester to go over use and support.

Performance testing different VIVO / triple store configuration

  1. Running a totally clean VIVO, loading the file listed as sample data in GitHub (OpenVIVO data). The data was ingested using VIVO connected to multiple triple store options.
  2. Ingest performance results, loading/inferencing OpenVIVO data. (The OpenVIVO .ttl file was ingested through the system admin interface via Site Admin -> Add or Remove RDF Data. Ingest was tried once for each of the configurations listed in the agenda above. Note that when ingest is done through VIVO, inferencing also occurs; according to the log files, indexing appears to happen after the fact.)
  3. From VIVO-1743 above, the following comment describes the process for ingesting the data:

Test should be performed on at least the following triple store configurations:

  1. TDB
  2. SDB
  3. Fuseki (backed by TDB)

For ingest timing, the following procedure should be followed:

  1. Using OpenVIVO data (https://github.com/vivo-project/sample-data/blob/master/openvivo/openvivo.ttl.zip)
  2. Clear triple store prior to test
  3. Log in as vivo_root
  4. Verify no content in VIVO
  5. Site Admin -> Add or Remove RDF Data
    • From local download: openvivo.ttl

Using the following patch, which adds log messages, the timings can be tracked by grepping the vivo.all.log file for the term "ingest" (e.g., grep ingest vivo.all.log).

https://github.com/awoods/Vitro/commit/d54e0324eab69baab4a283f69bd79ff64d817820

From the results above, it seems like TDB is faster than both SDB and Fuseki.
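
For comparison, the "RAW" figures above bypass VIVO entirely and use Jena's command-line bulk loaders directly. A minimal sketch of such a load (the database directories are placeholders; run against an empty location):

    tdbloader --loc /tmp/tdb-openvivo openvivo.ttl
    tdb2.tdbloader --loc /tmp/tdb2-openvivo openvivo.ttl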

Brian: Can't tell just from these results whether Fuseki itself is slow, since multiple factors are involved. It could be that Jena's bulk loader handles this well while the other options do not handle bulk loads well. We will need to dive into the code and profile what's going on before determining where the performance delays come from.

Mike: A similar-sized dataset, using the TDB loader with no inferencing, loads in about 15 seconds. Half a million triples, comparable to the OpenVIVO data.

Brian: How many triples are in the OpenVIVO dataset?

Mike: 500 people and about 2000 publications. Could get the number.

Andrew: Potential next steps?

Michel: Might be useful to turn off inferencing and only look at loading times to start.  

Andrew: Would be useful to see if there is a better way to turn off inferencing. 

Brian: In WEB-INF/resources/startup_listeners.txt

https://github.com/vivo-project/VIVO/blob/master/webapp/src/main/webapp/WEB-INF/resources/startup_listeners.txt#L37

Commenting out the SimpleReasoner startup listener (at the line linked above) disables inferencing.

Once the load is done, the listener can be commented back in; then go to the site admin page and select the recompute-inferences option.

Turning off indexing on load: https://github.com/vivo-project/Vitro/commit/81aed9ac58071497fb429ee309ca50f6bce924e5 

Or could comment out https://github.com/vivo-project/VIVO/blob/master/webapp/src/main/webapp/WEB-INF/resources/startup_listeners.txt#L75 
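
As a sketch, disabling inferencing for a bulk load amounts to commenting out one line in startup_listeners.txt (the class name below is as it appears in recent Vitro releases; verify against the linked file in your checkout, and comment out the search-indexer listener on line 75 in the same way to disable indexing):

    # Temporarily disabled for bulk-load timing; re-enable and recompute inferences afterwards
    # edu.cornell.mannlib.vitro.webapp.servlet.setup.SimpleReasonerSetup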

Mike: In new architectures, should isolate reasoning process and not have production system depend on lengthy reasoning process. 

Andrew: Adding finer-grained timings to potentially isolate more than just start and end. Also read-time performance.

Mike: Why Fuseki?

Ralph: Alternate triple stores.

Andrew: It is the only other example of talking to an external triple store - SPARQL over HTTP for a remote triple store.
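
For reference, pointing VIVO at a remote SPARQL endpoint is configured in applicationSetup.n3. A minimal sketch, assuming the ContentTripleSourceSPARQL implementation and property names found in recent Vitro releases (the Fuseki URLs are placeholders):

    :sparqlContentTripleSource
        a <java:edu.cornell.mannlib.vitro.webapp.triplesource.impl.sparql.ContentTripleSourceSPARQL> ;
        :hasEndpointURI "http://remote-host:3030/vivo/sparql" ;
        :hasUpdateEndpointURI "http://remote-host:3030/vivo/update" .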

Don: Should we be able to do the same with Blazegraph?

Andrew/someone: yes

Mike: Whatever is going on in Fuseki is not a happy picture given the load time

Don: Round trips with HTTP and JVM etc.? (as Sacha said?)

Andrew: Or VIVO implementation for the SPARQL RDF service

Brian: Would look there first, since there are possibly simple optimizations that could be made there. That code is not part of release testing, so it is a good place to take a look.

Mike: Raw Fuseki with raw TDB - is that fast?

Brian: Maybe not really fast but probably better than what Andrew saw.  

Ralph: Need clear documentation to make sure we specify the process and components

Don: Are there triple stores that do inferencing on their own?

Brian: Such (magical creatures - sorry, note-taker license) things do exist. But what that will look like and how performant it might be will vary. Even with simple RDFS reasoning, any time an assertion is made with a specific subproperty, all of the super-properties would also display in the interface. At the display level, we would need to work out how to show the most salient level of reasoned information.

Don: Most specific type was supposed to address this kind of situation?

Brian: Yes. Something similar is possible in the Sesame (rdf4j) reasoner. Either an explicitly materialized inference of the most specific type, or complicated filter logic to show only the most specific type. Or more specific queries that ask the reasoning store only for those properties you wish to show, rather than making a blanket request for all the information (asserted + reasoned). If relying on built-in reasoning, these issues would need to be considered.
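
As a sketch of the "more specific queries" idea: VIVO already materializes a most-specific-type triple for each individual, so a display query can ask for just that rather than every asserted and inferred type (the individual URI below is hypothetical):

    PREFIX vitro: <http://vitro.mannlib.cornell.edu/ns/vitro/0.7#>
    SELECT ?type
    WHERE {
      <http://example.org/individual/n1234> vitro:mostSpecificType ?type .
    }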

Don: Stardog does on-the-fly inferencing.

Brian: Two different strategies: (1) rule-based inferences that are then materialized in the store (similar to VIVO); (2) Stardog (possibly an outgrowth of Pellet) - a backward-chaining mode, where inference happens in order to answer queries, without materializing every possible statement that could be inferred. That kind of approach may make a nice-looking display simpler, if reasonable per-individual queries are used. Would this mean a performance hit at query time or at load time? Lots of fun, complicated questions to consider in this area.

Andrew: Running the tests again with indexing and inferencing turned off. Looking at the SPARQL implementation. A relatively comprehensive read test would be connecting VIVO Scholars to VIVO, since it iterates over most of the triples within VIVO to populate the Scholars Solr index. William has both SDB and TDB connectors; a SPARQL-over-HTTP connector could be added.

2020 Sprint Planning

  • VIVO-i18n - Canadian French Initiative
  • Doodle closing on Friday, Feb 28
  • Leaning towards April 6th/April 13th but Doodle poll not closed yet
  • Andrew: Have been looking at Selenium. Would be good to create user-interaction tests. In the process of the sprint, a pattern can be established for others to follow.
  • Michel and Sacha: At this point, goals are clear enough.  
  • Michel: Better to have this conversation in the i18n Slack channel than in develop?
    • Has other things to post and is unsure which channel to use.
  • Andrew: Yes, i18n.
  • Andrew: The code will become part of the core instead of being an overlay. A successful sprint will show VIVO editing in three different languages: French, German, and English.
    • Testing will be done with Selenium
    • For collaboration, better to have the UQAM code in GitHub
  • Sacha: Makes sense
  • Andrew: Everything will be extracted, including English, so that English is not given preference and behaves like all the other languages.
    • All the code (JSP, Freemarker) will be completely templatized, with no English baked in.
    • Consensus has not been established on consolidating all the languages into a centralized GitHub repository; there has been interest in keeping the different language repositories separate.
  • Michel: We have VIVO and VIVO Scholar. The translation language won't be the same for the two systems.
  • Ralph: The goal is to use the same i18n files for both VIVO core and VIVO Scholar. There is no internationalization in VIVO Scholar yet, but the long-term goal is to have that incorporated and to use the i18n files from VIVO rather than reinventing the wheel.
  • Sacha: If development happens without coordination between VIVO and VIVO Scholar, there may be discrepancies between the elements in the systems that need to be translated.
  • Mike: Timeline for introducing internationalization to Scholar?
  • Ralph: Haven’t seen a date for this yet.  Priority is to get to the next release. 
  • Don: Data is moving through multiple locations (SPARQL queries -> Solr, then -> GraphQL).
  • Mike: Will need to have a workshop where the Scholar and core groups coordinate around internationalization
  • Michel: Suggests moving translation files into ontology files. Not sure whether that is a goal for this sprint. For a new release of VIVO Scholar, and perhaps local VIVO, the ontology file incorporating translations (i.e., translations of property labels) could simply be moved from one system to the other.
  • Mike: The architects are not clear about how much text comes from the ontology versus being templated at the application level (e.g., FTL and JavaScript).
  • Don: Supporting internationalization in search index?
  • Michel: Probably. 
  • Andrew: Saw this in your demo?
  • Michel: yes
  • Don: If model/pattern for internationalization in Solr, then that pattern could be adopted by VIVO Scholar
  • Mike: Scholar is not using ontologies 
  • Don: Speaking about patterns solely at the search-index level, which could then be transferable to the search index used in Scholar.
  • Mike: William’s approach may be able to do that
  • Michel: In Jena, you can do SPARQL requests with Solr behind the SPARQL engine. A SPARQL request can include text search, so there is no need to maintain Solr within the VIVO architecture; just use the one inside Jena (see the sketch after this list).
  • Michel: A test VIVO instance is available to everyone, reset with data every night. It can be used as a benchmark to evaluate the evolution of the internationalization work.
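
A minimal sketch of the text-search-inside-SPARQL pattern Michel describes, using jena-text's text:query property function (the search string is a placeholder, and the dataset must be built with a text index):

    PREFIX text: <http://jena.apache.org/text#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?s ?label
    WHERE {
      ?s text:query (rdfs:label "internationalization") ;
         rdfs:label ?label .
    }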

Pruning legacy Vitro/VIVO GitHub branches

Message from agenda to be sent out.

Actions

  •  

