Date

Call-in Information

Time: 11:00 am, Eastern Time (New York, GMT-04:00)

To join the online meeting:

Slack

Attendees

(star)  Indicating note-taker

  1. Brian Lowe
  2. Benjamin Kampe 
  3. Georgy Litvinov 
  4. Michel Héon 
  5. William Welling 
  6. Benjamin Gross (star)
  7. Ralph O'Flinn 
  8. Don Elsborg 
  9. Huda Khan 
  10. Dang Vu Nguyen Hai
  11. Christoph Gopfert

Agenda

  1. Conference debrief
  2. JIRA/GitHub issues
  3. Reviewing 2019-01 Architectural Fly-in Summary (Ingest section)
  4. Moving Scholars closer to the core: continuing discussion from last committers' meeting
    1. "win/win" opportunity: Scholars and VIVO both eliminate some complexity
    2. converting Scholars SPARQL queries to VIVO DocumentModifiers
    3. replacing URIFinders with fast, reliable Solr lookups 
  5. Prioritizing future development items:
    1. quick wins / items for a more rapid release
    2. collaborative items for future sprints
    3. (Add/edit at will) spreadsheet: https://docs.google.com/spreadsheets/d/103P9P4v6yUBSb5BnVaK40NoGx1fIYyL8uaHKUubZWbE/edit?usp=sharing
  6. VIVO in a Box current document for feedback:

Future topics

  1. Prioritizing and planning post-1.12 development
  2. Forward-looking topics:
    1. frameworks: Spring / Spring Boot / alternatives
    2. Horizontal scalability
    3. Deployment
    4. Configuration: files / environment variables / GUI settings
    5. Editing / form handling
    6. Adding custom theming without customizing build
  3. Post-release priorities
    1. Ingest / Kafka
    2. Advanced Role Management
    3. Moving Scholars closer to core - next steps
  4. Vitro JMS messaging approaches - redux
    1. Which architectural pattern should we take?
    2. What should the body of the messages be
  5. Incremental development initiatives
    1. Integration test opportunities with the switch to TDB: requires startup/shutdown of external Solr via Maven

Tickets

  1. Status of In-Review tickets


Notes

Draft notes on Google Drive


  1. Item 0: Release of 1.12
    1. Release is half ready… Ralph will try to publish to the Sonatype repositories today.
    2. Official announcement: is there any difference from the alpha announcement?
    3. Ralph: Pretty much the same.
    4. Wiki needs to be updated so the 1.12 page says it is the current release rather than future
    5. Also need to go to Jira and close out 1.12 release and open up next release number. Reassign anything tied to 1.12 to the new release. Ralph will do this as part of his routine.
    6. William: A Dependabot merge into main broke the tests 4 days ago.
    7. After Sonatype is sorted out, should CI on GitHub work? Ralph: Yes.
    8. Ralph: An old version of JUnit was identified by Dependabot as having a security vulnerability. https://github.com/vivo-project/Vitro/pull/191
    9. Error from build:
      1. Tests in error:  testGetAllPossiblePropInstForIndividual(edu.cornell.mannlib.vitro.webapp.dao.jena.PropertyInstanceDaoJenaTest)
        getVClassesForPropertyTest(edu.cornell.mannlib.vitro.webapp.dao.jena.VClassDaoTest)
        modelIsolation(edu.cornell.mannlib.vitro.webapp.dao.jena.VClassDaoTest)
        testPreventInvalidRestrictionsOnDeletion(edu.cornell.mannlib.vitro.webapp.dao.jena.JenaBaseDaoTest)
        correctValues(edu.cornell.mannlib.vitro.webapp.dao.jena.VClassJenaTest)
        testTBoxModel(edu.cornell.mannlib.vitro.webapp.dao.jena.OntModelSegementationTest)
  1. Conference debrief
    1. Second highest attended VIVO conference of all time. Go us!
    2. Brian: Keynote about OpenAIRE was eye opening. 
      1. There’s a ton of data. Scope is worldwide. They are deduplicating and disambiguating all of it.
      2. Are their disambiguation tools available? Yes!
      3. Relevance to VIVO in a Box? OpenAIRE could be a good candidate to offload some of the work.
      4. Microsoft Academic is closing down, but OpenAIRE should be in a position to replace it with minimal impact.
      5. Don: They have their own ontology but they also use others. Any collaboration from the ontology group? Group: No, unfortunately.
      6. Ralph: There really haven’t been any groups that show they can crawl the data Microsoft Academic was crawling 
      7. Don: Would be nice to not have to learn every data structure for every open data source out there, e.g. DataCite and Unpaywall.
      8. Don: Question: does OpenAIRE have a worldwide perspective, or is it Euro-centric?
      9. Brian: Seems like plenty of coverage of US universities 
      10. Brian: Haven’t used SPARQL endpoint, but their search API results included disambiguation work. Looks like some interesting infrastructure behind the scenes that would be good to take advantage of.
      11. Don: Enjoyed the ETL track. The common denominator is the SPARQL transform. Proposal: can we standardize the SPARQL transforms for first-class objects (e.g. people), along the lines of the shapes concept?
        1. Michel: That is what VIVO proxy tries to do. It can communicate directly with the VIVO UI. He reverse-engineered the HTML communication with the browser and automated it.
        2. If you put something like Swagger in front, you can communicate with normal REST concepts, such as a JSON transform.
        3. William: Is this proxy dependent on the current user interface? Is that adding a coupling?
        4. Michel: Yes, true. The end user will not be affected, but the proxy must be updated.
  2. Reviewing 2019-01 Architectural Fly-in Summary (Ingest section)
    1. Closest thing we have to a roadmap at this point.
    2. Still seems accurate. 
    3. We have multiple people attacking from different directions.
    4. How do we take advantage of current work and also avoid redundant effort?
    5. Ingest is one of the main topics. The document has a set of 10 requirements, including that import must support both RDF and JSON.
    6. Note on line number 5: it could use the models used in the Freemarker UI (i.e. what Michel is doing).
    7. Scrolling down, the next point is the UI. Freemarker will (supposedly) be deprecated by the VIVO project, the idea being that the front end would be generated using the same models used by the ingest side.
    8. Question from Don: is TAMU using GraphQL? William: No, that was Duke’s initiative. TAMU’s approach allows lazy loading of large datasets (GraphQL requires that to be done on the back end, so there may be some performance issues with that idea).
    9. Brian: Should we still have GraphQL? William: We still have the API. Don: I think Duke is using it and CU is interested but we aren’t trailblazers. 
    10. William: I really like the idea of entity-centric import vs triple-centric. But it’s more complicated than that. Say you ingest authors and pubs. Then there’s another process that matches pubs to authors… is that manual, or triple-centric? 
    11. Michel: Large number of triples required for a person. If a person has a position, that’s ~88 statements in RDF. VIVO is generating 8-10 individuals to describe that. It requires 1-2 days of work to create that ETL. 
    12. Don: All the context nodes… What the hell is that?! We need a high-level doc for the non-ontologists. He must re-learn the ontology every couple of years when he revisits it.
    13. William: Would be nice to have a one-stop shop for all the triples necessary to create everything we need to talk about in VIVO, but it’s a massive effort.
    14. Ralph: We should have a definitive statement of ‘this is what VIVO supports’ to limit organizations expanding their own structure.
    15. Brian: Extending the ontology has always been tricky. It’s not desirable, and often results in things being created that don’t make sense semantically. But one of the early appeals of Vitro was that it is flexible. There is a power in that we don’t want to lose.
    16. Don: For this reference spec, we all know the target (the VIVO ontology). Somewhere before that, for ingest and extract, how about a JSON document that fits it and still covers the robustness of the VIVO ontology (e.g. author order)?
    17. William: JSON schema describes the documents holding the data. It must be extensible. 
    18. Huda: I think in addition to shapes for entities, we'd need shapes for relationships (if such a thing exists)
    19. What William seems to be referring to speaks to two areas: (a) defining the relationships  that need to be made and (b) the workflow or process for making them after the main entities have been created
    20. William: Yeah, that and versioned working document(s) of the RDF for these entities and relationships.
    21. Brian: The real win would be hiding all the high-level ontological stuff that nobody but ontologists understands.
    22. Brian: Can potentially start this week trying to draft some JSON schemas we can begin from.
    23. Michel: It’s not useful to have something in the ontology if it’s not visible in the UI. We don’t want to replicate the structure of the ontology in JSON. We must replicate the UI.
    24. William: We produce the UI JSON, makes sense. The next step, I’m not envisioning yet. We need to create API endpoints that accept this JSON. Where does the mapping to RDF exist? Is it editable?
    25. Michel: We can take a string and convert it to Java objects.
    26. William: Yes, but where is the mapping? Is it a file? Group: It has to be.
    27. Is the proxy in a place where the community can contribute? Michel: Yes. It is easy to do a sprint to improve/expand it because it was done incrementally. It could also be used as a consumer for Kafka streams. Also, it’s a decoupled service (i.e. we could eventually remove Freemarker).
    28. William: Swagger UI is not the ingest tool.
    29. Brian: Should we schedule a sprint to work on this? Michel: Yep, sure. First step, open the code.
      1. How about availability?
      2. Michel: Will be out in August
      3. Brian: Unavailable 2nd half of July
      4. William: Can’t give a time, but would like to participate
    30. Data ingest group will be interested in this.
    31. William: Should there be a separate repository for the JSON transformation documents?
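
The entity-centric ingest idea discussed above (a flat JSON document per entity, plus a mapping file that says how each field becomes RDF) can be sketched roughly as follows. This is only an illustration in Java, not the VIVO proxy's code or any agreed design. The class name `EntityToTriplesSketch`, the field names (`label`, `overview`, `nickname`), the example URIs under `http://example.org/`, and the in-memory mapping are all assumptions made for this sketch, and a plain `Map` stands in for a parsed JSON document. It also ignores the context nodes mentioned by Don and Michel, so it emits far fewer statements than a real VIVO person would need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn a flat, entity-centric document into N-Triples
// using a field-to-predicate mapping.
class EntityToTriplesSketch {

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Assumed mapping from JSON field names to predicate URIs. In the
    // discussion this would live in an editable file; it is inlined here.
    static Map<String, String> exampleMapping() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("label", "http://www.w3.org/2000/01/rdf-schema#label");
        mapping.put("overview", "http://vivoweb.org/ontology/core#overview");
        return mapping;
    }

    // Escape the characters that must not appear raw in an N-Triples literal.
    static String escapeLiteral(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    // Emit one type statement, then one literal statement per mapped field.
    // Fields without a mapping entry are skipped rather than guessed at.
    static List<String> toTriples(String subjectUri, String typeUri,
                                  Map<String, String> entity,
                                  Map<String, String> mapping) {
        List<String> triples = new ArrayList<>();
        triples.add("<" + subjectUri + "> <" + RDF_TYPE + "> <" + typeUri + "> .");
        for (Map.Entry<String, String> field : entity.entrySet()) {
            String predicate = mapping.get(field.getKey());
            if (predicate == null) {
                continue;
            }
            triples.add("<" + subjectUri + "> <" + predicate + "> \""
                    + escapeLiteral(field.getValue()) + "\" .");
        }
        return triples;
    }

    public static void main(String[] args) {
        // Stand-in for a parsed JSON document describing one person.
        Map<String, String> person = new LinkedHashMap<>();
        person.put("label", "Doe, Jane");
        person.put("overview", "Studies \"linked data\".");
        person.put("nickname", "JD"); // no mapping entry, so it is skipped

        List<String> triples = toTriples(
                "http://example.org/individual/n1",
                "http://xmlns.com/foaf/0.1/Person",
                person,
                exampleMapping());
        for (String triple : triples) {
            System.out.println(triple);
        }
    }
}
```

Running `main` prints three N-Triples lines: the type statement plus one per mapped field, with the unmapped `nickname` field skipped. Keeping the mapping as data rather than code is what would let it be versioned and edited separately, which is the question William raises about where the mapping lives.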


