Date

Call-in Information

Time: 11:00 am, Eastern Time (New York, GMT-04:00)

To join the online meeting:

Slack

Attendees

(star) indicates the note-taker

  1. Benjamin Gross
  2. Huda Khan 
  3. Don Elsborg 
  4. Rachid Belkouch
  5. William Welling 
  6. Andrew Woods
  7. Alexander (Sacha) Jerabek (star)
  8. Brian Lowe
  9. Ralph O'Flinn
  10. Michel Héon

Agenda

  1. Next i18n sprint date - Doodle (poll closes today)
  2. Moving priorities forward: Data ingest
    1. Task force!
      1. Deliverables?
      2. What are the use cases?
    2. Tools:
      1. RML-Mapper
      2. Ingest Tools - Who Is Using What
    3. Entities (scholars-discovery model) (fly-in discussion):
      1. Person, Grant, Publication, Authorship, ...
  3. VIVO-Scholar and Core: 2020-07-29 - Special Topic - VIVO Scholar Next Steps
    1. Review of Scholar's indexing approach
    2. Review of VIVO's indexing approach
  4. Renaming of 'master' branch? (ZDNet, BBC)
    1. Guidance from GitHub 
    2. DSpace has done it
    3. Fedora has done it

Future topics

  1. Vitro JMS messaging approaches - redux
    1. Which architectural pattern should we take?
    2. What should the body of the messages be?
  2. Incremental development initiatives
    1. Integration test opportunities with the switch to TDB - requires startup/shutdown of external Solr via Maven

Tickets

  1. Status of In-Review tickets


Notes 

Draft notes in Google-Doc 

1. Next i18n sprint date - Doodle (poll closes today)

Aw: Most have filled in the poll. The sprint will move tickets forward, stabilize the regression test environment, and aim to get i18n into master. Dates are still to be finalized.

Bg: the most important thing is to ensure that the existing work is ready to be merged, with everyone on board with the functionality.

Aw: ideally we don’t want to commit known bugs into master; we should minimize those that remain. Are there any other priorities that should be considered here?

Mh: the most important thing is to merge the basics of i18n and clean up work languishing in dev and other branches, including outstanding pull requests.

Bg: does the i18n work address the issues with linguistic labels described by Joachim? Label management in a multilingual environment is buggy.

Mh: this is addressed in i18n work.

Aw: it would be good to get Joachim’s feedback on the i18n work here. See the thread: [vivo-tech] Problem when saving labels

2. Moving priorities forward: Data ingest

Aw: important initiative

RoF: repurposed an old Slack channel for data ingest. Soliciting input on the status quo, current problems, and where people would like to see development. Collating information from various sources and trying to determine commonalities. Still gathering information; will consolidate it into a document and set some goals for the task force.

Aw: it does seem valuable to gather info from the list and the wider community.

De: appreciate the Slack channel; most people here have responded. Are most people using the data pump?

RoF: not sure. The pump is good for the initial stages, but people then wind up using local solutions. The pump is very hands-on; it is surprising to hear that many are using it in production.

Aw: there might be an architectural question that needs to be raised.

Bl: there are variations on a theme that keeps arising, and they do have architectural implications. The goal is to triplify as quickly as possible and dump the triples into a second, behind-the-scenes VIVO that acts as a temporary hopper, then transform the raw data into a format usable by VIVO, including disambiguation. Some projects are rule-based, some have custom queries; lots of processes are involved in transforming, migrating, and publishing. In some cases this is straightforward; other situations are more complex. The pattern is a big graph of triples up front that you then work with to add to VIVO. Incremental updates are trickier than redoing everything, so other approaches that are incremental are worth exploring.

Bg: I like the notion of triplifying everything. We use Python for our transformations, with different workflows from there to get data into VIVO. There is a standard way of doing this in our GitHub. We use baseline models that we then tweak based on source data or requests to display data differently. It is an ad hoc set of procedures that depends on the data sources. We use rdflib to store data as a graph and dump it into Turtle files, along with other local developments.

RoF: would like to have references to all these techniques and developments.

Bg: a lot of this is specific to our implementation, may not be easy to adapt to someone else’s needs.

De: would like to see more about named graphs.

Aw: how do you define your named graphs?

De: entities go into their own categories: professors into their named graph, etc. Running into problems with the SPARQL translator, where it just hangs when it tries to match newly triplified data with existing VIVO data. We do not want to have to get deep into Jena to do this; we would like to find a different way and remove dependencies on Jena, while avoiding reinventing wheels.

Aw: hopefully the task force can find a way to do this once together, determine a general workflow that works for most.

Bl: we need common tools for freely available sources to help people get their VIVOs up and running. The other question is about the architecture of the application itself: how dependent do we want to be on a central data store, forcing people to grapple with the model up front? VIVO exposes all graphs, so all triples need to be nicely lined up, which can be difficult to accomplish. It would help if this could be simplified or improved to allow layers to be built up over time, so that things do not need to be set up perfectly at the outset.

Aw: having well-defined entities, not forcing people to immediately think about triples, and an easy flow to get data into VIVO all sound good. It is harder to make the jump to the implications for VIVO's architecture.

RoF: we need to think about data ingest into what VIVO actually is, and not some future VIVO that is constructed differently.

Hk: do you have some kind of template for asking questions about ingest? For example: what is your starting data type? What are the steps to getting it into VIVO? Some structured way of gathering info would make it possible to compare across responses.

RoF: eventually, yes; at this stage still simply trying to get a quick feel for what is out there. Did not want a big formalized list in case something was left out.

Aw: what can we do to help move things forward?

RoF: will know better once the responses are organized. Looking for interested parties to be part of this task force on a full-time basis.


3. VIVO-Scholar and Core: 2020-07-29 - Special Topic - VIVO Scholar Next Steps

Aw: hoping to talk through methods of populating VIVO Scholar vs. VIVO, and of populating the different search indexes (core vs. Scholar), with an eye to seeing how we can unify them to some degree.

HK: it is always helpful to consider concrete examples: step-by-step walk-throughs with an actual example to see how it works.

Aw: this is in order to get a baseline understanding of things and how they work.

De: specific use cases would help illustrate this.

4. Renaming of 'master' branch? (ZDNet, BBC)

Guidance from GitHub

Aw: we should follow GitHub’s lead on this; GitHub will be sending out recommendations for the change from ‘master’ to ‘main’.
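For reference, the local side of such a rename is small. The sketch below demonstrates it on a throwaway repository; the remote and default-branch changes happen separately, in the GitHub settings UI:

```shell
# Demonstrate the rename in a throwaway repository.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.org"
git config user.name "Demo"
git commit --allow-empty -m "initial commit" -q

# Rename the current branch to 'main', whatever its old name was.
git branch -m main
git branch --show-current

# On a real repository you would then push the new branch and,
# once GitHub's default branch is switched over, delete the old one:
#   git push -u origin main
#   git push origin --delete master
```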

Actions
