September 25, 2015, 1 PM EST

Attendees

Steering Group Members

Paul Albert, Jon Corson-Rikert (star), Melissa Haendel, Dean B. Krafft, Robert H. McDonald, Andi Ogier, Bart Ragon, Alex Viggio

(star)= note taker

Ex officio

Jonathan Markow, Mike Conlon, debra hanken kurtz, Graham Triggs

Regrets

Kristi Holmes, Eric Meeks, Julia Trimmer, Dean B. Krafft

Dial-In Number:  641-715-3650, Participant code: 117433#

Agenda

 
| # | Item | Time | Facilitator | Notes |
|---|------|------|-------------|-------|
| 1 | Updates | 5 min | All | |
| 2 | Review agenda | 2 min | All | Revise, reorder if needed |
| 3 | Welcome Graham | 15 min | Mike, All | Graham Triggs joined the project as Technical lead on Monday. |
| 5 | Semantic Versioning | 15 min | Mike, All | Would like to use semantic versioning for VIVO. See https://goo.gl/i04Z02. See http://semver.org/. |
| 4 | Some thoughts on a next release | 15 min | Mike, All | Discussion of next release |
| 7 | Future topics | 5 min | All | |

attribution/contribution efforts (10/16); how does VIVO get bigger?; training program; rotation of Steering Group members

 

Notes

  1. Updates
    1. Upcoming meetings – CNI, ISWC, NDC
      1. 4th National Data Service Consortium Workshop, October 19-21, San Diego Supercomputer Center, http://www.nationaldataservice.org/get_involved/events/NDS4/
      2. Will anybody be attending the National Data Service meeting October 19-21?
      3. Please let Mike know if you are attending any meetings where VIVO will be presented or whose topics are relevant to VIVO
    2. Justin Littman's new service
      1. very much along the lines discussed for different APIs in the roadmap discussions (Justin is a member of that task force)
      2. See https://github.com/gwu-libraries/vivo2notld
    3. Chris Barnes, new data
    4. New Relationship diagram – teaching
    5. Community pages facelift and blog post
  2. Review agenda
  3. Welcome Graham
    1. Graham joined DuraSpace on Monday; he and Mike have had extensive conversations
    2. Has already been doing some interesting work
    3. Graham: joined from Symplectic, having done repository integration there as well as rewriting and updating the connector to VIVO-ISF via the Harvester; 20 years working in online publishing
    4. Introductions all around
  4. Semantic Versioning for VIVO
    1. VIVO does a 3-part version number – e.g., 1.8.0
    2. Came across this while exploring GitHub: a GitHub founder wrote down his thoughts on major, minor, and patch versions, and went to the trouble of defining what this numbering system might mean. See http://semver.org
    3. Having a guideline for deciding something is a major version, minor version, or dot (patch) version would be helpful for VIVO
    4. Provides explicit information about backward compatibility and the existence of new features, but there's an element of marketing that impacts releases, too
      1. One attendee once worked on a project that never got past its 2.x.x level; after a while it was perceived as stagnant because it never got to 3
      2. Make allowances for when you need to give a burst of energy and step forward
    5. From a marketing point of view, VIVO may have an opposite problem – it's often the case that there were ontology changes in almost every version that did provide
      1. The delivered software for upgrading the database and converting to new triples has always been strong
      2. But there are changes to other things that relate to VIVO, which would warrant major version numbers to warn people that the relationship of the app to other systems has changed
        1. Consequently our major version numbers would go up rapidly without demonstrating new features
        2. We may decide to focus on alerting people to schema changes
      3. And no community can sustain frequent disruptive changes that each take a lot of work
        1. When the Fedora community did their major architectural rewrite, the goal was to position it as something for new users rather than as an update – and they spent the next year doing migration support and building scripts to help people with old scripts get up to the new version, including prototypes with pilots;
          1. they socialized that process into the community
    6. We have versioning of both the ontology and the application
    7. The ontology sits outside the application, and we should track the provenance and versioning of any component
      1. The ontology group is getting some core developers involved, but only to move forward the ontology and the tools for working with it
      2. But it's not working on consumption of the ontology by the application – should be a separate task force, perhaps led by Graham
    8. If these are separate tracks, how do you ensure that the ontology and software don't develop incompatibilities?
      1. A standard problem – ideally you have somebody who's knowledgeable about the ontology's consumption by the application working with the ontology team
      2. There's not a very good intermediate layer that buffers the two – other applications have this, but not VIVO
      3. The VIVO-ISF ontology includes a lot of material not relevant for a lot of VIVO users, but if we continue to be compatible with the larger ontology, we can combine data
    9. VIVO is using many different ontologies, and we have a very close working relationship with some of these but not others (e.g., not with VCard)
      1. This buffering process and vision of versioning of the ontologies and the application is important for our planning process
      2. We are not dependent on the ontology releases necessarily – we can elect not to take certain changes in the ontology into the application
      3. We can control things that are in our own namespace, but not others; we are constantly
      4. And we need migration planning and support as part of our release development and support processes
    10. There are consequences in SPARQL queries, training, etc. – need to understand and manage them in an appropriate way
      1. We can't issue ontology changes every two months – we need to calibrate the kind of changes we want with our resources and community
      2. While there are marketing considerations, we would like to align those with semantic versioning as far as possible
      3. We are likely to have all three types of versions, including versions that change the ontology
      4. Not sure the marketing problems will be encountered in the next several years
    11. Mike will put the semantic versioning proposal out to the community for feedback and use cases that we need to consider
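As a rough illustration of the guideline discussed under item 4, the sketch below applies the semver.org rules to a three-part number such as 1.8.0. It is a minimal sketch, not an agreed VIVO policy: the `Version` class, the `bump` function, and the change labels `"breaking"`, `"feature"`, and `"fix"` are hypothetical names introduced here, and deciding which label an ontology or schema change belongs to is exactly the question being put to the community.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-part version number, e.g. 1.8.0 (hypothetical helper)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Expects exactly three dot-separated integers.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version: Version, change: str) -> Version:
    """Return the next version for a change type, following semver.org.

    The labels are assumptions for this sketch:
      "breaking" - backward-incompatible change  -> major + 1, reset the rest
      "feature"  - backward-compatible addition  -> minor + 1, reset patch
      "fix"      - backward-compatible bug fix   -> patch + 1
    """
    if change == "breaking":
        return Version(version.major + 1, 0, 0)
    if change == "feature":
        return Version(version.major, version.minor + 1, 0)
    if change == "fix":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

Because `Version` is a tuple, versions compare numerically part by part, so `Version.parse("1.10.0")` sorts after `Version.parse("1.9.3")`, which a plain string comparison would get wrong.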
  5. Some thoughts on a next release 
    1. There is a roadmap process that was described at the steering meeting at the conference and described in a poster shared with the community.
    2. There has been some preliminary work around performance problems that might produce a 1.8.1 patch release, based on issues that have in part already been addressed
    3. Mike and Graham went through the open JIRA issues and found another few changes that are more cosmetic or minor improvements, similarly appropriate for a patch release – that would help notify the community that progress is being made
    4. Maven – an idea that Graham has had is that we look at how we deliver the software with an eye toward how we can help automate the creation of a development environment
      1. There are a bunch of dependencies that may be able to be handled in a more scriptable way that is familiar to Java developers
      2. Maven is the way developers most commonly support a build process in an integrated development environment
      3. Would like to make Maven the way to do the Vagrant build process as well, so it's more closely tied to the primary build process – the work to set up the Vagrant instance has to be re-done with each release
        1. Want something that works out of the same repo with the same components and configuration
    5. Other thoughts?
      1. What about benchmarking performance with a standard set of data – page load times; should be real
        1. Ted Lawless has a dataset – we could use that to provide some benchmarks; if somebody else downloads and builds it, the results can be compared
        2. Chris Barnes has just published data from UF
          1. A single RDF file with 23 million triples that would certainly exercise the application; one feature of the Florida data is that it has things in it that are not expected – both a blessing and a curse – some of the things in the data are very unfair to the application (e.g., that a person is also a journal)
          2. This data might help stimulate analysis and data consistency work as well as performance issues
        3. The most interesting metric is what the delta is between the legacy version and the new version
        4. Run a benchmark on 1.8 and then on 1.8.1, and distribute the results with the release
        5. One goal would be to point out configuration issues in local installations, if there's a wide gulf between what the developers are achieving and what a site installation is achieving
    6. Mike would like to get the roadmap task force together with Graham to talk about the next release – the benchmarking issue is certainly worth discussing
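The benchmarking idea under item 5 (measure page load times on 1.8 and on 1.8.1, then report the delta) could be summarized along these lines. This is only a sketch under stated assumptions: the function name `compare_page_loads`, the input shape (a mapping from page name to a list of load times in seconds), and the choice of the median as the summary statistic are all hypothetical, and collecting the timings against a shared dataset is left out.

```python
from statistics import median


def compare_page_loads(baseline, candidate):
    """Compare median page load times between two releases.

    Both arguments map a page name to a list of measured load times in
    seconds (an assumed input format). Only pages measured in both runs
    are reported.
    """
    report = {}
    for page in sorted(baseline.keys() & candidate.keys()):
        old = median(baseline[page])
        new = median(candidate[page])
        report[page] = {
            "baseline_s": old,
            "candidate_s": new,
            # Negative delta means the candidate release is faster.
            "delta_s": new - old,
            "percent_change": (new - old) / old * 100 if old else None,
        }
    return report
```

A site could run the same comparison between the developers' published figures and its own installation; a wide gap would point to a local configuration issue rather than to the release itself.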
  6. Future topics

 

Action Items