Attendees

Jonathan Corson-Rikert
Brian Caruso
Brian Lowe
Jim Blake
Jonathan Markow
Andrew Woods 

Agenda

  1. Any updates?
  2. Review questions and answers from last time to identify the organizational and development activities we need and get a better overall picture
  3. Further definition of technical requirements
    1. Where is our knowledge of Hadoop?
    2. How applicable to the VIVO linked data index builder is the code for indexing RDF versions of MARC records?
  4. Further definition of business requirements – straw man proposals for discussion
    1. Stage 1 – by VIVO conference mid-August 2013
      1. reproduce the current index and update the current vivosearch.org site
      2. establish a workable frequency of update – monthly or weekly at most
      3. then expand to 5-10 more institutions (Melbourne, Colorado, Duke, Brown, Eindhoven or VU Amsterdam, Cambridge might be candidates)
    2. Stage 2 – for October, 2013 CTSA PI meetings
      1. work with ~5 CTSAs to demonstrate an index including only designated CTSA investigators
        1. UF, Weill Cornell, Washington University, Indiana, Harvard
      2. work with CTSA researcher networking group
    3. Stage 3 – by the end of 2013
      1. work with Colorado to help them set up an independent Colorado search across Boulder, Colorado Springs, and Denver campuses
      2. prepare more detailed ongoing business plan as part of marketing campaign for 2014
  5. Other topics

...

Discussion

How much technical work is required to update the Distributed Indexer?

Review of questions
  1. Worthwhile to review questions and responses
    • What will this project consist of, soup to nuts
    • What will the total costs look like?
  2. The idea of an uber-project with CTSAs
    • CTSAs may not currently be in a position to leverage a cross-institutional VIVO Search (open issue: disambiguation)
  3. Which institutions do we target in which phases?
    • Pilot group: friend-institutions and a few CTSAs
    • Post-pilot: additional CTSAs
  4. Defining phases/stages
    • Pilot phase
    • CTSA phase? General open phase?
  5. Defining sequence of tasks
  6. Defining roles
    • Project roles
    • Production service roles
  7. How to define level of effort for various roles
    • Work backwards from the required tasks
  8. May want to consider additional roles
    • Business liaison
    • Fund-raising manager
    • Technical lead
  9. Division of labor
    • DuraSpace/VIVO relationship is that DuraSpace provides advice/mentoring
    • It may or may not make sense for DuraSpace to participate in implementation
    • DuraSpace can help with marketing and cloud-service support
  10. Keys to success
    • Needs to be as easy as possible on the client side
    • Partner specialist will also be required
  11. Deciding on frontend technology
    • What in-house skills are available?
    • What are the application needs?
  12. Dynamic/scalable servers useful in Hadoop context, less so for frontend app
  13. Indexing frequency: could be a business model around higher frequency
    • Sites would have to be able to support hammering of linked-open-data requests
  14. Improve UI to support more institutions and facets
  15. Need ability to adjust relevancy rankings
  16. Analysis for disambiguation of source data

Next steps
  1. Technical analysis
    • Move towards technology choices
  2. Business model?