What is the ideal deployment environment?

  • Stable Linux VM running Apache/Drupal or Apache/Ruby (this could be rewritten to use Hydra/Blacklight)
  • Stable Linux VM hosting the Solr index (could probably be a single VM initially)
  • More dynamically allocated Linux VMs for indexing. Brian C. has worked on dynamic spin-up of instances within the Hadoop framework, but that seems like a later improvement; start with a small number of stable VMs that are imaged but started and stopped manually (see the sketch after this answer). In time, an on-demand mode of virtual machine usage, as Hadoop can manage, could help control costs. Andrew confirms the cost savings, though they need to be balanced against high availability and redundancy. It is important to keep in mind the requirements this places on the applications in the VM: can everything that needs to be initialized come up without hand-holding?
  • After the beta phase, move toward a staging server and/or load balancing. Web traffic is unlikely to grow beyond what we can manage in a day-to-day sort of way; it is the indexing that takes a significant amount of CPU time.
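
As a rough illustration of the manually managed model above, an indexing VM can be started for a run and terminated afterward with a few cloud API calls. This sketch assumes AWS via boto3; the region, AMI ID, instance type, and function name are placeholders, not project decisions.

    # Sketch: start a pre-imaged indexer VM, run the index build, then
    # terminate it to control costs. All identifiers below are placeholders.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    def run_indexing_cycle(ami_id="ami-00000000", instance_type="m3.large"):
        # Launch one instance from an image with the index builder baked in,
        # so it comes up without hand-holding after boot.
        [instance] = ec2.create_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
        )
        instance.wait_until_running()
        try:
            pass  # trigger the index build here (e.g., via SSH or a user-data script)
        finally:
            # Terminate rather than leave idle capacity running.
            instance.terminate()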

What is the set of features for the production application?
  • Index all RDF content from remote sites, ideally on a 24-hour cycle, though the beta might update less often (see the harvesting sketch after this list)
    • include email addresses, local institutional identifiers, ORCID iDs, ResearcherIDs, Scopus IDs, and other identifiers that might be useful
    • offering more frequent updates could be part of the business model, but Jim points out that sites may not want to be hammered
  • Faceted search interface with some ability to scale as the number of participating sites grows
  • Ability to adjust relevance ranking
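
As a sketch of what the periodic harvest might look like, the fragment below pulls a site's RDF, extracts person records, and posts them to Solr. The site URL, Solr core name, and field names are illustrative assumptions, as is the choice of rdflib and requests.

    # Sketch: harvest one site's RDF and index person records into Solr.
    # The Solr URL and the field names are assumptions, not settled choices.
    import requests
    from rdflib import Graph
    from rdflib.namespace import RDF, FOAF

    SOLR_UPDATE = "http://localhost:8983/solr/vivosearch/update?commit=true"

    def harvest_site(rdf_url):
        g = Graph()
        g.parse(rdf_url)  # rdflib detects the serialization (RDF/XML, Turtle, ...)
        docs = []
        for person in g.subjects(RDF.type, FOAF.Person):
            docs.append({
                "id": str(person),
                "name": str(g.value(person, FOAF.name) or ""),
                # Email addresses and identifiers (ORCID iD, ResearcherID,
                # Scopus ID) would be mapped here from the site's properties.
            })
        # Solr's update handler accepts a JSON array of documents.
        requests.post(SOLR_UPDATE, json=docs).raise_for_status()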

Features more related to the über project:

  • After beta phase, ability to analyze the Solr index for duplicate names and provide disambiguated results, correlated against ORCID and services such as http://viaf.org (a first-pass sketch follows this list)
  • After beta phase, ability to provide web services back to distributed sites, allowing them to choose people/places/journals/organizations from a central index to improve data quality and lower the cost of ongoing disambiguation
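
A first pass at the duplicate-name analysis could be as simple as grouping indexed records under a normalized name key and flagging keys that cover more than one record; candidates could then be correlated against ORCID or http://viaf.org. The Solr URL and field names below are assumptions for illustration.

    # Sketch: group person records in the Solr index by a normalized name
    # key and report candidate duplicates. Field names are assumptions.
    import re
    import requests
    from collections import defaultdict

    SOLR_SELECT = "http://localhost:8983/solr/vivosearch/select"

    def normalize(name):
        # Crude key: lowercase, strip punctuation, collapse whitespace.
        return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", name.lower())).strip()

    def candidate_duplicates(rows=10000):
        resp = requests.get(SOLR_SELECT, params={
            "q": "*:*", "fl": "id,name", "rows": rows, "wt": "json",
        })
        resp.raise_for_status()
        groups = defaultdict(list)
        for doc in resp.json()["response"]["docs"]:
            groups[normalize(doc.get("name", ""))].append(doc["id"])
        return {key: ids for key, ids in groups.items() if key and len(ids) > 1}
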
What hardware resources are needed to run the application? What level of effort in what roles? (note that this will be different from the development period)

The code has three parts: the front end, the index, and the index builder.

Need to spec out each component and the effort to bring it up to what is required, including the cost of swapping out any part.

  • Very simple Drupal site that could probably alternatively use WordPress or Ruby; JavaScript libraries to enhance interaction and support responsive design for mobile devices
  • Tomcat and Solr, and the ability to fine-tune a Solr index, query configuration(s), and relevance ranking (see the query sketch after this list)
  • Data manager to work with participating sites, educate new sites on how to prepare data, do quality control on data at first index, respond to inquiries about relevance ranking, and organize disambiguation efforts
  • After beta phase, a programmer to work on disambiguation initiatives and on developing services that offer disambiguated data back to participating sites (and potentially others) on a fee basis
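
To make the faceting and relevance-ranking work concrete, the query below shows the kind of tuning involved. The edismax parser is a standard Solr option rather than a project commitment, and the field names, boosts, and facet fields are placeholders.

    # Sketch: a faceted, boosted Solr query of the kind the front end would
    # issue. Field names, boosts, and facet fields are illustrative only.
    import requests

    SOLR_SELECT = "http://localhost:8983/solr/vivosearch/select"

    def search(user_query):
        resp = requests.get(SOLR_SELECT, params={
            "q": user_query,
            "defType": "edismax",
            # Boost name matches over department and publication titles;
            # these weights are exactly the knobs a data manager would tune.
            "qf": "name^5 department^2 publication_title^1",
            # Facet on the fields the UI exposes as filters.
            "facet": "true",
            "facet.field": ["institution", "research_area"],
            "facet.mincount": 1,
            "wt": "json",
        })
        resp.raise_for_status()
        return resp.json()
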
What resources are needed for on-going maintenance?
  • Periodic review and updating of UI and device/browser independence
  • Ongoing improvement to the indexing code: run more efficiently, permit more frequent updates, detect duplicates dynamically, improve faceting of results, and support network analysis and derivative services built on the corpus of data gathered
  • Ongoing hand-holding for existing and new users, including managing transitions in the VIVO ontology over time
What is required for application initialization?
  • The current http://vivosearch.org site is a simple Drupal site on a cloud VM with a minimal internal database and access to a Solr index housed on a server at Cornell; it has been very stable, needing attention perhaps three or four times in two years.
Who will provide post-implementation tech support to users?  How much will be needed?
  • This should be somebody familiar with RDF and the VIVO ontology, with Solr configuration, and with the idiosyncrasies of university data sources; an expert programmer would not be essential except when introducing new features, increasing efficiency, or significantly scaling the number of participating institutions
  • Over time, one goal would be to run VIVO search at a scale that can support itself and provide some ongoing funding for DuraSpace and VIVO efforts as a whole; a dedicated, technically qualified support person would help assure that stability


Don't conflate services with sponsorship.

Have to spec out some high-level tasks and make the choices:

  • whether to swap out the index builder
  • whether to stay with Drupal or switch to Ruby
  • how much change is necessary for the UI

Organizational aspects