Deprecated. This material represents early efforts and may be of interest to historians. It does not describe current VIVO efforts.


The responses below are a first cut – please feel free to question, comment, or replace (Jon).

Organizational

Question / Response
How many institutions does the application currently support?
  • 8
How many institutions will be targeted?
  • Initially 12-20; in 2 years, 100; ultimately 1000
What are the roles?
  • Project manager
  • Ontologist/data curator
  • UI designer
  • Web developer
  • Indexing programmer
  • System manager
What will be the division of labor?

Initially: DuraSpace could do project management and system administration/support, and VIVO ontologists and developers could rewrite the indexing code, update the UI and web front end, and review candidate data

Ultimately: this could become a division between all aspects of production support, including marketing, and the development of new features based on input from subscribers and sponsors

What are the primary "keys to success"?
  • Going beyond the obvious first win (integrated search) to address some of the immediately visible disambiguation problems: the same persons, organizations, events, funding agencies, journals, etc. coming in from indexing with multiple different URIs from the different source systems
Is there a market to attract service providers for readying data at campuses/organizations?
  • Yes, including Symplectic, Recombinant, and potentially others

 

Technical

Question / Response
What is the ideal deployment environment?

Stable Linux VM running Apache/Drupal or Apache/Ruby (this could be rewritten to use Hydra/Blacklight)

Stable Linux VM hosting the Solr index (could probably initially be on 1 VM)

More dynamically allocated Linux VMs for indexing

After beta phase, move toward staging server and load balancing

What is the set of features for the production application?
  • Index all RDF content from remote sites, ideally every 24 hours, though the beta might index less often
    • Include email addresses, local institutional identifiers, ORCID iDs, ResearcherIDs, Scopus IDs, and other identifiers that might be useful
  • Faceted search interface with some ability to deal with scaling in the number of participating sites
  • Ability to adjust relevance ranking
  • After beta phase, ability to analyze the Solr index for duplicate names and provide disambiguated results, correlated against ORCID and services such as http://viaf.org
  • After beta phase, ability to provide web services back to distributed sites allowing them to choose people/places/journals/organizations from a central index to improve data quality and lower the cost of ongoing disambiguation
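The duplicate-name analysis mentioned in the feature list could start from something as simple as the sketch below. It is only an illustration: the record layout (`uri`, `name`), the example URIs, and the normalization rule are assumptions, not part of the current indexing code, and matching on normalized names alone only produces candidates that would still need to be checked against ORCID, VIAF, or a human reviewer.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in ascii_only.lower())
    return " ".join(sorted(cleaned.split()))


def candidate_duplicates(records):
    """Group URIs by normalized name; keep only names seen under several URIs."""
    groups = defaultdict(set)
    for rec in records:
        groups[normalize_name(rec["name"])].add(rec["uri"])
    return {key: sorted(uris) for key, uris in groups.items() if len(uris) > 1}


# Hypothetical records, as they might be read back from the central index.
records = [
    {"uri": "http://vivo.example.edu/individual/n1", "name": "Smith, José"},
    {"uri": "http://vivo.example.org/individual/p42", "name": "Jose Smith"},
    {"uri": "http://vivo.example.edu/individual/n2", "name": "Lee, Ann"},
]
duplicates = candidate_duplicates(records)
```

Here the two "Smith" records from different source systems end up in one group, while "Lee, Ann" is left alone because only one URI carries that name.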
What resources are needed to run the application?
  • A very simple Drupal site (WordPress or Ruby would probably work as alternatives), plus JavaScript libraries to enhance interaction and support responsive design for mobile devices
  • Tomcat and Solr, and the ability to fine-tune a Solr index, query configuration(s), and relevance ranking
  • A data manager to work with participating sites, educate new sites on how to prepare data, do quality control on data at first index, respond to inquiries about relevance ranking, and organize disambiguation efforts
  • After beta phase, a programmer to work on disambiguation initiatives and on the development of services to offer disambiguated data back to participating sites (and potentially others) on a fee basis
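As a rough illustration of what fine-tuning query configuration and relevance ranking involves, the sketch below builds a faceted Solr request with per-field boosts. The request parameters (`q`, `defType`, `qf`, `facet`, `facet.field`, `rows`, `wt`) are standard Solr parameters, but the core URL and the field names (`site_name`, `class_label`, `name`, `overview`) are assumptions and do not reflect the actual vivosearch.org schema.

```python
from urllib.parse import urlencode, parse_qs


def build_search_url(base_url, text, facet_fields, field_boosts):
    """Build a Solr /select URL with facets and eDisMax field boosts."""
    params = [
        ("q", text),
        ("defType", "edismax"),
        # qf lists the searched fields; "^" sets each field's relevance boost.
        ("qf", " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())),
        ("facet", "true"),
        ("rows", "10"),
        ("wt", "json"),
    ]
    # facet.field may be repeated, once per facet shown in the interface.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names.
url = build_search_url(
    "http://localhost:8983/solr/vivosearch",
    "climate change",
    facet_fields=["site_name", "class_label"],
    field_boosts={"name": 3.0, "overview": 1.0},
)
parsed = parse_qs(url.split("?", 1)[1])
```

Adjusting relevance ranking then amounts to changing the boost values passed in `field_boosts`, which is the kind of tuning a data manager could be asked about by participating sites.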
What resources are needed for on-going maintenance?
  • Periodic review and updating of UI and device/browser independence
  • Ongoing improvement to the indexing code to run more efficiently, permit more frequent updates, detect duplicates dynamically, improve faceting of results, support network analysis and derivative services out of the corpus of data gathered
  • Ongoing hand-holding for existing and new users, including managing transitions in the VIVO ontology over time
What is required for application initialization?
  • The current http://vivosearch.org site is a simple Drupal site on a cloud VM with a minimal internal database and access to a Solr index housed on a server at Cornell; it has been very stable, needing attention perhaps 3 or 4 times in two years.
Who will provide post-implementation tech support to users? How much will be needed?
  • This should be somebody familiar with RDF, the VIVO ontology, and Solr configuration, as well as the idiosyncrasies of university data sources; an expert programmer would not be essential except when introducing new features, increasing efficiency, or significantly scaling the number of participating institutions
  • Over time, one goal would be to size VIVO search at a scale that can support itself and also provide some ongoing funding for DuraSpace and VIVO efforts as a whole; a dedicated, technically qualified support person would help ensure stability

 

 
