Deprecated. This material represents early efforts and may be of interest to historians. It does not describe current VIVO efforts.
Task Name | Time Est. (hours) | % Done | Assignee | Link to section |
---|---|---|---|---|
Setup Development Environment | 8 | 0 | | 1 |
Get URIs for Institution | 32 | 0 | | 2 |
Get data for an individual URI | 32 | 0 | | 3 |
Mockup of Search UI | 24 | 0 | | 4 |
Create Solr Doc from data for URI | 40 | 0 | | 5 |
Working search UI prototype | 40 | 0 | | 6 |
Basic multi-node Hadoop cluster on IaaS | 40 | 0 | | 7 |
Automated and scripted cluster on IaaS | 40 | 0 | | 8 |
Data validation code for Institution's data | 80 | 0 | | 9 |
Update system | 40 | 0 | | 10 |
Git repository. How will this relate to the old project at https://github.com/vivo-project/Linked-Data-Indexer ? Should it be a fork, a branch, or something else?
Document single node Hadoop setup.
Development Solr service setup. Must use Solr 4.x or greater; there have been huge improvements in Solr/Lucene going from 3.x to 4.x. I've encountered systems where setting up Solr is a chore because the instructions don't make clear which version of Solr to use and which additional libraries to add. I suggest one of the following: 1) make the instructions very clear about which version of Solr to use, OR 2) automate the setup by downloading Solr from a URL and copying the configuration files to the correct locations in the Solr home directory.
Ant/Ivy build script.
Wiki/git README documentation.
There is code to parse Catalyst pages to URIs (CatalystPageToURIs.java) and to parse the JSON from VIVO (ParseDataServiceJson.java). There is code to do the discovery of URIs for Catalyst and VIVO in LinkedDataIndexer/src/main/scala/edu/cornell/indexbuilder/discovery, in VivoUriDiscoveryWorker.scala and CatalystDiscoveryWorker.scala. These files could be used as examples, but they depend heavily on the Akka framework, which we'd like to move away from.
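The discovery step can be sketched without Akka using only the JDK. This is a minimal illustration, not the prototype's parser: the `{"uri": "..."}` JSON shape and the class name `UriDiscovery` are assumptions made here for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch of discovering individual URIs from a VIVO-style
 *  JSON listing. The {"uri": "..."} field shape is an assumption for
 *  illustration; the prototype's real parsing is in the classes named
 *  above and would replace this regex approach. */
public class UriDiscovery {
    private static final Pattern URI_FIELD =
        Pattern.compile("\"uri\"\\s*:\\s*\"([^\"]+)\"");

    /** Pull every "uri" field value out of a JSON listing string. */
    public static List<String> extractUris(String json) {
        List<String> uris = new ArrayList<>();
        Matcher m = URI_FIELD.matcher(json);
        while (m.find()) {
            uris.add(m.group(1));
        }
        return uris;
    }

    public static void main(String[] args) {
        String sample =
            "{\"individuals\":[{\"uri\":\"http://vivo.example.edu/individual/n123\"},"
          + "{\"uri\":\"http://vivo.example.edu/individual/n456\"}]}";
        System.out.println(extractUris(sample));
    }
}
```

A plain worker like this, fed page by page, is one way to keep the discovery logic testable independent of any actor framework.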
See UrisForDataExpansion.java for an example of how this was done in the prototype.
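Getting data for a single URI comes down to an HTTP request with content negotiation for RDF. A minimal sketch of just the request setup follows; the class and method names are invented for this example, and the real expansion logic remains in UrisForDataExpansion.java.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch of requesting linked data for a single URI via HTTP content
 *  negotiation (Accept: application/rdf+xml). Only the request setup is
 *  shown; reading and parsing the response body is omitted. */
public class LinkedDataFetcher {

    /** Configure (but do not yet send) a GET asking for RDF/XML. */
    public static HttpURLConnection rdfRequest(String uri) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(uri).openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/rdf+xml");
            return conn;
        } catch (IOException e) {
            throw new RuntimeException("could not prepare request for " + uri, e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn =
            rdfRequest("http://vivo.example.edu/individual/n123");
        // Prints the Accept header the request would carry.
        System.out.println(conn.getRequestProperty("Accept"));
    }
}
```

Keeping the fetch this small makes it easy to hand the returned connection to whatever RDF parser the project settles on.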
This depends on the Mockup of the Search UI task, which is needed in order to develop the schema for the Solr index.
SolrDocWorker.scala uses the DocumentModifier from the Vitro code to generate a Solr document from a model for a URI. We may want to reuse this approach. Much of this code is found in LinkedDataIndexer/src/main/java/edu/cornell/mannlib/vitro/webapp/search/solr. There you can find a new translator that works well without the webapp context, MultiSiteIndexToDoc.java, and the new DocumentModifiers that are needed for multi-site indexing.
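For orientation, the end product of this task is an XML add document posted to Solr's /update handler. The stdlib sketch below shows only that output shape; the class name and the `URI`/`nameRaw` field names are placeholders, and the real field population would come from the DocumentModifiers mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of turning extracted field values into a Solr XML add
 *  document. Field names here are placeholders; the actual schema
 *  depends on the search UI mockup and the DocumentModifiers. */
public class SolrDocSketch {

    /** Render a field map as an &lt;add&gt;&lt;doc&gt; update message. */
    public static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("URI", "http://vivo.example.edu/individual/n123");
        fields.put("nameRaw", "Smith, Jane");
        System.out.println(toAddXml(fields));
    }
}
```

In practice SolrJ's SolrInputDocument would likely replace hand-built XML, but the document structure is the same.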
Make tech decisions about serving search UI and about how the UI client will communicate with the Solr service.
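Whatever serving approach is chosen, the UI client will ultimately issue HTTP queries against Solr's /select handler. A small sketch of building such a query URL, assuming the decision is direct HTTP with JSON responses; the base URL and class name are placeholders, not decided project values.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch of a search UI client building a Solr /select query URL.
 *  Direct client-to-Solr HTTP is one option under discussion, not a
 *  settled decision; a proxy in front of Solr is another. */
public class SolrQueryUrl {

    /** Build a /select URL for a user query, asking for JSON back. */
    public static String selectUrl(String solrBase, String userQuery) {
        return solrBase + "/select?q="
             + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
             + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(
            selectUrl("http://localhost:8983/solr/vivocore", "jane smith"));
    }
}
```

Note that exposing Solr directly to browsers has security implications (e.g. open access to the update handler), which is part of what this tech decision needs to weigh.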
Develop a system to allow updates. This is likely to involve some additional services as part of the VIVO webapp. The multi-site search index builder will need to query the VIVO webapp for a list of URIs that have been updated in a given time frame.
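The polling side of that flow could look like the sketch below. The `listUpdatedUris` endpoint and its `since` parameter are hypothetical: such a service does not exist yet and would have to be added to the VIVO webapp as described above.

```java
import java.time.Instant;

/** Sketch of the update flow: the index builder asks each VIVO site
 *  for URIs changed since a timestamp, then re-indexes only those.
 *  The listUpdatedUris endpoint and "since" parameter are hypothetical
 *  names for the service VIVO would need to expose. */
public class UpdatePoller {

    /** Build the URL for asking a site which URIs changed since a time. */
    public static String updatedUrisUrl(String vivoBase, Instant since) {
        return vivoBase + "/listUpdatedUris?since=" + since.getEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(
            updatedUrisUrl("http://vivo.example.edu",
                           Instant.parse("2013-01-01T00:00:00Z")));
    }
}
```

Each URI returned would then go back through the fetch-and-index steps above, so incremental updates reuse the full-build pipeline.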