Jira issues (project VIVOMS): https://jira.duraspace.org/sr/jira.issueviews:searchrequest-xml/temp/SearchRequest.xml?jqlQuery=project+%3D+VIVOMS&tempMax=1000

Task Name | Time Est. (hours) | % Done | Assignee | Link to section
--- | --- | --- | --- | ---
Setup Development Environment | 8 | 0 | | 1
Get URIs for Institution | 32 | 0 | | 2
Get data for an individual URI | 32 | 0 | | 3
Mockup of Search UI | 24 | 0 | | 4
Create Solr Doc from data for URI | 40 | 0 | | 5
Working search UI prototype | 40 | 0 | | 6
Basic multi-node Hadoop cluster on IaaS | 40 | 0 | | 7
Automated and scripted cluster on IaaS | 40 | 0 | | 8
Data validation code for Institution's data | 80 | 0 | | 9
Update system | 40 | 0 | | 10

1 Set up development environment

Git repository: copy the useful parts over from the DuraSpaceMultiSiteSearch branch of the Linked-Data-Indexer project (https://github.com/vivo-project/Linked-Data-Indexer).

Document the single-node Hadoop setup.

Development Solr service setup. We must use Solr 4.x or later (4.2 is the current Solr release as of 2013-03); Solr/Lucene improved substantially from 3.x to 4.x. I've encountered systems where setting up Solr was a chore because the instructions didn't make clear which version of Solr to use or which additional libraries to add. I suggest one of the following: (1) make the instructions state exactly which version of Solr to use, or (2) automate the setup so that the build downloads Solr from a URL and copies the files to the correct locations in the Solr home directory.
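The second option could be sketched roughly as follows. This is a minimal illustration, not part of the existing Ant/Ivy build: the class `SolrSetup` and its methods are hypothetical, the download URL assumes the Apache archive layout used for Solr 4.x releases, and the actual download and unpacking steps are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helpers for a scripted development Solr setup. */
final class SolrSetup {
    private SolrSetup() {}

    /** Rejects any Solr release older than 4.x, per the project requirement. */
    static void requireSolr4OrLater(String version) {
        int major = Integer.parseInt(version.split("\\.")[0]);
        if (major < 4) {
            throw new IllegalArgumentException("Solr " + version + " is too old; use 4.x or later");
        }
    }

    /** Download URL, assuming the Apache archive layout for Solr 4.x releases. */
    static String archiveUrl(String version) {
        requireSolr4OrLater(version);
        return "https://archive.apache.org/dist/lucene/solr/" + version
                + "/solr-" + version + ".tgz";
    }

    /** Recursively copies project-provided config files into the Solr home directory. */
    static void copyIntoSolrHome(Path sourceDir, Path solrHome) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(sourceDir)) {  // parents are listed before children
            paths = walk.collect(Collectors.toList());
        }
        for (Path src : paths) {
            Path dest = solrHome.resolve(sourceDir.relativize(src).toString());
            if (Files.isDirectory(src)) {
                Files.createDirectories(dest);
            } else {
                Files.createDirectories(dest.getParent());
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

A build step would call `archiveUrl("4.2.0")`, fetch and unpack that archive, and then call `copyIntoSolrHome` with a checked-in Solr configuration directory, so the Solr version and the file layout are pinned in one place instead of in prose instructions.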

Ant/Ivy build script. (DONE in DuraSpaceMultiSiteSearch)

Wiki/git README documentation.

2 Develop code to build the list of URIs to index for an institution from a standard VIVO 1.5.1 instance

There is code to parse Catalyst pages into URIs (CatalystPageToURIs.java) and to parse the JSON from VIVO (ParseDataServiceJson.java). The discovery of URIs for Catalyst and VIVO is done in VivoUriDiscoveryWorker.scala and CatalystDiscoveryWorker.scala in LinkedDataIndexer/src/main/scala/edu/cornell/indexbuilder/discovery. These files could serve as examples, but they depend heavily on the Akka framework, which we'd like to move away from.
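One way to drop the Akka dependency is to run the discovery steps on a plain `java.util.concurrent` thread pool. The sketch below assumes each discovery step (one VIVO class listing, one Catalyst page, and so on) can be wrapped in the hypothetical `UriSource` interface; the fetching and parsing themselves, such as what CatalystPageToURIs.java does, are not shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical: one source of URIs, e.g. a single VIVO class listing or Catalyst page. */
interface UriSource {
    List<String> fetchUris() throws Exception;
}

/** Runs every source on a fixed thread pool and merges the results, without actors. */
final class UriDiscovery {
    private UriDiscovery() {}

    static Set<String> discover(Collection<? extends UriSource> sources, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (UriSource source : sources) {
                futures.add(pool.submit(source::fetchUris));  // submitted as a Callable
            }
            Set<String> uris = new LinkedHashSet<>();  // drops duplicates, keeps first-seen order
            for (Future<List<String>> f : futures) {
                uris.addAll(f.get());  // a failing source surfaces as ExecutionException
            }
            return uris;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting futures in submission order keeps the merged URI list deterministic even though the sources run concurrently.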

3 Develop code to gather data required for an individual URI

See UrisForDataExpansion.java for an example of how this was done in the prototype.
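The general idea of gathering an individual's data can be sketched as a breadth-first walk over linked data. This is an assumption about the approach, not a description of UrisForDataExpansion.java: `LinkedDataFetcher`, `Triple`, the set of predicates to follow, and the depth limit are all hypothetical, and a real implementation would fetch and parse RDF from each institution's VIVO.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Minimal RDF statement; the object holds either a URI or a literal value. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical: returns the statements about one URI (e.g. from its linked-data RDF). */
interface LinkedDataFetcher {
    List<Triple> describe(String uri);
}

final class DataExpansion {
    private DataExpansion() {}

    /** A URI waiting to be fetched, with its distance in hops from the root individual. */
    private static final class Pending {
        final String uri;
        final int depth;

        Pending(String uri, int depth) {
            this.uri = uri;
            this.depth = depth;
        }
    }

    /**
     * Collects the statements about rootUri, then follows objects of the given predicates
     * breadth-first, fetching each related URI once, up to maxDepth hops away.
     */
    static List<Triple> gather(String rootUri, Set<String> expandPredicates,
                               int maxDepth, LinkedDataFetcher fetcher) {
        List<Triple> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<Pending> queue = new ArrayDeque<>();
        visited.add(rootUri);
        queue.add(new Pending(rootUri, 0));
        while (!queue.isEmpty()) {
            Pending next = queue.poll();
            for (Triple t : fetcher.describe(next.uri)) {
                result.add(t);
                boolean expand = next.depth < maxDepth && expandPredicates.contains(t.predicate);
                if (expand && visited.add(t.object)) {  // add() returns false if already seen
                    queue.add(new Pending(t.object, next.depth + 1));
                }
            }
        }
        return result;
    }
}
```

With a depth limit of 2, for example, a person's position and the organization it belongs to would both be fetched, so the organization's name can end up in the person's search document.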

4 Mockup of search UI

Base the UI for now on the current UI at vivosearch.org. Issues that will require consideration:

  • whether to keep the home-grown implementation of the responsive design (adjusting the UI in stages as the screen size decreases from a full-size monitor down to a low-resolution monitor or projector, a tablet, or a smartphone)
  • how to accommodate larger numbers of institutions when a single expanded list is too long
  • whether to implement additional facets beyond the current 2 (institution and type)

5 Develop code to build and index a Solr document from the data for a URI

This depends on the search UI mockup (section 4), which is needed to develop the schema for the Solr index.

SolrDocWorker.scala uses the DocumentModifiers from the Vitro code to generate a Solr document from the model for a URI, and we may want to reuse this approach. Much of this code is in LinkedDataIndexer/src/main/java/edu/cornell/mannlib/vitro/webapp/search/solr. That directory also contains MultiSiteIndexToDoc.java, a new translator that works without the webapp context, and the new DocumentModifiers needed for multi-site indexing.
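The DocumentModifier approach amounts to a chain of small classes, each adding fields to the document under construction. The sketch below shows that pattern only; `DocModifier`, `IndexDoc`, `DocBuilder`, and the field names are hypothetical stand-ins (a plain map instead of SolrJ's `SolrInputDocument` and the Jena model), not Vitro's actual interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a Solr input document: field name to values (fields may be multi-valued). */
final class IndexDoc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void addField(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

/** Hypothetical modifier in the spirit of Vitro's DocumentModifier; not its real signature. */
interface DocModifier {
    void modify(String uri, Map<String, List<String>> data, IndexDoc doc);
}

final class DocBuilder {
    private final List<DocModifier> modifiers;

    DocBuilder(DocModifier... modifiers) {
        this.modifiers = Arrays.asList(modifiers);
    }

    /** Creates the base document, then lets each modifier add its fields in order. */
    IndexDoc build(String uri, Map<String, List<String>> data) {
        IndexDoc doc = new IndexDoc();
        doc.addField("uri", uri);
        for (DocModifier m : modifiers) {
            m.modify(uri, data, doc);
        }
        return doc;
    }
}
```

Multi-site concerns, such as tagging every document with its source institution, then become one more modifier in the chain rather than a change to the core translation.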

6 Working Prototype of Search UI

Make technical decisions about how to serve the search UI and how the UI client will communicate with the Solr service.
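If the UI client talks to Solr over HTTP, it would send requests to Solr's standard `/select` handler. The sketch below builds such a request with faceting on the two current facets (institution and type); the core URL and the field names `institution` and `type` are assumptions about the eventual schema, and whether the client reaches Solr directly or through a proxy is one of the decisions still to be made.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Builds a Solr select URL; field names "institution" and "type" are hypothetical. */
final class SearchQuery {
    private SearchQuery() {}

    static String selectUrl(String coreUrl, String userQuery, Map<String, String> selectedFacets,
                            int start, int rows) {
        List<String> params = new ArrayList<>();
        params.add(param("q", userQuery));
        params.add(param("wt", "json"));
        params.add(param("start", Integer.toString(start)));
        params.add(param("rows", Integer.toString(rows)));
        params.add(param("facet", "true"));
        params.add(param("facet.field", "institution"));
        params.add(param("facet.field", "type"));
        // Each facet value the user has clicked narrows the results through a filter query.
        for (Map.Entry<String, String> e : selectedFacets.entrySet()) {
            params.add(param("fq", e.getKey() + ":\"" + escapePhrase(e.getValue()) + "\""));
        }
        return coreUrl + "/select?" + String.join("&", params);
    }

    /** Escapes backslashes and double quotes so the value is safe inside a quoted phrase. */
    private static String escapePhrase(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    private static String param(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always supported
        }
    }
}
```

Using filter queries (`fq`) for selected facets rather than folding them into `q` keeps the user's search terms separate from the navigation state, and Solr can cache filter queries independently.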

7 Explore multi-node Hadoop cluster deployed to IaaS

8 Scripted deployment of a multi-node Hadoop cluster on IaaS

9 Data validation code for an institution's data

10 Update system

Develop a system that allows incremental updates to the index. This is likely to involve some additional services in the VIVO webapp: the multi-site search index builder will need to query the VIVO webapp for the list of URIs that have been updated within a given time frame.
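A minimal sketch of the update loop, assuming the VIVO webapp gains a service that lists the URIs modified in a time window. `ModifiedUriService` and `IncrementalUpdater` are hypothetical, and the `reindex` callback stands in for the data-gathering and Solr-document steps of sections 3 and 5.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

/** Hypothetical service the VIVO webapp would have to provide. */
interface ModifiedUriService {
    /** URIs of individuals modified at or after {@code from} and before {@code to}. */
    List<String> modifiedBetween(Instant from, Instant to);
}

/** Reindexes only what changed since the previous run. */
final class IncrementalUpdater {
    private final ModifiedUriService service;
    private final Consumer<String> reindex;  // gather data for one URI and rebuild its Solr doc
    private final Clock clock;
    private Instant lastRun;

    IncrementalUpdater(ModifiedUriService service, Consumer<String> reindex,
                       Clock clock, Instant lastRun) {
        this.service = Objects.requireNonNull(service);
        this.reindex = Objects.requireNonNull(reindex);
        this.clock = Objects.requireNonNull(clock);
        this.lastRun = Objects.requireNonNull(lastRun);
    }

    /** Processes the window [lastRun, now) and returns the number of URIs reindexed. */
    int runOnce() {
        Instant now = clock.instant();
        List<String> changed = service.modifiedBetween(lastRun, now);
        for (String uri : changed) {
            reindex.accept(uri);
        }
        lastRun = now;  // advanced only after every URI was handled
        return changed.size();
    }

    Instant lastRun() {
        return lastRun;
    }
}
```

Because `lastRun` moves forward only after every URI in the window has been handled, a run that fails partway is retried over the same window on the next call, at the cost of possibly reindexing some URIs twice. Injecting a `Clock` makes the time windows easy to control in tests.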