Deprecated. This material represents early efforts and may be of interest to historians. It does not describe current VIVO efforts.

Overview

A command-line application that can be gracefully interrupted and restarted at any time, with little or no loss of progress.
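
The interrupt-and-restart behavior described above could be sketched as follows. This is a minimal illustration in Python, not the application's actual code: the checkpoint file format and the names `stop_requested`, `install_signal_handlers`, `load_done`, `save_done`, and `run` are all assumptions made for the example.

```python
import json
import os
import signal
import tempfile
import threading

# Set by the signal handler; checked between work items (hypothetical name).
stop_requested = threading.Event()


def install_signal_handlers():
    """Turn Ctrl-C / SIGTERM into a stop request instead of an abrupt exit."""
    def request_stop(signum, frame):
        stop_requested.set()
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)


def load_done(checkpoint_path):
    """Return the set of URIs completed by earlier runs (empty on first run)."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(checkpoint_path, done):
    """Write the checkpoint to a temp file, then rename it into place,
    so an interruption never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def run(uris, checkpoint_path, process):
    """Process each URI once, skipping work recorded by a previous run."""
    done = load_done(checkpoint_path)
    for uri in uris:
        if stop_requested.is_set():
            break  # stop between items; everything finished so far is saved
        if uri in done:
            continue
        process(uri)
        done.add(uri)
        save_done(checkpoint_path, done)
    return done
```

Because the checkpoint is saved after every item, an interruption in this sketch loses at most the item that was in progress. A real indexer would more likely keep this state in the search index itself than in a separate file.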

A full rewrite of the Beta release.

  • Moved from the Scala Actors framework to a Hadoop-compatible threaded model.

Highly modular and configurable.

  • Open to contributions from the community.
  • Configuration file that determines at startup which modules are used.
  • Rule-based configuration using the Digester component from Apache Commons.
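
To illustrate how a configuration file can choose modules at startup, here is a small Python sketch. It does not use Digester. It only imitates the general idea of mapping XML elements to objects and copying attributes onto them. The element names, the class names `SitemapDiscovery` and `LodModeler`, and the `REGISTRY` lookup are all hypothetical.

```python
import xml.etree.ElementTree as ET


class Module:
    """Base class for pluggable modules; keeps its settings as a dict."""
    def __init__(self, **settings):
        self.settings = settings


class SitemapDiscovery(Module):
    """Hypothetical discovery module."""


class LodModeler(Module):
    """Hypothetical modeling module."""


# Only classes listed here can be named in the configuration file.
REGISTRY = {
    "SitemapDiscovery": SitemapDiscovery,
    "LodModeler": LodModeler,
}

# Assumed file layout, invented for this example.
EXAMPLE_CONFIG = """<indexer>
  <module class="SitemapDiscovery" site="http://example.org/vivo"/>
  <module class="LodModeler" timeout="30"/>
</indexer>"""


def load_modules(xml_text):
    """Create one module instance per <module> element, in file order.

    The 'class' attribute selects the implementation; every other
    attribute is passed to the module as a string setting.
    """
    root = ET.fromstring(xml_text)
    modules = []
    for element in root.findall("module"):
        class_name = element.get("class")
        if class_name not in REGISTRY:
            raise ValueError(f"unknown module class: {class_name!r}")
        settings = {k: v for k, v in element.attrib.items() if k != "class"}
        modules.append(REGISTRY[class_name](**settings))
    return modules
```

Calling `load_modules(EXAMPLE_CONFIG)` returns a `SitemapDiscovery` followed by a `LodModeler`. Swapping an implementation then means editing the file, not the code.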

Flow diagram

  • Configuration
  • Assessment
    • Scheduling
      • Determine what discovery is to be done
    • Discovery
      • Visit the client sites to build lists of URIs of Individuals for the index
      • Determine whether the client site provides a last-modified date for each individual.
    • Synchronizing
      • Record the results of discovery in the search index
      • Remove any URIs that are no longer viable.
      • Add any new URIs.
      • Record that discovery was done.
  • Retrieval
    • Prioritizing
      • Inspect the index to see which records should be updated.
      • Build a to-do list.
    • Processing - for each URI in the to-do list:
      • Modeling
        • Use LOD requests to build the model for the URI
      • Indexing
        • Build an index record from the model, and write to the search index.
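
The synchronizing, prioritizing, and processing steps in the flow above can be sketched with an in-memory dictionary standing in for the search index. Everything here is an assumption made for illustration: the entry fields (`modified`, `indexed`, `record`), the integer timestamps, and the policy of treating an individual with no last-modified date as always stale.

```python
def synchronize(index, discovered):
    """Record one discovery pass in the index.

    index:      dict of uri -> {"modified", "indexed", "record"}
    discovered: dict of uri -> last-modified timestamp, or None when the
                client site does not report one
    """
    # Remove URIs that discovery no longer finds.
    for uri in set(index) - set(discovered):
        del index[uri]
    # Add new URIs and refresh the last-modified date of known ones.
    for uri, modified in discovered.items():
        entry = index.setdefault(
            uri, {"modified": None, "indexed": None, "record": None})
        entry["modified"] = modified


def prioritize(index):
    """Build the to-do list: never-indexed URIs first, then the stalest."""
    def needs_update(entry):
        if entry["indexed"] is None:
            return True   # never indexed
        if entry["modified"] is None:
            return True   # assumption: unknown date means "assume stale"
        return entry["modified"] > entry["indexed"]

    todo = [uri for uri, entry in index.items() if needs_update(entry)]
    todo.sort(key=lambda uri: (index[uri]["indexed"] is not None,
                               index[uri]["indexed"] or 0,
                               uri))
    return todo


def process(index, todo, fetch_model, build_record, now):
    """For each URI: build its model, then write an index record."""
    for uri in todo:
        model = fetch_model(uri)      # stands in for the LOD requests
        index[uri]["record"] = build_record(uri, model)
        index[uri]["indexed"] = now
```

Here `fetch_model` and `build_record` are caller-supplied functions, so the modeling and indexing steps stay replaceable, in keeping with the modular design described earlier.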

 
