Rick Johnson

Exciting time to be involved with SHARE

Erin Braswell

OSF work space.  Code at GitHub.

The harvester gets the data from the provider. It uses date restrictions to get only "new" data. The normalizer creates the values that can go into the SHARE data models.
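The two stages can be sketched roughly as follows. This is only an illustration: the in-memory provider, the record fields (`id`, `updated`, `title`) and the target schema are assumptions made for the example, not the actual SHARE harvester or normalizer interfaces.

```python
from datetime import date

# Hypothetical raw records, standing in for whatever a provider exposes.
PROVIDER_RECORDS = [
    {"id": "a1", "updated": date(2015, 3, 1), "title": "  First  paper "},
    {"id": "a2", "updated": date(2015, 3, 9), "title": "Second paper"},
    {"id": "a3", "updated": date(2015, 3, 10), "title": "Third paper"},
]


def harvest(records, start, end):
    """Return the raw records whose update date falls in [start, end]."""
    return [r for r in records if start <= r["updated"] <= end]


def normalize(raw, source):
    """Map one raw record onto a small, assumed target schema."""
    return {
        "source": source,
        "provider_id": raw["id"],
        # Collapse runs of whitespace and trim the ends.
        "title": " ".join(raw["title"].split()),
        "date_updated": raw["updated"].isoformat(),
    }


# Harvest only records updated in the chosen window, then normalize them.
new_docs = [
    normalize(r, "example_provider")
    for r in harvest(PROVIDER_RECORDS, date(2015, 3, 5), date(2015, 3, 10))
]
```

Keeping the date filter in the harvest step means the normalizer only ever sees records that are new for that run.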

Title issues: Unicode, LaTeX, MS Word, foreign languages. Attempt to store the language supplied by the provider. Joined fields for records with multiple titles. The individual titles can be stored as a list in the extra class.
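A minimal sketch of the title handling, assuming a hypothetical `normalize_titles` helper, a " / " separator for the joined field, and a plain dict standing in for the extra class. It only covers Unicode and whitespace cleanup; LaTeX and MS Word markup would need their own handling.

```python
import unicodedata


def normalize_titles(titles, language=None):
    """Clean a list of raw titles and return (joined_title, extra)."""
    cleaned = []
    for title in titles:
        # Compose characters so visually identical titles compare equal.
        title = unicodedata.normalize("NFC", title)
        # Collapse runs of whitespace and trim the ends.
        title = " ".join(title.split())
        if title and title not in cleaned:
            cleaned.append(title)
    # Keep the individual titles as a list alongside the joined field.
    extra = {"titles": cleaned}
    if language is not None:
        # Store the language only when the provider supplied one.
        extra["language"] = language
    return " / ".join(cleaned), extra
```

The joined string suits a single title field, while the list in `extra` keeps the original titles recoverable.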

Normalizers can guess the title, identifier, or DOI. Normalizers are usually conservative.
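One way a conservative guess might look, sketched for DOIs: accept a value only when the candidate strings agree on exactly one DOI. The `guess_doi` function and the loose pattern below are assumptions for illustration, not the rule any SHARE normalizer actually applies.

```python
import re

# Loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def guess_doi(candidates):
    """Return a DOI only if exactly one distinct DOI appears, else None."""
    found = set()
    for text in candidates:
        for match in DOI_PATTERN.findall(text):
            # Drop trailing punctuation; DOIs compare case-insensitively.
            found.add(match.rstrip(".,;)").lower())
    return found.pop() if len(found) == 1 else None
```

Returning `None` on zero or conflicting matches leaves the field empty instead of recording a doubtful identifier.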

Idea: data inspectors: Write Elasticsearch queries to get the percentages of populated and vacant fields, by provider and by date range. This would show the density of field values in the normalized data and could be used to draw control charts of field-value density. Mirror the values.
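The density calculation could look roughly like this. The sketch swaps Elasticsearch for plain Python over documents that have already been fetched (and already filtered to a date range); the `source` key and the field names are assumptions. The second function gives the usual three-sigma limits for a proportion, which is one common basis for a control chart.

```python
from collections import defaultdict
from math import sqrt


def field_density(docs, fields):
    """Fraction of documents, per provider, with a non-empty value per field."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        provider = doc["source"]
        totals[provider] += 1
        for field in fields:
            # Treat missing keys and empty values as vacant.
            if doc.get(field) not in (None, "", [], {}):
                filled[provider][field] += 1
    return {
        provider: {f: filled[provider][f] / n for f in fields}
        for provider, n in totals.items()
    }


def p_chart_limits(p_bar, n):
    """Three-sigma control limits for a proportion with sample size n."""
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)
```

A provider whose density for a field drops below the lower limit in some date range would be a candidate for a closer look at its harvester or normalizer.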

Idea: data inspectors: Identifiers are a problem; they often come in "random".

Idea: data inspectors: Feed the results back to the providers. The providers may be able to suggest enhancements to the harvesters and normalizers.

Documents can be updated, matched on the provider's id. If metadata comes in for a record that already exists, COS versions the record and returns the most current version unless the query asks for versions.
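The versioning behaviour can be pictured with a small in-memory store. The `VersionedStore` class, its method names and the `versions` flag are hypothetical, chosen only to mirror the description above; they are not the SHARE API.

```python
class VersionedStore:
    """Keep every version of a record, keyed by (source, provider_id)."""

    def __init__(self):
        self._versions = {}

    def save(self, source, provider_id, metadata):
        """Append metadata as the newest version; return its version number."""
        history = self._versions.setdefault((source, provider_id), [])
        # Copy so later changes to the caller's dict do not alter history.
        history.append(dict(metadata))
        return len(history)

    def get(self, source, provider_id, versions=False):
        """Return the latest version, or the full history if versions=True."""
        history = self._versions[(source, provider_id)]
        return list(history) if versions else history[-1]
```

Saving twice under the same provider id yields two versions; a plain `get` returns only the newer one, and `get(..., versions=True)` returns both in arrival order.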

See https://staging-share.osf.io/api/