Description

DataStaR : Data Staging Repository

"The purpose of DataStaR is to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers, and/or to Cornell's own digital repository." (see [DataStaR: An Institutional Approach to Research Data Curation|http://www.iassistdata.org/publications/iq/iq31/iqvol313steinhart.pdf])

Requirements

Accessioner's Workbench Requirements

  • Observation data in tabular form – CSV files for initial implementation.
  • Small to medium scale datasets.
  • Original dataset may or may not be stored in Fedora.
  • Data normalization and cleansing operations to be applied to data.
  • Normalization and cleansing operations should be reusable.
  • Processed datasets will be stored in Fedora.
  • Results must be repeatable.
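The reusability and repeatability requirements above can be illustrated with a minimal sketch, assuming each cleansing operation is a named function over CSV rows and a processing run is recorded as an ordered list of operation names. The names `strip_whitespace`, `drop_empty_rows`, `OPERATIONS` and `run_pipeline` are hypothetical and are not part of DataStaR or Kepler.

```python
import csv
import io


def strip_whitespace(rows):
    """Trim leading and trailing whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def drop_empty_rows(rows):
    """Remove rows that contain no non-empty cell."""
    return [row for row in rows if any(cell.strip() for cell in row)]


# Hypothetical registry: operations are looked up by name so that the
# same list of steps can be stored and re-applied to another dataset.
OPERATIONS = {
    "strip_whitespace": strip_whitespace,
    "drop_empty_rows": drop_empty_rows,
}


def run_pipeline(csv_text, steps):
    """Apply the named operations in order and return the result as CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for name in steps:
        rows = OPERATIONS[name](rows)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Because the operations are pure functions and the step list is plain data, running the same steps on the same input gives the same output, which is one way to meet the repeatability requirement.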

Implementation Assumptions

  • Selecting data operations and execution parameters requires human intervention.
  • The processing rate is unimportant.
  • Ingest into Fedora is controlled by content models.
  • Simple workflow model – see Visio Flow Diagram

Some other requirements

  • Rectangularity - checking and repairing this is very interactive, which makes it a poor fit for Kepler.
  • Column headings - again, listing problem headings is not an issue, but allowing edits is too interactive.
  • Data quality control - Kepler can certainly create histograms or scatter plots of the data, but it would not provide a way to select data values and correct them interactively.
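The non-interactive half of the first two items, reporting problems without editing them, can be sketched as follows. This is only an illustration: it assumes "rectangular" means every data row has as many cells as the header, and that a "problem heading" is one that is empty or duplicated. Both definitions, and the function name `check_table`, are assumptions rather than anything specified by the project.

```python
import csv
import io


def check_table(csv_text):
    """Return (ragged_lines, problem_headings) for a CSV table.

    ragged_lines: 1-based line numbers of data rows whose cell count
    differs from the header (assumes one record per line).
    problem_headings: headings that are empty or duplicated (assumed definition).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    ragged_lines = [
        line_no
        for line_no, row in enumerate(body, start=2)
        if len(row) != len(header)
    ]
    seen = set()
    problem_headings = []
    for name in header:
        key = name.strip()
        if not key or key in seen:
            problem_headings.append(name)
        seen.add(key)
    return ragged_lines, problem_headings
```

Deciding how to fix the reported rows or headings remains a manual step, which is the part described above as too interactive for Kepler.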

Project Components

Kepler Actors

Actors that provide data access and accessioning functionality.

Kepler Workflows

FCRepoDateNormalizer

Retrieves a CSV datastream from an object in a Fedora repository, processes the date column to standardize the format, and saves the results to a local file in CSV format.
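The date-normalization step can be sketched in Python as below. This is not the Kepler workflow itself: the column name `date`, the list of accepted input formats, and the choice of ISO 8601 (`YYYY-MM-DD`) as the target format are all assumptions made for illustration. Retrieving the datastream from Fedora and writing the local file are omitted, so the function works on CSV text that has already been fetched.

```python
import csv
import io
from datetime import datetime

# Hypothetical input formats; the formats handled by the real workflow
# are not documented on this page.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def normalize_date(value, formats=CANDIDATE_FORMATS):
    """Return the date as YYYY-MM-DD, or the value unchanged if no format matches."""
    text = value.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return value


def normalize_csv(csv_text, date_column="date"):
    """Rewrite one column of a CSV table so that all dates share one format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[date_column] = normalize_date(row[date_column])
        writer.writerow(row)
    return out.getvalue()
```

Values that match none of the candidate formats are passed through unchanged rather than dropped, so a later quality-control step can still find them.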

Source file: FCRepoDateNormalizer.xml