Comment: Migrated to Confluence 4.0

Description

DataStaR : Data Staging Repository

"_The purpose of DataStaR is to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers, and/or to Cornell's own digital repository." (see DataStaR: An Institutional Approach to Research Data Curation)

...

  • Rectangularity - this task is highly interactive, which makes it a poor fit for Kepler.
  • Column headings - again, listing problem headings is not an issue, but "allow edits" is too interactive.
  • Data quality control - Kepler can certainly create histograms or scatter plots of the data, but it cannot let the user select data values and correct them interactively.

Kepler Workflows

Kepler workflows were developed to illustrate how Kepler might be used as an accessioner's workbench.

FCRepoDateNormalizer

Retrieves a CSV datastream from an object in a Fedora Repository, processes date columns to standardize their format, and saves the results to a local file in CSV format.

...

Screenshots:

...


...

Source file:

...

FCRepoDateNormalizer.xml
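The page does not say which date formats the workflow recognizes or what target format it produces. The sketch below is a minimal standalone Python 3 illustration of date-column normalization, not the workflow's actual logic. It assumes ISO 8601 (YYYY-MM-DD) as the target format, and the list of input formats is hypothetical. Values that match none of the formats are left unchanged and reported back, so they can go to an error log.

```python
from datetime import datetime

# Hypothetical input formats; the formats the real workflow recognizes are
# not documented. "%m/%d/%Y" assumes US month/day order, and "%b" expects
# English month abbreviations (the default C locale).
INPUT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%Y%m%d")


def normalize_date(value):
    """Return value as an ISO 8601 date (YYYY-MM-DD), or None if no format matches."""
    text = value.strip()
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def normalize_date_column(rows, column):
    """Normalize one column in place; return (row_index, original_value) pairs that failed."""
    failures = []
    for i, row in enumerate(rows):
        normalized = normalize_date(row[column])
        if normalized is None:
            failures.append((i, row[column]))  # leave the value untouched
        else:
            row[column] = normalized
    return failures
```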

FCRepoLatLongSplitter

Retrieves a CSV datastream from an object in a Fedora Repository, splits columns containing both latitude and longitude coordinates into two separate columns, and saves the results to a local file in CSV format.

...

Screenshots:

...


...

Source file:

...

FCRepoLatLongSplitter.xml
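The following standalone Python 3 sketch illustrates the splitting step. It is not the code in SplitLatLongActor.py. It assumes that a combined cell holds two signed decimal numbers, latitude first, separated by a comma, semicolon or whitespace (for example "42.4534, -76.4735"). The output column names "Latitude" and "Longitude" are hypothetical.

```python
import re

# Assumed cell format: "<lat><sep><long>" with decimal degrees, latitude first.
LATLONG = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*[,;\s]\s*([-+]?\d+(?:\.\d+)?)\s*$")


def split_lat_long(value):
    """Split a combined coordinate string into (latitude, longitude) strings.

    Returns None when the value does not match the assumed pattern or the
    numbers fall outside the valid coordinate ranges.
    """
    match = LATLONG.match(value)
    if match is None:
        return None
    lat, lon = match.group(1), match.group(2)
    if not (-90 <= float(lat) <= 90 and -180 <= float(lon) <= 180):
        return None
    return lat, lon


def split_column(header, rows, column):
    """Replace one combined column with separate Latitude and Longitude columns."""
    new_header = header[:column] + ["Latitude", "Longitude"] + header[column + 1:]
    new_rows = []
    for row in rows:
        # Unparseable values become two empty cells; a real workflow would
        # more likely record them in an error log.
        parts = split_lat_long(row[column]) or ("", "")
        new_rows.append(row[:column] + list(parts) + row[column + 1:])
    return new_header, new_rows
```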

FCRepoLatitudeNormalizer

Retrieves a CSV datastream from an object in a Fedora Repository, processes latitude columns to standardize their format, and saves the results to a local file in CSV format.

...

Screenshots:

...


...

Source file:

...

FCRepoLatNormalizer.xml
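Latitude normalization is illustrated below with a standalone Python 3 sketch, not the code in NormalizeLatitudeActor.py. It assumes that "standardize their format" means converting degrees/minutes/seconds notation with an optional N/S letter (for example 42°26'36"N or 42 26 36 S) into signed decimal degrees. The accepted input patterns and the 6-decimal rounding are hypothetical choices.

```python
import re

# Assumed input: degrees, optional minutes and seconds, optional N/S letter.
DMS = re.compile(
    r"""^\s*(?P<deg>[-+]?\d+(?:\.\d+)?)\s*°?\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*'?\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*"?\s*)?
        (?P<hemi>[NSns])?\s*$""",
    re.VERBOSE,
)


def normalize_latitude(value):
    """Return latitude as signed decimal degrees rounded to 6 places, or None."""
    m = DMS.match(value)
    if m is None:
        return None
    deg = float(m.group("deg"))
    minutes = float(m.group("min") or 0)
    seconds = float(m.group("sec") or 0)
    if minutes >= 60 or seconds >= 60:
        return None  # not a valid DMS value
    # A leading minus sign or an "S" hemisphere letter both mean south.
    south = deg < 0 or (m.group("hemi") or "").upper() == "S"
    decimal = (-1 if south else 1) * (abs(deg) + minutes / 60 + seconds / 3600)
    if not -90 <= decimal <= 90:
        return None
    return round(decimal, 6)
```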

FCRepoLongitudeNormalizer

Retrieves a CSV datastream from an object in a Fedora Repository, processes longitude columns to standardize their format, and saves the results to a local file in CSV format.

...

Screenshots:

...


...

Source file:

...

FCRepoLongNormalizer.xml
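The following standalone Python 3 sketch shows one way longitude values could be brought into a single convention. It is not the code in NormalizeLongitudeActor.py. Two assumptions are illustrative only. The target is assumed to be signed decimal degrees in [-180, 180], and the input may carry an E/W letter or use the 0-360 convention (for example 283.53 for 76.47 W).

```python
import re

# Assumed input: decimal degrees with an optional E/W hemisphere letter.
LON = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([EWew])?\s*$")


def normalize_longitude(value):
    """Return longitude as signed decimal degrees in [-180, 180], or None."""
    m = LON.match(value)
    if m is None:
        return None
    lon = float(m.group(1))
    hemi = (m.group(2) or "").upper()
    if hemi == "W":
        lon = -abs(lon)  # west of Greenwich is negative
    elif hemi == "E":
        lon = abs(lon)
    if 180 < lon <= 360:
        lon -= 360  # assumed 0-360 convention, e.g. 283.53 -> -76.47
    if not -180 <= lon <= 180:
        return None
    return round(lon, 6)
```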

Kepler Actors

Several new actors were created to provide data access and accessioning functionality.

...

This actor writes a log file with a summary of changes made during the latest run of the workflow.
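As a rough illustration of such a change log, here is a standalone Python 3 sketch, not the code in ChangeLogWriterActor.py. It assumes a hypothetical input format: a list of (row, column, old_value, new_value) tuples. The file is opened in "w" mode, so each run overwrites the previous log. That matches the description of a summary for the latest run only.

```python
from collections import Counter
from datetime import datetime


def write_change_log(path, workflow_name, changes, run_time=None):
    """Write a plain-text summary of changes made during one workflow run.

    `changes` is a hypothetical list of (row, column, old_value, new_value)
    tuples; the real actor's input format is not documented here.
    """
    run_time = run_time or datetime.now()
    per_column = Counter(column for _, column, _, _ in changes)
    with open(path, "w", encoding="utf-8") as log:  # overwrite: latest run only
        log.write("Workflow: %s\n" % workflow_name)
        log.write("Run: %s\n" % run_time.isoformat(timespec="seconds"))
        log.write("Total changes: %d\n" % len(changes))
        # Per-column counts first, then one line per individual change.
        for column, count in sorted(per_column.items()):
            log.write("  %s: %d\n" % (column, count))
        for row, column, old, new in changes:
            log.write("row %d, %s: %r -> %r\n" % (row, column, old, new))
```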

...

Source file:

...

ChangeLogWriterActor.py

Input port:

...

  • JyFedoREST
  • FCRepoKepler - uses SimpleHTMLFormDialog to display and manage the form.
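The actor's name suggests the getDatastreamDissemination operation of the Fedora 3.x REST API (GET /objects/{pid}/datastreams/{dsID}/content). The standalone Python 3 sketch below fetches a datastream that way and parses it as CSV. It uses only the standard library because the page does not describe the JyFedoREST API. It is not the code in CSVDatastreamDisseminationActor.py. The base URL (for example http://localhost:8080/fedora), the PID and the datastream ID are hypothetical, and the sketch assumes the repository allows unauthenticated reads.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen


def datastream_url(fedora_base, pid, dsid):
    """Build a Fedora 3.x REST getDatastreamDissemination URL."""
    return "%s/objects/%s/datastreams/%s/content" % (
        fedora_base.rstrip("/"), quote(pid, safe=":"), quote(dsid, safe=""))


def parse_csv(text):
    """Parse CSV text into a header list and a list of data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return (rows[0], rows[1:]) if rows else ([], [])


def fetch_csv_datastream(fedora_base, pid, dsid, encoding="utf-8"):
    """Download a datastream and parse it as CSV (requires network access)."""
    with urlopen(datastream_url(fedora_base, pid, dsid)) as response:
        return parse_csv(response.read().decode(encoding))
```

For example, fetch_csv_datastream("http://localhost:8080/fedora", "demo:1", "DATA") would request http://localhost:8080/fedora/objects/demo:1/datastreams/DATA/content.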

...

Source file:

...

CSVDatastreamDisseminationActor.py

...

This actor writes a log file with a summary of errors encountered during the latest run of the workflow.

...

Source file:

...

ErrorLogWriterActor.py

Input ports:

...

  • FCRepoKepler - the script is based on the RowAnalyzer class in fcrepo.kepler.RowAnalyzer.

...

Source file:

...

NormalizeDateActor.py

Input port:

...

  • FCRepoKepler - the script is based on the RowAnalyzer class in fcrepo.kepler.RowAnalyzer.

...

Source file:

...

NormalizeLatitudeActor.py

...

  • FCRepoKepler - the script is based on the RowAnalyzer class in fcrepo.kepler.RowAnalyzer.

...

Source file:

...

NormalizeLongitudeActor.py

...

This actor sorts through the output created by a RowAnalysis script and routes the data to the proper output writer.
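The page does not document the RowAnalysis output format. The standalone Python 3 sketch below illustrates the routing idea under a hypothetical format and is not the code in OutputPrepActor.py. Each result is assumed to be a dict whose "kind" key is "row", "change" or "error". Anything unrecognized is turned into an error entry rather than silently dropped.

```python
def route_results(results):
    """Sort analysis results into data rows, change-log entries and error-log entries.

    Each result is assumed (hypothetically) to be a dict with a "kind" key of
    "row", "change" or "error".
    """
    routed = {"row": [], "change": [], "error": []}
    for result in results:
        kind = result.get("kind")
        if kind not in routed:
            # Unknown output is reported instead of being discarded.
            kind = "error"
            result = {"kind": "error", "message": "unrecognized result: %r" % (result,)}
        routed[kind].append(result)
    # One list per downstream writer: CSV output, change log, error log.
    return routed["row"], routed["change"], routed["error"]
```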

...

Source file:

...

OutputPrepActor.py

Input port:

...

This actor splits a text string representing a 'row' into 'columns' using a separator character such as ','.
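The splitting step is sketched below as a minimal standalone Python 3 example, not the code in RowToColumnsActor.py. It uses the standard csv module rather than a plain string split, so a separator inside a double-quoted field is not treated as a column break. The function name and the default separator are illustrative.

```python
import csv


def row_to_columns(row, separator=","):
    """Split one text row into columns, honoring double-quoted fields."""
    # csv.reader accepts any iterable of strings, so a one-element list
    # parses a single row with the same quoting rules as a whole file.
    return next(csv.reader([row], delimiter=separator), [])
```

For the row A,"March 5, 2009",42.45 this returns three columns. A plain str.split(",") would produce four fields, because it also splits the comma inside the quoted date.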

...

Source file:

...

RowToColumnsActor.py

Input port:

...

  • FCRepoKepler - the script is based on the RowAnalyzer class in fcrepo.kepler.RowAnalyzer.

...

Source file:

...

SplitLatLongActor.py

Input port:

...

The "Split Lat/Long" PythonActor is used in the following workflows:

  • FCRepoLatLongSplitter
