DataStaR : Data Staging Repository

"The purpose of DataStaR is to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers, and/or to Cornell's own digital repository." (see DataStaR: An Institutional Approach to Research Data Curation, http://www.iassistdata.org/publications/iq/iq31/iqvol313steinhart.pdf)

Requirements

Accessioner's Workbench Requirements

...

Source file :

ChangeLogWriterActor.py

Input ports :
  • changes : ObjectToken containing a Python tuple or Java array with 2 items :
    1. the current row number as an integer.
    2. a list of changes made.
  • filename : StringToken containing the fully qualified path for the change log file.

...

Other inputs :

...

The "Change Log Writer" script also needs the fully qualified path for the change log file. This can be acquired in one of two ways:
    A string parameter on the PythonActor named 'action'.
    OR
    A port on the PythonActor named 'action' containing a StringToken.

Output ports :
  • None
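
The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the code in ChangeLogWriterActor.py: the function name `write_change_log` and the "row N: text" line layout are assumptions, and the Kepler token handling (ObjectToken, StringToken) is left out.

```python
def write_change_log(filename, record):
    """Append one line per change to the log file at `filename`.

    `record` is the 2-item tuple described for the 'changes' port:
    (row number, list of changes). The line layout is hypothetical.
    """
    row_number, changes = record
    # Open in append mode so that successive rows accumulate in one file.
    with open(filename, "a") as log:
        for change in changes:
            log.write("row %d: %s\n" % (row_number, change))
```

Appending rather than overwriting matters here because the actor would be fired once per row, so each firing adds to the same log. A row with an empty list of changes writes nothing.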

The "Change Log Writer" PythonActor is used in the following workflows:

...

CSVDatastreamDisseminationActor.py

...

Input ports :

...

  • None
Output port :
  • dissemination : StringToken containing a single row from the CSV datastream.

...

Source file :

ErrorLogWriterActor.py

...

Input ports :

...

  • error: ObjectToken containing a Python tuple or Java array with 2 items :
    1. the current row number as an integer.
    2. a list of errors encountered.
  • filename : StringToken containing

...

Other inputs :

...

The "Error Log Writer" script also needs the fully qualified path for the error log file. This can be acquired in one of two ways:
    A string parameter on the PythonActor named 'action'.
    OR
    A port on the PythonActor named 'action' containing a StringToken.

Output ports :
  • None

The "Error Log Writer" PythonActor is used in the following workflows:

...

Source file :

NormalizeDateActor.py

...

Input port :

...

  • input : ObjectToken containing a Python tuple or Java array with 2 items :
    1. the current row number as an integer.
    2. an ordered list of columns in the row.

...

Output port :

...

  • output : ObjectToken containing a Python tuple or Java array with 4 items :
    1. the current row number as an integer.
    2. a tuple/array of values for each column in the row.
    3. a tuple/array of changes made.
    4. a tuple/array of errors encountered.
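
To make the shape of the input and output tuples concrete, here is a rough sketch of a per-row normalization step. It is not the logic of NormalizeDateActor.py, which is not shown on this page. The function name `normalize_row`, the accepted input formats, the ISO 8601 target format, the 1-based column numbering, and the wording of the change and error messages are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical input formats; the real actor's rules are not documented here.
INPUT_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def normalize_row(record, date_columns):
    """Return (row number, values, changes, errors) for one input row.

    `record` is (row number, ordered list of column values).
    `date_columns` holds column numbers, assumed here to be 1-based.
    """
    row_number, columns = record
    values = list(columns)
    changes = []
    errors = []
    for column in date_columns:
        position = column - 1  # assumption: column numbers start at 1
        if position < 0 or position >= len(values):
            errors.append("column %d: not present in row" % column)
            continue
        original = values[position].strip()
        if not original:
            continue  # leave empty cells untouched
        for fmt in INPUT_FORMATS:
            try:
                parsed = datetime.strptime(original, fmt)
            except ValueError:
                continue  # try the next candidate format
            normalized = parsed.strftime("%Y-%m-%d")
            if normalized != values[position]:
                values[position] = normalized
                changes.append("column %d: %r -> %r" % (column, original, normalized))
            break
        else:
            # No candidate format matched the cell content.
            errors.append("column %d: unrecognized date %r" % (column, original))
    return (row_number, tuple(values), tuple(changes), tuple(errors))
```

Keeping changes and errors as separate tuples in the result lets a downstream actor route them to different log files without re-parsing the row.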

...

Other inputs :

...

The "Normalize Date" script also needs a list of indexes for the columns that contain dates. This can be acquired in one of two ways:

...

    A string parameter on the PythonActor named 'indexes'
    OR

...

    A port named 'indexes' containing a StringToken.
In either case, the string must contain either a comma-separated list of column numbers or a formula describing a regular sequence from which the list can be generated. The format of the formula is START + INCREMENT * COUNT. For example, the formula 7+4*10 describes a list of 10 columns, with dates occurring every 4 columns starting at column 7. This generates the list 7,11,15,19,23,27,31,35,39,43.
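
The two accepted forms of the 'indexes' string can be expanded with a few lines of Python. The sketch below is only an illustration of the rule stated above; the function name `parse_indexes` is hypothetical, and the actual parsing in NormalizeDateActor.py may differ (for example in how it handles whitespace or malformed input).

```python
def parse_indexes(spec):
    """Expand an 'indexes' string into a list of column numbers.

    Accepts either a comma-separated list ("3,8,12") or a formula of the
    form START+INCREMENT*COUNT ("7+4*10"). The formula is a notation for a
    sequence and is not evaluated as arithmetic.
    """
    spec = spec.strip()
    if "+" in spec and "*" in spec:
        start_text, rest = spec.split("+", 1)
        increment_text, count_text = rest.split("*", 1)
        start = int(start_text)
        increment = int(increment_text)
        count = int(count_text)
        # COUNT values, beginning at START and stepping by INCREMENT.
        return [start + increment * i for i in range(count)]
    return [int(item) for item in spec.split(",") if item.strip()]
```

With this sketch, `parse_indexes("7+4*10")` yields the ten column numbers listed in the example above, and `parse_indexes("3, 8, 12")` yields `[3, 8, 12]`.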

...

The "Normalize Date" PythonActor is used in the following workflows:

...

Source file :

OutputPrepActor.py

...

Input port :

...

  • input : ObjectToken containing a Python tuple or Java array with 4 items :
    1. the current row number as an integer.
    2. a tuple/array of values for each column in the row.
    3. a tuple/array of changes made.
    4. a tuple/array of errors encountered.

...

Other inputs :

...

The "Output Prep" script also needs the character to be used as a separator between columns in the output text string. This can be acquired in one of two ways:
    A string parameter on the PythonActor named 'separator'
    OR
    A port named 'separator' containing a StringToken.
In either case, the string must contain the separator character.

Output ports :
  • output : StringToken containing a string representing the output row in a CSV file. It is constructed by concatenating the values in the columns array using a separator character.
  • changes : ObjectToken containing a tuple with 2 items :
    1. the current row number as an integer.
    2. the tuple/array of changes made received on the input port.
  • errors : ObjectToken containing a tuple with 2 items :
    1. the current row number as an integer.
    2. the tuple/array of errors encountered received on the input port.
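
The fan-out from one 4-item input to the three outputs can be sketched as follows. This is an illustration of the description above rather than the code in OutputPrepActor.py; the function name `prepare_output` and the default separator are assumptions, and the Kepler token wrapping is omitted.

```python
def prepare_output(record, separator=","):
    """Split a 4-item record into the three values sent on the output ports.

    `record` is (row number, column values, changes, errors).
    Returns (row text, (row number, changes), (row number, errors)).
    """
    row_number, values, changes, errors = record
    # Plain concatenation, as described above: no quoting or escaping is
    # applied, so a value that contains the separator is not protected.
    row_text = separator.join(str(value) for value in values)
    return row_text, (row_number, tuple(changes)), (row_number, tuple(errors))
```

Pairing the row number with the changes and with the errors produces exactly the 2-item shape that the "Change Log Writer" and "Error Log Writer" actors expect on their input ports.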

...

The "Output Prep" PythonActor is used in the following workflows:

...

Source file :

RowToColumnsActor.py

...

Input port :

...

  • row: StringToken containing a string representation of a single row in a spreadsheet or other data matrix.

...