In essence, Kepler was designed to model a workflow that will be run repeatedly without the need for human intervention. It provides an effective environment for integrating disparate software components and for creating sophisticated data analysis pipelines.

Kepler is a Java-based application that is maintained for the Windows, OS X, and Linux operating systems. Project site: https://kepler-project.org

Kepler is based on the Ptolemy II system for modeling, simulation, and design of concurrent, real-time, embedded systems: http://ptolemy.eecs.berkeley.edu/ptolemyII/index.htm

How does it work?

A workflow in Kepler is composed of independent actors communicating through well-defined interfaces. An actor presents parameterized operations that act on an input to produce an output. The execution order and communication mechanisms of the actors in the workflow are defined in a director object.
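
The actor and director roles described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Kepler's or Ptolemy II's API: the names `Actor`, `SequentialDirector`, `scale`, and `offset` are hypothetical, and the director here simply fires actors one after another, which is just one of many possible execution models.

```python
from typing import Callable, List


class Actor:
    """A parameterized operation that maps an input token to an output token."""

    def __init__(self, name: str, operation: Callable[..., object], **parameters):
        self.name = name
        self.operation = operation
        self.parameters = parameters

    def fire(self, token):
        # Apply the operation to the incoming token using this actor's parameters.
        return self.operation(token, **self.parameters)


class SequentialDirector:
    """Defines execution order and how tokens move between actors."""

    def __init__(self, actors: List[Actor]):
        self.actors = actors

    def run(self, token):
        # Hand the output of each actor directly to the next actor in the list.
        for actor in self.actors:
            token = actor.fire(token)
        return token


def scale(values, factor):
    return [v * factor for v in values]


def offset(values, amount):
    return [v + amount for v in values]


workflow = SequentialDirector([
    Actor("Scale", scale, factor=2),
    Actor("Offset", offset, amount=1),
])
result = workflow.run([1, 2, 3])
print(result)
```

Because the actors know nothing about each other, swapping in a different director would change the scheduling without touching the actor code, which is the separation the paragraph above describes.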

Kepler provides:

  • A graphical user interface for composing workflows and editing the workflow environment.
  • A run-time engine that can execute workflows either from within the graphical interface or from a command line.

This allows complex tasks to be composed from simpler components. The resulting workflow is an executable representation of the steps required to generate results.

Originally, Kepler was designed to download data sets into a cache on the machine where it is running, so Kepler actors run as local Java threads. However, in order to provide access to web-based resources, actors have been implemented that spawn distributed execution threads to access distributed resources.

Key Features

  • A Python actor can be used to create domain- or content-specific scripts for accessing, merging and manipulating data.
  • R and Matlab actors can be used to easily perform complex statistical analyses.
  • A WebService actor can be used to access and execute WSDL-defined Web services from within a workflow.
  • An ExternalExecution actor can be used to execute command line applications from within a workflow.
  • Actors can be grouped into subworkflows and saved as CompositeActors for transportability and re-use.
  • A number of grid technology actors provide access to web-accessible data repositories and support for parallel processing.
  • Kepler workflows can be exchanged in XML using the Modeling Markup Language (MoML) described at http://ptolemy.eecs.berkeley.edu/publications/papers/00/moml/moml_erl_memo.pdf .
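
Because workflows are exchanged as XML, they can be inspected with ordinary XML tools. The sketch below uses Python's standard `xml.etree.ElementTree` on a hand-written, heavily simplified MoML-style fragment. The fragment is an assumption made for illustration: real Kepler files carry a DOCTYPE declaration and many more properties, and the workflow name, actor names, and class names shown here are only examples. The helper `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified MoML-style workflow (illustrative assumption, not a real Kepler file).
MOML = """\
<entity name="demo" class="ptolemy.actor.TypedCompositeActor">
  <property name="SDF Director" class="ptolemy.domains.sdf.kernel.SDFDirector"/>
  <entity name="Source" class="ptolemy.actor.lib.Const"/>
  <entity name="Sink" class="ptolemy.actor.lib.gui.Display"/>
  <relation name="r1" class="ptolemy.actor.TypedIORelation"/>
  <link port="Source.output" relation="r1"/>
  <link port="Sink.input" relation="r1"/>
</entity>
"""


def summarize(moml_text):
    """Return (actors, connections) found at the top level of a workflow."""
    root = ET.fromstring(moml_text)
    # Top-level <entity> children are the actors in the workflow.
    actors = {e.get("name"): e.get("class") for e in root.findall("entity")}
    # Each <link> attaches one port to a named relation; group ports by relation.
    connections = {}
    for link in root.findall("link"):
        connections.setdefault(link.get("relation"), []).append(link.get("port"))
    return actors, connections


actors, connections = summarize(MOML)
print(actors)
print(connections)
```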

Issues

Python scripts run in a Jython interpreter within the same JVM as Kepler. This interpreter is instantiated only once, so it does not need to be reloaded by each Python actor. On the downside, a global variable set in one actor can cause unexpected side effects if it is reset in another actor. We also discovered that if a function is defined at global scope (i.e. outside of a class), its name cannot be reused: the Jython interpreter always persists the code from the first instantiation of that function.

R and Matlab actors require that those applications be installed outside of Kepler. When an associated script is run, the code and data are passed to an external process running R or Matlab, and the actor waits for that process to finish before proceeding. Workflows with multiple R/Matlab steps therefore incur the application startup overhead once for each instance of the actor, which can significantly affect workflow performance.

Kepler is a very good tool for building scientific workflows for those who are familiar with software development. However, it is not particularly friendly to those who are used to . (Google Code project hydrant-kepler discussion of this issue)
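
The global-variable side effect described for Python actors can be simulated in plain Python by running two scripts against one shared namespace. This is a sketch under stated assumptions: the shared dictionary stands in for the single interpreter instance, the script contents and names (`threshold`, `keep`, `ActorAFilter`) are hypothetical, and it does not use Kepler or Jython. It also does not reproduce the stale-function behavior reported above; it only shows why module-level state is fragile when actors share an interpreter.

```python
# One namespace shared by every "actor" script (simulation only).
shared_globals = {}

ACTOR_A = """
threshold = 10  # module-level global used by keep()

def keep(values):
    return [v for v in values if v > threshold]
"""

ACTOR_B = """
threshold = 100  # same global name, set for an unrelated purpose
"""

# Same logic as ACTOR_A, but the setting lives on a class instead of a global.
ACTOR_A_ISOLATED = """
class ActorAFilter:
    threshold = 10

    def keep(self, values):
        return [v for v in values if v > self.threshold]
"""

data = [5, 50, 500]

exec(ACTOR_A, shared_globals)
before = shared_globals["keep"](data)   # uses threshold = 10

exec(ACTOR_B, shared_globals)
after = shared_globals["keep"](data)    # now silently uses threshold = 100

exec(ACTOR_A_ISOLATED, shared_globals)
exec(ACTOR_B, shared_globals)           # resets the global again
isolated = shared_globals["ActorAFilter"]().keep(data)

print(before, after, isolated)
```

Keeping per-actor state inside a class (or under actor-specific names) is one way to limit this kind of interference; whether it also avoids the function-name problem in Kepler's Jython interpreter would need to be checked in Kepler itself.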
