Initial User Stories

Spreadsheet organizes tasks, by priority, to indicate deliverables for December 2012.

Definitions

Operational Storage: Storage used for direct, active read/write access by research processes, the researcher and authorized collaborators (if any).
Secure (Secondary/Reliable/??) Storage: Storage for secure copies (snapshots) of active research materials.
Archival Storage: Storage of inactive research materials for archival purposes. Mostly read-only access (fixed content except for additions made by preservation and other archival processes). Access and disposition is based on policies provided by the researcher, institution, steward and curator. This may include access for subsequent active research processes (re-use) if permitted by policy.
File: A commonly used generic term for a digital storage unit, most commonly a file in a computer's file system accessed through a directory path. Note that this term is very context sensitive in its use. [DWD: We need to be clear about RDBs, aka databases.]
Collection: A group of digital storage units and/or other collections. The collection provides a means to record the structure of the units of storage and later for recording additional relationships for integrity and archival purposes.
Data Set: There is no commonly accepted definition but, notionally, this is a grouping of data records, often observations, that constitutes a well-bounded unit of research data. Common processes on data sets include:
- creating derived data sets (generally using some form of transformation)
- subsetting (an extraction of a portion of a data set) [We need to be clearer: a subset of materials or a subset of a data set?]
- feature identification (looking for artifacts of interest within a data set)
- correlation (creating linkage inside or between data sets)
Related Documentation: Any information related to the research that is not classified as a data set but is part of the research process including presentation or publication of it.
Research Materials: The whole of the documentation regarding a unit of research including both data sets and related documentation. May be born digital or be surrogates of physical items.
Unit of Research: An active body of work as defined by the researcher, often related to a project or unit of funding. Note that older research materials may remain in the possession of the researcher but have become fixed.
Data (Information) Life Cycle: The phases of the research materials life cycle from authoring (creation) through archival and disposition. For simplification in this project we may consider the following:
- active - New materials are still being added to the unit of research
- snapshot - A snapshot copy of active data has been made into secure storage
- inactive - New materials are no longer being added to the unit of research (research data is fixed/static; the archivist may be adding descriptive information)
- (Need a description of the transitions between these states)
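
As a starting point for describing the transitions, here is a minimal Python sketch of one possible transition table. The transitions shown are assumptions for discussion only, since the project has not yet defined them. The names LifeCycleState, ALLOWED_TRANSITIONS, can_transition and transition are hypothetical.

```python
from enum import Enum


class LifeCycleState(Enum):
    """The three simplified life cycle phases listed above."""
    ACTIVE = "active"
    SNAPSHOT = "snapshot"
    INACTIVE = "inactive"


# ASSUMPTION: this table is illustrative only. It supposes that a unit of
# research can return to "active" after a snapshot, and that "inactive"
# is a terminal state as far as the researcher is concerned.
ALLOWED_TRANSITIONS = {
    LifeCycleState.ACTIVE: {LifeCycleState.SNAPSHOT, LifeCycleState.INACTIVE},
    LifeCycleState.SNAPSHOT: {LifeCycleState.ACTIVE, LifeCycleState.INACTIVE},
    LifeCycleState.INACTIVE: set(),
}


def can_transition(current, target):
    """Return True if the assumed table permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def transition(current, target):
    """Return the new state, or raise ValueError if the move is not permitted."""
    if not can_transition(current, target):
        raise ValueError(
            f"transition {current.value} -> {target.value} is not allowed"
        )
    return target
```

Keeping the rules in a single table means that changing them once the real transitions are agreed is a one-place edit.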

Product Owners

Researcher / PI (Monthly User)
1. Research Assistant (Daily User) [DSPINT:an actor who performs tasks for the Researcher]
Curator (Quarterly User) Consider this actor as both an archive admin and data manager?
DTR Admin (Once, except for unfortunate midnight alerts): I am treating this person as a system/service/IT admin?

User Stories

As a Research Assistant (Daily User) I would like to...

establish a set of credentials (profile) with the service associated with an account
login
set up automated, periodic (daily?/hourly?) synchronization service for research materials (data sets and related documents) in digital form
1. between one or more operational research stores
2. to one or more "secure" snapshot stores
3. with little setup or maintenance effort on my part (the service “Watches the Directory” for me)
determine that the system is operating correctly via a regular notification or other access to the synchronization status (know what the jobs are and if they happened)
make new collections from parts of existing collections
search and extract subsets of data from existing data (from local?)
- for example, pull tabular data into excel or pull observation dates into a web page
locate past data by date, source device, name, person [DWD: Why just "past"?]
install / develop / utilize interfaces from instruments or software to capture data, and the operational settings in use. N.B. look at existing workflows from Taverna and Kepler, etc.
have the encryption/decryption of data handled completely by DTR
restore my data to my operational storage quickly upon request
capture information about my data including the original storage structure but also additional data such as who, when and where it was created or modified
keep versions of the data, when chosen as an option, also recording who, when and where it was modified
easily create a template for the directory layout of my storage indicating the kinds of data kept in each directory
have the system keep information about the data that is implied or stated about the directory structure
permit descriptions to be added to any kind of data or storage structure, either manually or by capturing them from parts of my operational research infrastructure
1. be able to augment metadata via a web interface
2. permit my collaborators or myself to add notes to the data or structure
provide controlled access to my collaborators including versions or modifications
designate when and under what terms operational data is moved to an archival status
easily run services of my choice over the data to aid in search, correlation, subsetting or analysis
easily share DuraCloud services with my collaborators under access control specified by the researcher
visualize my data in ways that are useful and interesting to me (including relationships among data)
1. Install visualizers to support this.
use tools that will transform my data
export subsets of my data to a VRE
integrate my data with a VRE
integrate my data with my workflow system (inputs and outputs)
integrate with my Electronic Lab Notebook
bit integrity assurance [DWD: It may be a bit more than that, in that the integrity of the structure of the materials as a whole is significant.]
1. be assured my data has not been corrupted (authentic)
2. compare my operational data storage with my secure storage to determine if there are any discrepancies between the two
3. search to find items of interest to me, either for ensuring that my data is intact or to locate items that may be of operational interest [DWD: Why is this different from the Research Assistant's version?]
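
The comparison of operational storage against secure storage could be sketched roughly as follows, assuming both stores are reachable as ordinary directory trees and that SHA-256 is an acceptable checksum. Neither assumption comes from this page, and the function names sha256_of, checksum_tree and compare_stores are hypothetical. This covers only bit-level discrepancies, not the wider structural integrity raised in the comment above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_stores(operational_root, secure_root):
    """Report files that are missing on either side or differ in content."""
    operational = checksum_tree(operational_root)
    secure = checksum_tree(secure_root)
    shared = set(operational) & set(secure)
    return {
        "missing_in_secure": sorted(set(operational) - set(secure)),
        "missing_in_operational": sorted(set(secure) - set(operational)),
        "content_differs": sorted(
            name for name in shared if operational[name] != secure[name]
        ),
    }
```

An empty result in all three lists would mean the two trees hold the same files with the same content at the time of the check.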

As a Researcher (PI / Data Owner) I would like to...

Perform any Research Assistant operations
Perform any Admin functions
1. Discussion: fill in a template about the data organization that guides the curator
get reports for usage and costs regarding the system. We will need various structural breakdowns (researcher, department, institution, etc)
be assured my data is securely stored so there is little or no risk of loss
Control or delegate encryption and policy
1. specify which subsets of content are to be encrypted
2. provide my own encryption key for content transferred to DTR
3. have the encryption/decryption of data integrated with a third-party keystore
provide public or curatorial copies of services
enable access to selected portions of my data for publication
provide stable citation to selected portions of my data for access
be notified if anyone operates on the data (e.g. accesses or modifies it)
be provided with a customized boilerplate detailing my DMP
specify the retention policy for my data in the creation of the DMP boilerplate

As a Curator I would like to...

Perform any Admin Functions on data (or snapshot) where ownership has transferred.
login
specify the policies needed to manage collection, access, stewardship, services and disposition in as flexible and automated a fashion as is feasible
receive reports regarding the integrity of the materials and the costs for operating the system with respect to the unit of disposition
establish rights, transfer and disposition agreements with researchers, institutions and funders
collect a snapshot of research materials, largely automatically, by the time a unit of research becomes inactive
collect provenance information, largely automatically, about the researcher’s data in accordance with policy
collect context information, largely automatically, about the researcher’s data in accordance with policy
collect algorithms, software, methodology and workflows, largely automatically, about the researcher’s process to enable reproducible science
be able to capture sufficient information to make the science reproducible by others at a high level of service
I am not clear about the curator's role early in the ILS. It is crucial but we have not distinguished it.
Create the policies and standards for materials collection
Interact with persons creating PMPs, possibly provide PMP services.
Create the templates/schemas.

As an Admin I would like to...

login
initialize the service
manage accounts (profiles) for other kinds of users
review and adjust service policy settings
Delegate controlled access to functions to persons of my choosing (Research Assistants, Collaborators, Curators). [The delegate actor is the Research Assistant; are there others?]
set data lifetimes [DWD: Isn't that the curator's task?]
install and manage public data schemas [DWD: Isn't that the curator's task?]
install and manage visualizers
install and manage DMP templates [DWD: Isn't that the curator's task?]

As a Sysop / IT I would like to ...

Perform Admin operations
Allocate accounts to researchers / administrators
integrate the service with my institution identity management system
integrate the service with external identity management systems as recognized by the researcher and/or the institution

Definitions: As (someone above), I would like to...

Login
1. log into DTR using DuraCloud-based identity management
2. log into DTR using my institution’s identity management
3. log into DTR using my institution’s IdM via InCommon (Shibboleth)
4. log into DTR using a public identity management service (OAuth, OpenID)
Search
1. Search for a unit of storage (list of files)
2. Search within the units of storage (contents of files)
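
To make the difference between the two kinds of search concrete, here is a minimal Python sketch. It assumes the units of storage are plain files under a local directory and that content search means a simple substring match on text. Neither is decided on this page, and the names search_units and search_contents are hypothetical.

```python
import fnmatch
from pathlib import Path


def search_units(root, name_pattern):
    """Search FOR units of storage: list files whose name matches a pattern."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and fnmatch.fnmatch(path.name, name_pattern)
    )


def search_contents(root, text):
    """Search WITHIN units of storage: list files whose text contains a string."""
    root = Path(root)
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # ASSUMPTION: files are treated as UTF-8 text; undecodable bytes
        # are ignored rather than causing the search to fail.
        if text in path.read_text(encoding="utf-8", errors="ignore"):
            matches.append(path.relative_to(root).as_posix())
    return matches
```

The first function never opens a file, while the second has to read every file. That cost difference is one reason to keep the two stories separate when they are prioritized.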

Non User Story Functional Requirements

Capture provenance and audit logs on each (data element)
Stream audit logs as events
Provide encryption options on all data transfers, and all stored data
Comprehend common public data schemas; recognize and categorize incoming data [DWD: Isn't that the curator's task?]
Integrate with third party key management system(s)
Implement a (standards based) service bus with internal and external APIs
1. Including workflow capture interfaces (see jBPM, Taverna, Kepler, …)
Provide system defined services for automatic data transformation
Provide user defined services for automatic data transformation
Control access based on policy and user accounts
Allow for data policies including versioning and immutable options
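
The two audit-log requirements above (capture on each data element, stream as events) could be prototyped along these lines. This is only a sketch: the event fields, the one-JSON-object-per-line format and the names AuditEvent, make_event and stream_events are all assumptions, not a decided design.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One audit record: who did what to which data element, and when."""
    actor: str
    action: str
    item: str
    timestamp: str


def make_event(actor, action, item, when=None):
    """Build an event, stamping it with the current UTC time by default."""
    when = when or datetime.now(timezone.utc)
    return AuditEvent(actor, action, item, when.isoformat())


def stream_events(events, sink):
    """Write each event to sink as one JSON object per line."""
    for event in events:
        sink.write(json.dumps(asdict(event), sort_keys=True) + "\n")
```

Any object with a write method can act as the sink (a file, a socket wrapper, an in-memory buffer), so the delivery mechanism stays open until the service bus design is settled.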

Additional Ideas

How do we encourage the researcher to learn about and utilize appropriate data schemas?
How do we capture / snapshot operational environments for later restoration? Should we?
Does the grant take us to demonstration or production?

Next Steps

Mark each of these requirements as (Critical, NiceToHave, NotNow)
Sort them by development priority and expand them
Define the minimal required system image
