The spreadsheet organizes tasks by priority to indicate deliverables for December 2012.

Definitions

  • Operational Storage: Storage used for direct, active read/write access by research processes, the researcher and authorized collaborators (if any).
  • Secure (Secondary/Reliable/??) Storage: Storage for secure copies (snapshots) of active research materials.
  • Archival Storage: Storage of inactive research materials for archival purposes. Access is mostly read-only (content is fixed except for additions made by preservation and other archival processes). Access and disposition are based on policies provided by the researcher, institution, steward and curator. This may include access for subsequent active research processes (re-use) if permitted by policy.
  • File: A commonly used generic term for a digital storage unit, most commonly a file in a computer's file system accessed through a directory path. Note that this term is highly context-sensitive. We need to be clear about how relational databases (RDBs) are treated. DWD
  • Collection: A group of digital storage units and/or other collections. The collection provides a means to record the structure of the units of storage and later for recording additional relationships for integrity and archival purposes.
  • Data Set: There is no commonly accepted definition but, notionally, this is a grouping of data records, often observations, that constitutes a well-bounded unit of research data. Common processes on data sets include:
    • creating derived data sets (generally using some form of transformation)
    • subsetting (extracting a portion of a data set). We need to be clearer: is this a subset of materials or a subset of a data set?
    • feature identification (looking for artifacts of interest within a data set)
    • correlation (creating linkage inside or between data sets)
  • Related Documentation: Any information related to the research that is not classified as a data set but is part of the research process including presentation or publication of it.
  • Research Materials: The whole of the documentation regarding a unit of research including both data sets and related documentation. May be born digital or be surrogates of physical items.
  • Unit of Research: An active body of work as defined by the researcher, often related to a project or unit of funding. Note that older research materials may remain in the researcher's possession but have become fixed.
  • Data (Information) Life Cycle: The phases of the research materials' life cycle, from authoring (creation) through archival and disposition. To simplify this project, we may consider the following phases:
    • active - New materials are still being added to the unit of research
    • snapshot - A snapshot copy of active data has been made into secure storage
    • inactive - New materials are no longer being added to the unit of research (research data is fixed/static; the archivist may still be adding descriptive information)
    • (Need a description of the transitions between these states)
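
The three phases above can be read as states of a small state machine. Because the transitions are still undefined, the sketch below is only one hypothetical transition model: it assumes a snapshot can be taken from the active phase (and repeated), that adding new materials to a snapshotted unit returns it to active, and that either phase can move to inactive, which is terminal. The names `Phase`, `ALLOWED` and `advance` are illustrative, not part of any DTR design.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical life-cycle phases, named after the definitions above."""
    ACTIVE = "active"
    SNAPSHOT = "snapshot"
    INACTIVE = "inactive"


# ASSUMPTION: illustrative transitions only; the project has not yet defined them.
ALLOWED = {
    Phase.ACTIVE: {Phase.SNAPSHOT, Phase.INACTIVE},
    Phase.SNAPSHOT: {Phase.ACTIVE, Phase.SNAPSHOT, Phase.INACTIVE},
    Phase.INACTIVE: set(),  # inactive materials are fixed; no further phase changes
}


def advance(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise ValueError if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    phase = advance(Phase.ACTIVE, Phase.SNAPSHOT)
    phase = advance(phase, Phase.INACTIVE)
    print(phase.value)  # prints "inactive"
```

Keeping the allowed transitions in a single table makes it easy to revise the model once the transitions are agreed on.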

Product Owners

  1. Researcher / PI (Monthly User)
    1. Research Assistant (Daily User) [DSPINT:an actor who performs tasks for the Researcher]
  2. Curator (Quarterly User) Consider this actor as both an archive admin and data manager?
  3. DTR Admin (Once, except for unfortunate midnight alerts): I am treating this person as a system/service/IT admin?

User Stories

As a Research Assistant (Daily User) I would like to...
  1. establish a set of credentials (profile) with the service associated with an account
  2. login
  3. set up automated, periodic (daily?/hourly?) synchronization service for research materials (data sets and related documents) in digital form
    1. between one or more operational research stores
    2. to one or more "secure" snapshot stores
    3. with little setup or maintenance effort on my part (the service “Watches the Directory” for me)
  4. determine that the system is operating correctly via a regular notification or other access to the synchronization status (know what the jobs are and if they happened)
  5. make new collections from parts of existing collections
  6. search and extract subsets of data from existing data (from local?)
      • for example, pull tabular data into Excel or pull observation dates into a web page
  7. locate past data by date, source device, name, person. Why just "past"? DWD
  8. install / develop / utilize interfaces from instruments or software to capture data, and the operational settings in use. N.B. look at existing workflows from Taverna and Kepler, etc.
  9. have the encryption/decryption of data handled completely by DTR
  10. restore my data to my operational storage quickly upon request
  11. capture information about my data including the original storage structure but also additional data such as who, when and where it was created or modified
  12. optionally keep versions of the data, also recording who modified each version and when and where it was modified
  13. easily create a template for the directory layout of my storage indicating the kinds of data kept in each directory
  14. have the system record information about the data that is implied or stated by the directory structure
  15. permit descriptions to be added to any kind of data or storage structure, either manually or by capturing parts of my operational research infrastructure
    1. be able to augment metadata via a web interface
    2. permit my collaborators or myself to add notes to the data or structure
  16. provide controlled access to my collaborators including versions or modifications
  17. designate when and under what terms operational data is moved to an archival status
  18. easily run services of my choice over the data to aid in search, correlation, subsetting or analysis
  19. easily share DuraCloud services with my collaborators under access control specified by the researcher.
  20. visualize my data in ways that are useful and interesting to me (including relationships among data)
    1. Install visualizers to support this.
  21. use tools that will transform my data
  22. export subsets of my data to a VRE
  23. integrate my data with a VRE
  24. integrate my data with my workflow system (inputs and outputs)
  25. integrate with my Electronic Lab Notebook
  26. bit integrity assurance: be assured my data has not been corrupted (is authentic). It may be a bit more than that, in that the integrity of the structure of the materials as a whole is significant. DWD
    1. compare my operational data storage with my secure storage to determine if there are any discrepancies between the two
    2. search to find items of interest to me, either to ensure that my data is intact or to locate items that may be of operational interest. Why is this different from the Research Assistant's version? DWD
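The comparison in item 26 (operational storage versus secure storage) can be sketched as building a checksum manifest for each store and diffing the two. This is a minimal sketch assuming both stores are reachable as local directory trees (for example, a mounted snapshot); a real DTR service would more likely compare against checksums held by the storage service rather than re-reading both copies. `build_manifest` and `compare_stores` are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large data sets need not fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_stores(operational: Path, secure: Path) -> dict[str, list[str]]:
    """Report files missing from either store and files whose contents differ."""
    ops = build_manifest(operational)
    sec = build_manifest(secure)
    return {
        "missing_in_secure": sorted(ops.keys() - sec.keys()),
        "missing_in_operational": sorted(sec.keys() - ops.keys()),
        "content_differs": sorted(p for p in ops.keys() & sec.keys() if ops[p] != sec[p]),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        ops_dir, sec_dir = Path(a), Path(b)
        (ops_dir / "obs.csv").write_text("t,value\n1,42\n")
        (sec_dir / "obs.csv").write_text("t,value\n1,41\n")  # simulated corruption
        (ops_dir / "notes.txt").write_text("new, not yet synchronized")
        print(compare_stores(ops_dir, sec_dir))
        # {'missing_in_secure': ['notes.txt'], 'missing_in_operational': [], 'content_differs': ['obs.csv']}
```

The same manifests could also support the "Watches the Directory" synchronization in item 3, since a new or changed digest marks a file that needs copying.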
As a Researcher (PI / Data Owner) I would like to...
  1. Perform any Research Assistant operations
  2. Perform any Admin functions
    1. Discussion: fill in a template about the data organization that guides the curator
  3. get reports on system usage and costs. We will need various structural breakdowns (researcher, department, institution, etc.)
  4. be assured my data is securely stored so there is little or no risk of loss
  5. Control or delegate encryption and policy
    1. specify which subsets of content are to be encrypted
    2. provide my own encryption key for content transferred to DTR
    3. have the encryption/decryption of data integrated with a third-party keystore
  6. provide public or curatorial copies of services
  7. enable access to selected portions of my data for publication
  8. provide stable citation to selected portions of my data for access
  9. be notified if anyone operates on the data (e.g., accesses or modifies it)
  10. be provided with a customized boilerplate detailing my DMP
  11. specify the retention policy for my data in the creation of the DMP boilerplate
As a Curator I would like to...
  1. Perform any Admin Functions on data (or snapshot) where ownership has transferred.
  2. login 
  3. specify the policies needed to manage collection, access, stewardship, services and disposition in as flexible and automated a fashion as is feasible
  4. receive reports regarding the integrity of the materials and the costs for operating the system with respect to the unit of disposition
  5. establish rights, transfer and disposition agreements with researchers, institutions and funders
  6. collect a snapshot of research materials, largely automatically, by the time a unit of research becomes inactive
  7. collect provenance information, largely automatically, about the researcher’s data in accordance with policy
  8. collect context information, largely automatically, about the researcher’s data in accordance with policy
  9. collect algorithms, software, methodology and workflows, largely automatically, about the researcher’s process to enable reproducible science
  10. be able to capture sufficient information to make the science reproducible by others at a high level of service
  11. I am not clear about the curator's role early in the ILS. It is crucial, but we have not distinguished it.
  12. Create the policies and standards for materials collection
  13. Interact with persons creating PMPs, and possibly provide PMP services.
  14. Create the templates/schemas.
As an Admin I would like to...
  1. login
  2. initialize the service
  3. manage accounts (profiles) for other kinds of users
  4. review and adjust service policy settings
  5. Delegate controlled access to functions to persons of my choosing (Research Assistants, Collaborators, Curators). The delegate actor is the Research Assistant; are there others?
  6. set data lifetimes. Isn't that the curator's task? DWD
  7. install and manage public data schemas. Isn't that the curator's task? DWD
  8. install and manage visualizers
  9. install and manage DMP templates. Isn't that the curator's task? DWD
As a Sysop / IT I would like to ...
  1. Perform Admin operations
  2. Allocate accounts to researchers / administrators
  3. integrate the service with my institution's identity management system
  4. integrate the service with external identity management systems as recognized by the researcher and/or the institution
Definitions: As (someone above), I would like to...
  1. Login
    1. log into DTR using DuraCloud-based Identity management
    2. log into DTR using my Institution’s Identity management
    3. log into DTR using my institution’s IdM via InCommon (Shibboleth)
    4. log into DTR using a public Identity management (OAuth, OpenID)
  2. Search
    1. Search for a unit of storage (list of files)
    2. Search within the units of storage (contents of files)

Non User Story Functional Requirements

  1. Capture provenance and audit logs for each data element
  2. Stream audit logs as events
  3. Provide encryption options on all data transfers, and all stored data
  4. Comprehend common public data schemas; recognize and categorize incoming data. Isn't that the curator's task? DWD
  5. Integrate with third party key management system(s)
  6. Implement a (standards based) service bus with internal and external APIs
    1. Including workflow capture interfaces (see jBPM, Taverna, Kepler, …)
  7. Provide system defined services for automatic data transformation
  8. Provide user defined services for automatic data transformation
  9. Control access based on policy and user accounts
  10. Allow for data policies, including versioning and immutability options
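
Requirements 1 and 2 (capturing audit logs for each data element and streaming them as events) can be sketched as an append-only log that also pushes every event to subscribers; a subscriber is one way the PI's "be notified if anyone operates on the data" story might be served. This is a minimal in-process sketch: the `AuditEvent` fields and the `AuditLog` class are hypothetical, and a real service would presumably publish events on the service bus from requirement 6 rather than call Python callbacks.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class AuditEvent:
    """One hypothetical audit record for an operation on a single data element."""
    element_id: str
    actor: str
    action: str  # e.g. "read", "write", "delete"
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class AuditLog:
    """Append-only in-memory log that also streams each event to subscribers."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []
        self._subscribers: list[Callable[[AuditEvent], None]] = []

    def subscribe(self, callback: Callable[[AuditEvent], None]) -> None:
        self._subscribers.append(callback)

    def record(self, event: AuditEvent) -> None:
        # Store first, then stream, so the log is complete even if a subscriber fails.
        self._events.append(event)
        for callback in self._subscribers:
            callback(event)

    def history(self, element_id: str) -> list[AuditEvent]:
        """Return all events for one data element, oldest first."""
        return [e for e in self._events if e.element_id == element_id]

    def to_json_lines(self) -> str:
        """Serialize the log as JSON Lines, one event per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


if __name__ == "__main__":
    log = AuditLog()
    log.subscribe(lambda e: print(f"notify PI: {e.actor} did {e.action} on {e.element_id}"))
    log.record(AuditEvent("dataset-001/obs.csv", "assistant@example.edu", "write"))
    # prints "notify PI: assistant@example.edu did write on dataset-001/obs.csv"
```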

Additional Ideas

  1. How do we encourage the researcher to learn about and utilize appropriate data schemas?
  2. How do we capture / snapshot operational environments for later restoration? Should we?
  3. Does the grant take us to demonstration or production?

Next Steps

  1. Mark each of these requirements as (Critical, NiceToHave, NotNow)
  2. Sort them by development priority and expand them
  3. Define the minimal required system image