...

Authoring is the activity of creating or assembling new content. This includes both constructing wholly new content and referencing existing content.  It differs from ingest in that it is characterized by incremental assembly of the content, with many rapid write/read cycles needed to accomplish what the user considers a single operation.  It is often performed as part of an authoring workflow, commonly with multiple actors performing both overlapping and different roles.  During this phase the content (and metadata) changes rapidly.  The time between operations in a workflow can vary considerably, so Authoring needs to handle the shortest period between user operations.

User Operations

  • In Islandora, I am using forms to create one or more items, and I am editing relationships incrementally
  • In the National Science Digital Library (defunct), I have a constant stream of third party annotations being added
  • In RepoMMan, I need to send a paper through an approval process where the approvers may want to make editorial changes
  • In Hydra (I don't have one but I bet it's there)

Simple Ingest

Simple Ingest consists of the upload of a single item or a small amount of content and metadata. It can be accomplished with a single atomic operation or a short series of operations, usually RESTful, without required intermediate reads prior to completion.  The duration of a simple ingest is expected to be approximately the time it takes to upload the content and metadata, measured from the beginning of the connection, where the connection is terminated after the operation.  The uploaded content and metadata are expected to become readable (accessible) fairly soon after the upload is complete.

User Operations

  • I want to upload an image through Hydra
  • I want to upload a paper in NIST
  • I am using a sync tool to upload a slow flow of new items

Bulk Ingest

In bulk ingest, a large quantity of content and metadata is ingested as a logical unit or as a continuous flow.  This may be accomplished using a number of repository operations or may utilize methods that are optimized for bulk ingest.  It is characterized by the expectation that there may be a defined delay between when the ingest is started and when part or all of the content and metadata becomes available for read.

User Operations

...

...

Simple Access

Simple access (a.k.a. simple read or download) is the download of content and metadata (a.k.a. the representation of a resource) as a single user operation and one or a small number of repository operations.  It is usually RESTful and usually consists of a single request. Simple Access must not require any concurrent writes to accomplish the single user operation.  The content and metadata stay fixed from the beginning to the end of the access.

User Operations

  • I want to use the Exhibition module in Islandora to present a static website
  • I want to present a dynamic website through Hydra
  • I am using a sync tool to download a slow flow of new items

Conditioned Access

When streaming media, dropouts present a significant problem.  The user expects to be able to access the content without interruption.  This may require a front-end buffering tool, so the stream need not be perfect, only good enough for the buffering tool.

...

  • I am using Hydra to show a class lecture

Mediated Access

Not all of the content is managed by Fedora; some resources are provided by reference from a remote web service.  Fedora would retrieve the representation (content and metadata) from the web service and present it as if it were a resource in Fedora.

User Operations

...

...

Bulk Access

Bulk access is the download of a large amount of content as a single user operation.  This may require any number of repository operations to accomplish. Whether content and metadata stay fixed from the beginning to the end of the operation is to be defined. This needs consideration when a whole intellectual entity, graph, or DIP is considered the unit.  We also need to consider what this means for continuous access operations.

...

  • I am the Bodleian library and an EMP device went off.  I need to use a sync tool to download a major set of digitized texts
  • I am SIdora and I need to send a whole set of genome fragments to be assembled at ORNL
  • I am Hydra and I need to send a SIP to APTrust and DPN

Preliminary Testing Matrix

Large - Mixed, Duration - Mixed - Mixed

| Category | User Operation | Repository Operation | Test Metric | Test | Priority | Notes |
|---|---|---|---|---|---|---|
| Authoring | Islandora Authoring; use concurrency test (see below) | | | | | |
| Authoring | Authoring with Workflow; not planned for initial test | | | | | |
| Simple Ingest | Small Files | | Rate | | 1 | Synthetic data is acceptable for this test |
| Simple Ingest | Medium Files | | Rate | | 1 | Synthetic data is acceptable for this test |
| Simple Ingest | Large Files - Media | | Rate | | 1 | Synthetic data is acceptable for this test |
| Simple Ingest | Large Mixed Files | | Rate | | 1 | Synthetic data is acceptable for this test |
| Simple Ingest | Large Media Files - Media | | Rate | | | |
| Simple Ingest | Large File Count | | Rate - Normalized; Count | | | Ingest files to a substantial number to explore maximum file count. Normalized to ignore size of a given file. |
| Bulk Ingest | Small Files | | | | | |
| Bulk Ingest | Medium Files | | | | | |
| Bulk Ingest | Large Files | | | | | |
| Bulk Ingest | Large File Count | | Rate - Normalized; Count | | | Ingest files to a substantial number to explore maximum file count |
| Simple Access | Small Files | | Random Access | | 1 | Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test. |
| Simple Access | Medium Files | | Random Access | | 1 | Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test. |
| Simple Access | Large Files | | Random Access | | 1 | Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test. |
| Simple Access | Mixed Files | | Random Access | | 1 | Site should contain a set of files of all three sizes. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test. |
| Simple Access | Large File Count | | Random Access | | 1 | Access a large number of files until time is exhausted. |
| Bulk Ingest | Small Files | | | | | |
| Bulk Ingest | Medium Files Static Web Site | | Random Access | | | |
| Bulk Ingest | Large Files | | | | | |
| Bulk Ingest | Mixed Files | | | | | |
| Conditioned Access | None planned for initial testing | | | | | |
| Mediated Access | None planned for initial testing | | | | | |
| Concurrent Test #1 | Simple Access Mixed Files; Simple Ingest Mixed Files | | Rate; Random Access | | 1 | Number of Load Injectors TBD |
| Concurrent Test #1 | Authoring Simulation | | Rate; Directed Access | | | Random ingests (writes), random delay read of same file. Count errors. |
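
The matrix distinguishes a plain Rate from a Rate normalized to ignore file size. One possible reading, shown here as a minimal Python sketch, treats the plain rate as bytes per second and the normalized rate as files per second. Both interpretations, along with the names `IngestSample` and `summarize`, are assumptions made for illustration and are not defined by the test plan.

```python
from dataclasses import dataclass


@dataclass
class IngestSample:
    """One completed upload (hypothetical record shape)."""
    size_bytes: int
    seconds: float


def summarize(samples):
    """Return (bytes_per_second, files_per_second) over all samples.

    Assumes the uploads ran one after another, so the elapsed time is
    the sum of the individual durations.
    """
    total_seconds = sum(s.seconds for s in samples)
    if total_seconds <= 0:
        raise ValueError("samples must cover a positive duration")
    total_bytes = sum(s.size_bytes for s in samples)
    return total_bytes / total_seconds, len(samples) / total_seconds
```

With concurrent load injectors the elapsed time would instead be the wall-clock duration of the run, so the summing assumption would need to change.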
       
       
       
       

 

Testing Considerations

These scenarios expand on the previous single-stimulus load injector tests to use multiple read, write, and read-write tests via the REST API.
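
The authoring simulation in the matrix (random writes, a randomly delayed read of the same file, count errors) might be sketched as follows. This is a single-threaded illustration only: `InMemoryRepo` is a hypothetical stand-in for the repository's REST API, and the delay is counted in operations rather than in time. A real test would drive HTTP requests from concurrent load injectors.

```python
import random


class InMemoryRepo:
    """Hypothetical stand-in for the repository's REST API."""

    def __init__(self):
        self._store = {}

    def write(self, key, body):
        self._store[key] = body

    def read(self, key):
        return self._store.get(key)


def authoring_simulation(repo, operations, max_delay, seed=0):
    """Write random items, read each back after a random delay, and
    count reads that do not return the most recently written body.

    Returns (reads, errors). Reads scheduled past the final operation
    are not performed.
    """
    rng = random.Random(seed)
    latest = {}  # key -> body most recently written
    due = {}     # step -> keys to read back at that step
    reads = errors = 0
    for step in range(operations):
        key = f"item-{rng.randrange(10)}"
        body = f"v{step}"
        repo.write(key, body)
        latest[key] = body
        read_at = step + rng.randint(0, max_delay)
        due.setdefault(read_at, []).append(key)
        for k in due.pop(step, []):
            reads += 1
            if repo.read(k) != latest[k]:
                errors += 1
    return reads, errors
```

Against the in-memory stand-in the error count is zero by construction; the point of the sketch is the bookkeeping that lets a delayed read be checked against the latest write.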

...

  • Step up rates by 2x until performance flat-lines

  • Then proceed to declining performance and failure or non-response
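
The first of the two steps above could look like the following minimal Python sketch. The `measure` callback, the `tolerance` threshold, and the `max_steps` cap are all assumptions introduced for illustration; the plan itself does not say how "flat line" is detected. The second phase (pushing on to declining performance and failure) is not covered here.

```python
def step_up(measure, start_rate, tolerance=0.05, max_steps=20):
    """Double the offered rate until throughput stops improving.

    measure(rate) is a caller-supplied (hypothetical) function that
    drives load at `rate` and returns the achieved throughput.
    Returns a list of (offered_rate, throughput) pairs, ending at the
    first step whose gain over the previous step is within `tolerance`.
    """
    results = []
    rate = start_rate
    previous = None
    for _ in range(max_steps):
        achieved = measure(rate)
        results.append((rate, achieved))
        if previous is not None and achieved <= previous * (1 + tolerance):
            break  # throughput has flat-lined
        previous = achieved
        rate *= 2
    return results
```

For example, with a synthetic system that saturates at 100 operations per second, `step_up(lambda r: min(r, 100), 10)` stops at an offered rate of 320, the first step that shows no gain over the previous one.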

Fedora

...

Configurations

  • Not Clustered
  • Clustered
  • Replicated
  • Not Replicated

...