Performance Testing Scenarios

Performance testing may be simplest if it uses real-world scenarios drawn from the use patterns for Fedora 4. In other words, performance testing should be informed by the way Fedora is expected to be used. No single software product can perform well for every possible pattern of use. We need to define the expected uses for Fedora and acknowledge the trade-offs that are being made. While we need to recognize that there will be unanticipated uses, we can only test for the cases that best characterize how we expect Fedora to be used. This sets expectations for those building systems using Fedora and provides guidance for Fedora committers. When a use is identified during system design, developers can decide whether Fedora is suitable as part of their implementation. Fedora committers can either decide that Fedora can be extended to support that use, or suggest how Fedora can be used in combination with other tools to support it.

The scenarios are broken into major categories of use that give us examples of how Fedora is used, to aid in constructing a realistic performance test suite. These are intended to be pragmatic categories, since they present different loads on the system, and should not be mechanically linked one-to-one with any API.

Definitions

Use - Canonical action in a use case from the user's view
User Operation - The user's view of a logical, single operation
Repository Operation - The repository's view of a logical, single operation; for our purposes, the result of one or more API calls
Performance - The number of units of work that are accomplished during an operation
Interleaving Concurrency - The number and kinds of operations being performed at the same nominal time (as opposed to a single repetition of an operation)
Unit of Work - A metric (to be defined) that is appropriate for measuring the results of a performance test of the Fedora Repository.
- There may be more than one metric
- Working definitions and ideas:
  - The amount in bytes of content and metadata for an operation, or per unit of time
  - The number of operations per unit of time
  - The time between when an operation is started and when it is completed
  - Count of operations performed, possibly eliminating content transfer time (normalized)
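
As a rough illustration of the working ideas above, the following Python sketch computes each candidate metric from a batch of recorded operations. The `OperationSample` record, its field names, and the way transfer time is subtracted for the normalized count are all assumptions made for this example. They are not part of any agreed definition, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class OperationSample:
    """One completed operation, as a hypothetical test harness might record it."""
    start_s: float           # wall-clock start, in seconds
    end_s: float             # wall-clock end, in seconds
    bytes_moved: int         # content plus metadata bytes
    transfer_s: float = 0.0  # part of the elapsed time spent moving content


def summarize(samples):
    """Compute the four candidate metrics over a batch of samples."""
    window_s = max(s.end_s for s in samples) - min(s.start_s for s in samples)
    latencies = [s.end_s - s.start_s for s in samples]
    # Assumed normalization: drop content transfer time so that operations on
    # small and large files can be compared by count alone.
    overhead_s = sum(lat - s.transfer_s for lat, s in zip(latencies, samples))
    return {
        "bytes_per_s": sum(s.bytes_moved for s in samples) / window_s,
        "ops_per_s": len(samples) / window_s,
        "mean_latency_s": sum(latencies) / len(latencies),
        "normalized_ops_per_s": len(samples) / overhead_s,
    }


# Invented values, for illustration only.
samples = [
    OperationSample(0.0, 2.0, 1000, transfer_s=1.5),
    OperationSample(1.0, 4.0, 3000, transfer_s=2.5),
]
metrics = summarize(samples)
```

Which of these numbers becomes the unit of work for a given test is still open. The sketch only shows that all four can be derived from the same raw samples.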

To Be Done

Is this a good list of usage categories?
Validate/add user operations for each category (examples). Mark them as now, future, or never. Note if an external tool is needed. This speaks to the likely limits of what we can do. Prioritize short-term goals. This tells us how we expect Fedora 4 to be used.
Add repository user operations to each category? Map user tests
Add repository operations to tests
Choose performance units for each test
Choose performance expectations for each test
Construct interleaving matrix (to the limit of what is practical to accomplish)
Describe the test
Construct concurrency goals for each test - Note: single-thread tests will be used as a baseline, as informed by the Single Node tests
Write a test script (for each test)
Describe the Fedora 4 configuration
Set up a test infrastructure using The Grinder
Prepare data
Test

Usage Categories

The following categories and uses are drawn from the Fedora 4 Roadmap and from any new items, especially expectations, that have surfaced during development. Only performance-related uses are included; specific function details are dropped for simplicity and because they are unlikely to need specific performance testing.

Authoring

Authoring is the activity of creating or assembling new content. This includes both constructing wholly new content and referencing existing content. It differs from ingest in that it is characterized by incremental assembly of the content, with many rapid write/read cycles needed to accomplish what the user considers a single operation. It is often performed as part of an authoring workflow, commonly with multiple actors performing both overlapping and different roles. During this phase the content (and metadata) is changing rapidly. The time between operations in a workflow can vary considerably, so Authoring needs to handle the shortest period between user operations.

User Operations

...

Repository Operations

TBD - Map to API

Simple Ingest

Simple Ingest consists of the upload of a single item or a small amount of content and metadata. It can be accomplished with a single atomic operation or a short series of operations, usually RESTful, without required intermediate reads prior to completion. The duration of a simple ingest is expected to be approximately the time it takes to upload the content and metadata, measured from the beginning of the connection, where the connection is terminated after the operation. The uploaded content and metadata are expected to become readable (accessible) fairly soon after the upload is complete.

User Operations

I want to upload an image through Hydra
I want to upload a paper in NIST
I am using a sync tool to upload a slow flow of new items

...

TBD - Map to API

Bulk Ingest

In bulk ingest, a large quantity of content and metadata is ingested as a logical unit or as a continuous flow. This may be accomplished using a number of repository operations or may use methods that are optimized for bulk ingest. It is characterized by the expectation that there may be a defined delay between when the ingest is started and when part or all of the content and metadata becomes available for read.

User Operations

I am the Bodleian Library and I want to create a duplicate (backup) of my digitised texts
I am SIdora and I want to ingest the gene sequence for a Manakin (bird) coming from my in-house gene sequencers
I am using a sync tool to upload a new collection via I2
I want to upload 2000 graphs, each consisting of 10000 items, and I want to be sure that each graph is complete and that the whole set is complete

...

TBD - Map to API

Simple Access

Simple access (a.k.a. simple read or download) is the download of content and metadata (a.k.a. a representation of a resource) as a single user operation and one or a small number of repository operations. It is usually RESTful and usually consists of a single request. Simple Access must not require any interleaved or concurrent writes to accomplish the single user operation. The content and metadata stay fixed from the beginning to the end of the access.

User Operations

I want to use the Exhibition module in Islandora to present a static website
I want to present a dynamic website through Hydra
I am using a sync tool to download a slow flow of new items

...

TBD - Map to API

Conditioned Access

When streaming media, dropouts present a significant problem. The user expects to be able to access the content without interruption. This may require a front-end tool for buffering, so the stream need not be perfect, only good enough for the buffering tool.

User Operations

I am using Hydra to show a class lecture

Repository Operations

...

Mediated Access

Not all of the content is managed by Fedora; some resources are provided by reference from a remote web service. Fedora would retrieve the representation (content and metadata) from the web service and present it as if it were a resource in Fedora.

...

I am using the Data Conservancy Service but I want to show Glacier images kept by the NSIDC
I have papers stored in Islandora but I want to get the supporting datasets from SIdora

Repository Operations

TBD - Map to API

Bulk Access

Bulk Access is the download of a large amount of content as a single user operation. This may require any number of repository operations to accomplish. Whether content and metadata stay fixed from the beginning to the end of the operation is to be defined. This needs consideration when a whole intellectual entity, graph or DIP is considered the unit. We also need to consider what this means for continuous access operations.

User Operations

I am the Bodleian Library and an EMP device went off. I need to use a sync tool to download a major set of digitized texts
I am SIdora and I need to send a whole set of genome fragments to be assembled at ORNL
I am Hydra and I need to send a SIP to APTrust and DPN

Repository Operations

TBD - Map to API

Tests

Baseline and Simple Concurrency Testing Matrix

The tests in the table will start with a single load injector and worker to use as a baseline. Then each of the tests is executed with increasing numbers of load injectors and workers to test concurrency, until performance declines and/or error rates become large.

Category	User Operation	Test Metric	Priority	Notes
Simple Ingest	Small Files	Rate	1	Increase Load Injectors until max rate is found. Synthetic data is acceptable for this test
Simple Ingest	Medium Files	Rate	1	Increase Load Injectors until max rate is found. Synthetic data is acceptable for this test
Simple Ingest	Large Files	Rate	1+	Increase Load Injectors until max rate is found. Synthetic data is acceptable for this test
Simple Ingest	Media Files
Simple Ingest	Large File Count	Rate - Normalized Count	1+	Ingest files to a substantial number to explore maximum file count. Normalized to ignore size of a given file.
Bulk Ingest	Small Files
Bulk Ingest	Medium Files
Bulk Ingest	Large Files
Bulk Ingest	Large File Count	Rate - Normalized Count		Ingest files to a substantial number to explore maximum file count
Simple Access	Small Files	Random Access	1	Increase Load Injectors until max rate is found. Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test.
Simple Access	Medium Files	Random Access	1	Increase Load Injectors until max rate is found. Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test.
Simple Access	Large Files	Random Access	1	Increase Load Injectors until max rate is found. Site should contain a set of files of uniform size. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test.
Simple Access	Mixed Files	Random Access	1+	Increase Load Injectors until max rate is found. Site should contain a set of files of all three sizes. Tests can vary download mix by URL. It is essential that caching be avoided. Synthetic data is acceptable for this test.
Simple Access	Large File Count	Random Access	1	Increase Load Injectors until max rate is found. Access a large number of files until time is exhausted.
Bulk Ingest	Small Files
Bulk Ingest	Medium Files
Bulk Ingest	Large Files
Bulk Ingest	Mixed Files
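
The stopping rule for the matrix above (increase load injectors until performance declines and/or error rates become large) could be applied to collected results along the following lines. This is a minimal sketch: the `find_peak` helper, the 5% error threshold, and the sample numbers are all hypothetical placeholders, not measured Fedora results or agreed limits.

```python
def find_peak(results, max_error_rate=0.05):
    """Return (injectors, ops_per_s) for the last step before the stop condition.

    `results` is a list of (injectors, ops_per_s, error_rate) tuples ordered by
    increasing injector count. The first entry is the single-injector baseline.
    Returns None if even the baseline exceeds the error threshold.
    """
    best = None
    for injectors, ops_per_s, error_rate in results:
        if error_rate > max_error_rate:
            break  # error rate has become large
        if best is not None and ops_per_s < best[1]:
            break  # performance has declined
        best = (injectors, ops_per_s)
    return best


# Invented placeholder numbers, for illustration only.
sweep = [
    (1, 40.0, 0.00),
    (3, 110.0, 0.00),
    (6, 180.0, 0.01),
    (12, 170.0, 0.02),
    (24, 90.0, 0.20),
]
peak = find_peak(sweep)
scaling_vs_baseline = peak[1] / sweep[0][1]
```

Reporting the peak relative to the single-injector baseline keeps results comparable across file sizes, where absolute rates will differ widely.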

Concurrency Tests

The tests in the table combine two or more different tests from the table above. Each starts with concurrent operation of at least two workers, one for each simple test. Then each of the tests is executed with increasing numbers of load injectors and workers until performance declines and/or error rates become large.

Category	User Operation	Repository Operation	Test Metric	Priority	Notes
Concurrent Test #1	Simple Access Mixed Files / Simple Ingest Mixed Files		Rate / Random Access	1+	Number of Load Injectors TBD
Concurrent Test #N	Authoring Simulation		Rate / Directed Access

Testing Tools and Configuration

The suggested testing tool is The Grinder. It is a load testing framework written in Java, that uses Java, Jython and/or Clojure for writing the tests. A more complete discussion of our testing infrastructure may be found on this page.

Testing Considerations

These scenarios expand on the previous single stimulus load injector tests to use multiple read, write, and read-write tests via the REST API.

If the group thinks this is useful, I can break it into a matrix.

Note: all tests need to be run until:

a steady state is achieved
a declining state is achieved
Fedora 4 exhibits high error rates or stops responding

Multiple Read

Load Injectors

1 Load Injector to provide a baseline similar to the previous testing regimen
3 Load Injectors
6 Load Injectors (since this is where Fedora 3 starts to exhibit limits)
12 Load Injectors (since this is where Fedora 3 always exhibits limits)
24 Load Injectors

Payloads

1K file
1M file
50M file (Avg Video)
100K file (Minimal File, also used in the single node baseline tests)
2.7G file (DVD)

Rates

Step up rates X2 until flat line
Then proceed to declining response and failure or non-response
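
The "step up rates X2 until flat line" rule can be sketched as below. The helper names, the 5% tolerance used to decide that throughput has stopped growing, and the sample figures are assumptions for illustration. The actual definition of a flat line is still to be agreed.

```python
def doubling_rates(start, limit):
    """Yield offered rates start, 2*start, 4*start, ... not exceeding limit."""
    rate = start
    while rate <= limit:
        yield rate
        rate *= 2


def first_flat_step(offered, achieved, tolerance=0.05):
    """Return the offered rate at which achieved throughput flat-lined.

    A step counts as flat when achieved throughput grew by less than
    `tolerance` (as a fraction) over the previous step. Returns None if
    throughput kept growing through every step.
    """
    for i in range(1, len(achieved)):
        if achieved[i] < achieved[i - 1] * (1 + tolerance):
            return offered[i]
    return None


offered = list(doubling_rates(10, 100))  # requests per second offered
achieved = [10, 20, 39, 40]              # invented placeholder observations
flat_at = first_flat_step(offered, achieved)
```

Once the flat line is found, the same doubling schedule can simply be continued past it to look for declining response, failure or non-response.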

Multiple Write

Stimulators

1 stimulator to provide a baseline similar to the previous testing regimen
3 stimulators
6 stimulators (since this is where Fedora 3 starts to exhibit limits)
12 stimulators (since this is where Fedora 3 always exhibits limits)
24 stimulators

Payloads

1K file
1M file
5M file (Avg Hi-Res Image)
50M file (Avg Video, Also used in the single node baseline tests)
2.7G 6G file (DVD)

Rates

Step up rates X2 until flat line
Then proceed to declining response and failure, high error rates or non-response

Read-Write

Stimulators

1 stimulator to provide a baseline similar to the previous testing regimen
3 stimulators
6 stimulators (since this is where Fedora 3 starts to exhibit limits)
12 stimulators (since this is where Fedora 3 always exhibits limits)
24 stimulators

Payloads

This needs to be matrixed. The payloads should be mixed, but not randomly, so that the tests are repeatable.

1K file
1M file
50M file (Avg Video)
2.7G file (DVD)
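
One way to get a mixed but non-random payload sequence is a fixed repeating pattern, so that every run issues exactly the same sizes in the same order. The sketch below assumes hypothetical weights for the four payloads listed above. The real proportions still need to be decided when the matrix is built.

```python
from itertools import cycle, islice

# Hypothetical mix: how many times each payload appears per pattern repetition.
MIX = {"1K": 4, "1M": 3, "50M": 2, "2.7G": 1}


def payload_schedule(mix, length):
    """Return a deterministic list of payload names of the given length.

    The pattern is built by expanding each payload by its weight and then
    repeating the whole pattern, so no random number generator is involved.
    """
    pattern = [name for name, count in mix.items() for _ in range(count)]
    return list(islice(cycle(pattern), length))


schedule = payload_schedule(MIX, 12)
```

Because the schedule is a pure function of the mix and the length, two test runs with the same inputs can be compared directly.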

Rates

Step up rates X2 until flat line
Then proceed to declining performance and failure or non-response

...

Note: Single node tests indicated a sensitivity to having large numbers of items as children of a single node. How should we deal with this?

Fedora Configurations

Not Transactional
Transactional
Not Clustered
Clustered
Not Replicated
Replicated
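
Treating the list above as three independent on/off settings gives eight candidate configurations, which can be enumerated as in the sketch below. The axis names are taken from the list, but the assumption that every combination is valid and worth testing (for example, replicated without clustered) is ours and would need to be confirmed.

```python
from itertools import product

# Each axis is assumed to be an independent boolean setting.
AXES = {
    "transactional": (False, True),
    "clustered": (False, True),
    "replicated": (False, True),
}


def configurations(axes=AXES):
    """Return every combination of the axis values as a list of dicts."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*axes.values())]


configs = configurations()
```

Each test in the matrices would then be run against whichever subset of these configurations is judged practical.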

