
Title (goal): High Volume of Concurrent Ingests
Primary Actor: Submitter
Scope:
Level:
Story:

We need to be able to reliably load submission packages at large scale from local network drives. A single batch submission can contain many thousands of objects, and we would like to ingest the objects in a batch in parallel for higher throughput.

The batches will not be part of a single Fedora transaction with rollback. (Our tools will sometimes pause, re-prioritize, and then resume a batch ingest job.)
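A minimal sketch of how such a parallel, resumable batch ingest might look, assuming a Fedora 4+ LDP REST endpoint; the base URL, the one-directory-per-object package layout, and the worker count below are illustrative assumptions, not part of this use case:

import pathlib
from concurrent.futures import ThreadPoolExecutor

import requests

# Illustrative assumptions: Fedora base URI, plus one directory per object
# on the network drive, each containing one file per datastream.
FEDORA_BASE = "http://localhost:8080/rest/batch-001"

def ingest_object(package: pathlib.Path) -> str:
    """Create one object container, then PUT each file as a child binary."""
    obj_uri = f"{FEDORA_BASE}/{package.name}"
    # An empty turtle PUT creates an LDP container for the object.
    requests.put(obj_uri, headers={"Content-Type": "text/turtle"}).raise_for_status()
    for ds in sorted(package.iterdir()):  # ~5 datastreams per object
        with ds.open("rb") as fh:
            requests.put(
                f"{obj_uri}/{ds.name}",
                data=fh,  # streamed from the network drive, not buffered in memory
                headers={"Content-Type": "application/octet-stream"},
            ).raise_for_status()
    return obj_uri

def ingest_batch(batch_root: pathlib.Path, workers: int = 8) -> None:
    """Ingest every object package under batch_root in parallel."""
    packages = [p for p in batch_root.iterdir() if p.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for uri in pool.map(ingest_object, packages):
            print("ingested", uri)

if __name__ == "__main__":
    ingest_batch(pathlib.Path("/mnt/submissions/batch-001"))

Because every object is created with its own independent requests, a batch can be paused, re-prioritized, and resumed at object granularity, with no repository-side rollback required.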

Our largest anticipated collection next year is 10TB. It would come in the form of perhaps 10 submission packages of 1TB each, containing tens of thousands of files.

We'd like to be able to scale out our cluster to handle such a large collection without disrupting our normal submission streams. We estimate that this baseline ingest load would add another 100k items totaling 10TB over the same period.

Combined, that gives an approximate target throughput of 200k items totaling 20TB per month. In this case each item is an object with about 5 datastreams.
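For rough sizing (our own arithmetic from the figures above, not an additional requirement): 20TB / 200k items ≈ 100MB per item on average, and at about 5 datastreams per object that is roughly 1 million datastream creations per month, averaging ~20MB per datastream.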

To support ongoing collection work, an individual batch ingest (10k items totaling 1TB) should take no longer than 2 days, given sufficient I/O and cluster resources.
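In sustained-rate terms (again our own arithmetic): 10,000 items over 48 hours is roughly 3.5 object ingests per minute, and 1TB over 48 hours is roughly 6MB/s of data throughput, before allowing any headroom for pauses, retries, or re-prioritization.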
