Dataset Selection

Initial datasets were defined by Maria in Jira: https://fedora-repository.atlassian.net/jira/software/c/projects/TLFR/issues/TLFR-1

Sample ingest scripts were written by Dan to mimic TACC's earlier, abandoned ingests on Fedora 5, which failed due to issues of scale: https://fedora-repository.atlassian.net/jira/software/c/projects/TLFR/issues/TLFR-3

Initial Benchmark

Introduction

PRJ-2972 was selected as the initial benchmark dataset. The ingest was performed with the parameters below, and the results are recorded here as a baseline for further testing.

Test Parameters

Software Stack

Ingest was performed locally, with Tomcat and Postgres running in containers on the same host. Fedora local storage and the ingest data were both on mounted storage (2TB), /dev/vdb1, formatted as XFS.

VM Specification 

  • Hosted on Cyclone
  • 8 Cores
  • 32GB RAM
  • 2TB additional storage (Fedora storage and ingest source)


Run 1:

428458 resources were created in 751 minutes (~12.5 hours). The resulting dcfl-root was 277GB from 266GB of input (PRJ-2972).
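
To make later runs easier to compare against this one, the Run 1 figures can be reduced to a throughput rate and a storage overhead. The following is a minimal sketch using only the numbers recorded above; the function and constant names are illustrative and do not come from the ingest scripts.

```python
# Figures recorded for Run 1 (PRJ-2972).
RESOURCES_CREATED = 428_458
ELAPSED_MINUTES = 751
INPUT_GB = 266
OUTPUT_GB = 277


def throughput_per_minute(resources: int, minutes: float) -> float:
    """Average number of resources created per minute of ingest."""
    return resources / minutes


def storage_overhead_percent(input_gb: float, output_gb: float) -> float:
    """Growth of the repository storage relative to the input size, in percent."""
    return (output_gb - input_gb) / input_gb * 100


if __name__ == "__main__":
    rate = throughput_per_minute(RESOURCES_CREATED, ELAPSED_MINUTES)
    overhead = storage_overhead_percent(INPUT_GB, OUTPUT_GB)
    print(f"Throughput: {rate:.1f} resources/min ({rate / 60:.1f} resources/s)")
    print(f"Elapsed: {ELAPSED_MINUTES / 60:.1f} hours")
    print(f"Storage overhead: {overhead:.1f}%")
```

For Run 1 this works out to roughly 570 resources per minute (about 9.5 per second) and a storage overhead of about 4%. These are averages over the whole run and say nothing about how the rate changed as the repository grew.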


