Dataset Selection
Initial datasets were defined by Maria in Jira: https://fedora-repository.atlassian.net/jira/software/c/projects/TLFR/issues/TLFR-1
Sample ingest scripts were written by Dan to mimic TACC's earlier, abandoned ingests on Fedora 5, which failed due to issues of scale: https://fedora-repository.atlassian.net/jira/software/c/projects/TLFR/issues/TLFR-3
Initial Benchmark
Introduction
PRJ-2972 was selected as the initial benchmark dataset. The ingest was performed with the parameters below, and the results are recorded here as a baseline for further testing.
Test Parameters
Software Stack
- Fedora 6.4
- Nginx reverse proxy
- Tomcat servlet container
- OpenJDK 11
- Recursive Dumb Ingester https://github.com/DesignSafe-CI/fedora-benchmarks
Ingest was performed locally, with Tomcat and Postgres running in containers on the same host. Fedora's local storage and the ingest source data were both on the mounted 2TB volume /dev/vdb1, formatted as XFS.
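To illustrate what a recursive ingest of a directory tree involves, the sketch below walks a source directory and plans one Fedora container per directory and one binary per file, with parents listed before children. This is not the code from the fedora-benchmarks repository: the function name `plan_ingest`, the base URL, and the directory-to-container mapping are all assumptions made for illustration, and the sketch only builds the plan without sending any requests.

```python
import os
from urllib.parse import quote


def plan_ingest(source_dir, base_url):
    """Plan a recursive ingest (hypothetical helper, not the real ingester).

    Returns a list of (kind, url, local_path) tuples where kind is
    "container" for a directory or "binary" for a file. Parents always
    appear before their children, so each entry could be created with a
    PUT to its URL in list order.
    """
    source_dir = os.path.abspath(source_dir)
    root_name = os.path.basename(source_dir)
    plan = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # make the traversal order deterministic
        rel = os.path.relpath(dirpath, source_dir)
        parts = [root_name] + ([] if rel == "." else rel.split(os.sep))
        # URL-encode each path segment separately so "/" stays a separator.
        container_url = "/".join([base_url.rstrip("/")] + [quote(p) for p in parts])
        plan.append(("container", container_url, dirpath))
        for name in sorted(filenames):
            file_url = container_url + "/" + quote(name)
            plan.append(("binary", file_url, os.path.join(dirpath, name)))
    return plan


if __name__ == "__main__":
    # Assumed endpoint for a local Tomcat deployment; adjust as needed.
    for kind, url, path in plan_ingest(".", "http://localhost:8080/fcrepo/rest"):
        print(kind, url)
```

Separating the plan from the HTTP calls keeps the traversal easy to test and makes it possible to count the resources a dataset will produce before starting a long run.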
VM Specification
- Hosted on Cyclone
- 8 Cores
- 32GB RAM
- 2TB additional storage (Fedora storage and ingest source)
Run 1:
428458 resources created in 751 minutes (~12.5 hours), resulting in a 277GB ocfl-root from 266GB of input (PRJ-2972).
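For comparison with later runs, the Run 1 figures can be reduced to a few derived rates. The sketch below is a minimal calculation using only the numbers recorded above; the function name `benchmark_metrics` is hypothetical, and the data rate assumes decimal units (1 GB = 1000 MB), which the source does not specify.

```python
def benchmark_metrics(resources, minutes, input_gb, output_gb):
    """Derive throughput and storage overhead from raw run totals."""
    seconds = minutes * 60
    return {
        "hours": minutes / 60,
        "resources_per_minute": resources / minutes,
        "resources_per_second": resources / seconds,
        # Assumes 1 GB = 1000 MB; the source does not say GB vs GiB.
        "input_mb_per_second": input_gb * 1000 / seconds,
        "storage_overhead_pct": (output_gb - input_gb) / input_gb * 100,
    }


# Run 1 totals as recorded above.
run1 = benchmark_metrics(resources=428458, minutes=751, input_gb=266, output_gb=277)

if __name__ == "__main__":
    for name, value in run1.items():
        print(f"{name}: {value:.2f}")
```

With the Run 1 totals this works out to roughly 570 resources per minute (about 9.5 per second), and the resulting repository storage is about 4% larger than the input.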