This section logs migration tests for our two different repository environments, "collections" and "citations".
"collections" environment: approx. 4.3M records in legacy format, mostly books, pages, and still images. Objects have many datastreams; binaries are generally externally referenced (control group E).
Fedora 3.8.1. VM with 4 cores, 8 GB RAM. CentOS release 6.10 (Final), Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Number of objects | Execution Time | Source Layout | Dest. Layout | Migration tool version | Notes |
---|---|---|---|---|---|
1000 | 4 min | legacy | pairtree | 11/26/19 | 1K Fedora items produced 42K+ files |
1000 | 3 min | legacy | truncated | 11/26/19 | 1K Fedora items produced 43K+ files |
100,000 | 6.5 hours | legacy | flat | 11/26/19 | |
1 million | ~3 days | legacy | pairtree | 11/26/19 | Execution crashed twice with "unable to delete staging file" errors; the resume option worked without issues. |
full run (4,656,669 items) | 7 days | legacy | pairtree | 2/4/20 | Full migration completed successfully with no issues observed. Required deployment of a new filesystem with a large inode limit. |
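The 1K-object sample runs offer a rough way to size the target filesystem before a full run. They produced about 42 files per object for pairtree and 43 for truncated. The Python sketch below is an illustration, not part of the migration tool. It extrapolates the pairtree ratio to the 4,656,669-object full run and compares the result with the free inodes that `os.statvfs` reports for the target. Treat the estimate as a lower bound: the ratio comes from a single "42K+" sample, and real file counts depend on each object's datastreams. `TARGET_DIR` is a hypothetical environment variable for the migration target path, and `statvfs` is Unix-only.

```python
import os

# Figures from the "collections" table above.
SAMPLE_OBJECTS = 1_000
SAMPLE_FILES = 42_000          # "1K Fedora items produced 42K+ files" (pairtree); a lower bound
FULL_RUN_OBJECTS = 4_656_669


def estimated_files(n_objects: int, files_per_object: float) -> int:
    """Extrapolate the number of files (roughly, inodes) needed for n_objects."""
    return round(n_objects * files_per_object)


def inodes_available(path: str) -> int:
    """Free inodes available to unprivileged users on the filesystem holding path."""
    return os.statvfs(path).f_favail


if __name__ == "__main__":
    ratio = SAMPLE_FILES / SAMPLE_OBJECTS      # ~42 files per Fedora object
    needed = estimated_files(FULL_RUN_OBJECTS, ratio)
    print(f"~{ratio:.0f} files/object -> ~{needed:,} files for the full run")

    # TARGET_DIR is a hypothetical placeholder for the migration target directory.
    target = os.environ.get("TARGET_DIR", ".")
    free = inodes_available(target)
    print(f"free inodes on {target}: {free:,}")
    if free < needed:
        print("not enough free inodes for the extrapolated file count")
```

At 42 files per object, the full run works out to roughly 196 million files. That is far beyond the inode count of many default filesystem configurations, which is consistent with the need for a new filesystem noted in the table.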
"citations" environment: approx. 3.8M records in akubra format, all citations, each with one small managed (control group M) XML datastream holding the citation payload.
Fedora 3.8.1. VM with 1 core, 8 GB RAM. CentOS release 6.10 (Final), Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Number of objects | Execution Time | Source Layout | Dest. Layout | Migration tool version | Notes |
---|---|---|---|---|---|
2000 | 32 min | akubra | pairtree | 11/26/19 | Includes 30 min to build the index. Hung on completion: could not delete the index. |
10,000 | 42 min | akubra | flat | 11/26/19 | Includes 30 min to build the index. Hung on completion: could not delete the index. |
554,695 | 13 hours | akubra | flat | 11/26/19 | Attempted to migrate 1M records; crashed with an UnrecognizedPropertyException. Includes 30 min to build the index. |
full run (3,830,777 items) | 5 days | akubra | truncated | 2/4/20 | Full migration completed successfully with no issues observed. |
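Each of the partial citation runs includes a one-off index build of about 30 min. For small runs, dividing total wall-clock time by object count therefore mostly measures the index build rather than per-object migration cost. The minimal Python sketch below works through that arithmetic with the figures from the table. It assumes the index build took exactly 30 min in every run, which is an approximation.

```python
# Figures from the "citations" table above; each partial run reports
# ~30 min spent building the datastream index before migrating objects.
INDEX_MINUTES = 30

RUNS = [
    # (objects migrated, total wall-clock minutes)
    (2_000, 32),
    (10_000, 42),
    (554_695, 13 * 60),
]


def naive_per_object_seconds(objects: int, total_minutes: float) -> float:
    """Seconds per object with the index build spread over every object."""
    return total_minutes * 60 / objects


def per_object_seconds(objects: int, total_minutes: float,
                       index_minutes: float = INDEX_MINUTES) -> float:
    """Seconds per object after subtracting the one-off index build."""
    return (total_minutes - index_minutes) * 60 / objects


for objects, minutes in RUNS:
    print(f"{objects:>9,} objects: "
          f"{naive_per_object_seconds(objects, minutes):.3f} s/object incl. index, "
          f"{per_object_seconds(objects, minutes):.3f} s/object excl. index")
```

For the 2,000-object run this gives 0.96 s/object including the index build, but only 0.06 s/object excluding it.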
Fedora 3: approx. 561,000 objects (559GB), mostly books, pages, and still images, with some audio, video, and PDF resources; approx. 2.36 million datastreams (10.3TB). Content objects have one binary datastream and 5 XML metadata datastreams; container objects have ~5 XML metadata datastreams. All datastreams are either inline or managed (no external or redirect datastreams).
Fedora 3.8.1. Migration run on a VM with 4 cores, 8 GB RAM. CentOS Linux release 8.2.2004 (Core), Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz
Command run:
$ java -jar target/migration-utils-4.4.1-SNAPSHOT-driver.jar --migration-type=FEDORA_OCFL --source-type=akubra --datastreams-dir=/fedora3-prod/fedora/datastreams --objects-dir=/fedora3-prod/fedora/objects --target-dir=/fedora-migration-test --index-dir=/var/tmp/datastream-index
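The 1,000, 10,000, and 100,000-object test runs limit the migration with `--pid-file`. The hypothetical Python wrapper below prepares such a batch. It copies the first n PIDs from a master list into a PID file and assembles the command above with `--pid-file` appended. Several details are assumptions rather than documented behaviour: the PID file format (one PID per line), the existence of a full PID list (`allpids.txt` is a hypothetical name), and the helper names themselves. Check the migration-utils documentation for the expected `--pid-file` format.

```python
import subprocess
from itertools import islice
from pathlib import Path

# Jar name and arguments copied from the command above; adjust paths as needed.
JAR = "target/migration-utils-4.4.1-SNAPSHOT-driver.jar"
BASE_ARGS = [
    "--migration-type=FEDORA_OCFL",
    "--source-type=akubra",
    "--datastreams-dir=/fedora3-prod/fedora/datastreams",
    "--objects-dir=/fedora3-prod/fedora/objects",
    "--target-dir=/fedora-migration-test",
    "--index-dir=/var/tmp/datastream-index",
]


def select_pids(lines, n):
    """Return the first n non-blank PIDs from an iterable of text lines."""
    stripped = (line.strip() for line in lines)
    return list(islice((pid for pid in stripped if pid), n))


def write_pid_file(path, pids):
    """Write PIDs one per line (assumed --pid-file format)."""
    Path(path).write_text("".join(f"{pid}\n" for pid in pids))


def build_command(pid_file):
    """The java invocation above with --pid-file appended."""
    return ["java", "-jar", JAR, *BASE_ARGS, f"--pid-file={pid_file}"]


def run_batch(all_pids_file, n):
    """Hypothetical helper: migrate only the first n PIDs from all_pids_file."""
    with open(all_pids_file) as src:
        pids = select_pids(src, n)
    pid_file = f"{n}pids.txt"          # mirrors the 1000pids.txt naming in the table
    write_pid_file(pid_file, pids)
    subprocess.run(build_command(pid_file), check=True)
```

For example, `run_batch("allpids.txt", 1000)` would write `1000pids.txt` and start a 1,000-object run. Building the argument list separately makes it easy to check the command before launching a multi-day migration.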
Number of objects | Execution time | Average seconds per object | OCFL repository size | Migration tool version | Notes |
---|---|---|---|---|---|
1000 | Datastream index: 1h17m; OCFL repo: 4h36m | 2.9 sec | 133GB | | with param --pid-file=1000pids.txt; datastream index cleared after run |
10,000 | Datastream index: 1h5m; OCFL repo: 11h48m | 4.3 sec | 147GB | 81586bf | with param --pid-file=10000pids.txt; most objects in this batch are XML docs |
100,000 | Datastream index: 1h9m; OCFL repo: 3d20h16m | 3.3 sec | 1.6TB | | with param --pid-file=100000pids.txt; datastream index cleared after run |
All 561,000 | Datastream index: 1h10m | 3.2 sec | 9TB | | all PIDs |
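The average-seconds-per-object column can be cross-checked against the recorded durations. The minimal Python sketch below assumes durations are written as in the table (days, hours, minutes, e.g. `3d20h16m`). It also assumes the average is taken over the OCFL repo time only, excluding the datastream index build; that convention is not stated in the table.

```python
import re

_UNIT_SECONDS = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([dhms])")


def parse_duration(text: str) -> int:
    """Convert a duration such as '3d20h16m' or '1h5m' into seconds."""
    text = text.strip()
    tokens = _TOKEN.findall(text)
    # Reject anything that is not made up entirely of number+unit tokens.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens)


def seconds_per_object(duration: str, objects: int) -> float:
    """Average seconds per object for a run of the given duration."""
    return parse_duration(duration) / objects


# Object counts and OCFL repo times from the table above.
for objects, ocfl in [(1_000, "4h36m"), (10_000, "11h48m"), (100_000, "3d20h16m")]:
    print(f"{objects:>7,} objects, OCFL repo {ocfl}: "
          f"{seconds_per_object(ocfl, objects):.2f} s/object")
```

For example, 3d20h16m is 332,160 seconds, which over 100,000 objects works out to about 3.32 s/object.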