...
- Storage environment: for the purposes of this test (and for our real migration), we are migrating from one CIFS-mounted remote filesystem to another CIFS-mounted remote filesystem.
- Speed: The tool migrates approximately x objects/hr. At this rate, it will take approximately x days to migrate the entire repository.
- Datastream index: takes about 1h10m to build. CPU time: consumes about x%. The index occupies 327MB of disk space.
- Source layout: Akubra hash storage, using the pattern "#/##/##" for both datastreams and objects.
- OCFL storage: Pairtree. It will be good when the OCFL storage profile specification is set and incorporated into migration-utils, so that we can define the OCFL layout, similar to how we can specify the Akubra filesystem layout.
- Average seconds per object is calculated based on the difference between the time the first object is processed (after the datastream index has been generated) and the time the last object is processed.
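
The average described in the last bullet can be sketched as follows. This is only an illustration of the arithmetic, not code from migration-utils: the function name and the two timestamps are hypothetical, chosen so that the elapsed time matches the 3d20h16m reported for the 100,000-object run.

```python
from datetime import datetime, timedelta


def average_seconds_per_object(first_processed: datetime,
                               last_processed: datetime,
                               object_count: int) -> float:
    """Elapsed time between the first and last processed object, per object.

    The datastream index build is excluded because the clock starts when
    the first object is processed.
    """
    if object_count <= 0:
        raise ValueError("object_count must be positive")
    elapsed = last_processed - first_processed
    return elapsed.total_seconds() / object_count


# Hypothetical timestamps, 3 days 20 hours 16 minutes apart.
first = datetime(2020, 1, 1, 0, 0, 0)
last = first + timedelta(days=3, hours=20, minutes=16)

avg = average_seconds_per_object(first, last, 100_000)
print(f"{avg:.1f} sec per object")  # prints "3.3 sec per object"
```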
Issues
Migration Tests
UW Digital Collections Center Production Repository
Fedora 3: Approx. 561,000 objects (382GB559GB): mostly books, pages and still images, with some audio, video, and PDF resources. Approximately 2.33M 36 million datastreams (610.3TB). Content objects have one binary datastream and 5 XML metadata datastreams. Container objects have ~5 XML metadata datastreams. All datastreams are either inline or managed (no external or redirect datastreams).
Fedora 3.8.1. Migration run on desktop workstation VM with 4 cores, 8 GB RAM. CentOS Linux release 8.2.2004 (Core), Intel(R) Xeon(R) Gold 5220 CPU @2.20GHz
...

    $ java -jar target/migration-utils-4.4.1-SNAPSHOT-driver.jar \
        --migration-type=FEDORA_OCFL \
        --source-type=akubra \
        --datastreams-dir=/fedora3-prod/fedora/datastreams \
        --objects-dir=/fedora3-prod/fedora/objects \
        --target-dir=/fedora-migration-test \
        --index-dir=/var/tmp/datastream-index

Number | Execution | Average seconds per object | OCFL repository size | Migration | Notes |
---|---|---|---|---|---|
1000 | Datastream index: 1h17m; OCFL repo: 4h36m | 2.9 sec | 133GB | | with param --pid-file=1000pids.txt; datastream index cleared after run |
10,000 | Datastream index: 1h5m; OCFL repo: 11h48m | 4.3 sec | 147GB | (81586bf) | with param --pid-file=10000pids.txt. Most objects are XML docs in this batch. |
100,000 | Datastream index: 1h9m; OCFL repo: 3d20h16m | 3.3 sec | 1.6TB | | with param --pid-file=100000pids.txt; datastream index cleared after run |
All 561,000 | Datastream index: 1h10m | 3.2 sec | 9TB | | all pids |
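
As a rough cross-check of the figures in the last row, the per-object average can be converted into a throughput and a projected wall-clock time. This is a back-of-the-envelope sketch, not a measured result: it assumes the 3.2 sec average holds across all 561,000 objects and it excludes the datastream index build. The function names are illustrative only.

```python
from datetime import timedelta


def objects_per_hour(seconds_per_object: float) -> float:
    """Throughput implied by an average per-object time."""
    return 3600.0 / seconds_per_object


def projected_duration(object_count: int, seconds_per_object: float) -> timedelta:
    """Projected migration time, assuming a constant per-object average."""
    return timedelta(seconds=object_count * seconds_per_object)


rate = objects_per_hour(3.2)
total = projected_duration(561_000, 3.2)
total_days = total.total_seconds() / 86_400

print(f"{rate:.0f} objects/hr")   # prints "1125 objects/hr"
print(f"{total_days:.1f} days")   # prints "20.8 days"
```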