...
- Storage environment: for the purposes of this test (and for our real migration), we are migrating from one CIFS-mounted remote filesystem to another CIFS-mounted remote filesystem.
- Speed: The tool migrates approximately 1700 objects/hr. At this rate, it will take approximately 10 days to migrate the entire repository.
- Datastream index: takes about 1h10m to build, and occupies 327MB of disk space.
- CPU time: consumes about 15%.
- Source layout: Akubra hash storage, using the pattern "#/##/##" for both datastreams and objects.
- OCFL storage: Pairtree. It will be good when the OCFL storage profile specification is set and incorporated into migration-utils, so that we can define the OCFL layout, similar to how we can specify the Akubra filesystem layout.
Issues
Migration Tests
UW Digital Collections Center Production Repository
Fedora 3: Approx. 561,000 objects (559GB): mostly books, pages and still images, with some audio, video, and PDF resources. Approximately 3.36 million datastreams (10.3TB). Content objects have one binary datastream and 5 XML metadata datastreams. Container objects have ~5 XML metadata datastreams. All datastreams are either inline or managed (no external or redirect datastreams).
Fedora 3.8.1. Migration run on VM with 4 cores, 8 GB RAM. CentOS Linux release 8.2.2004 (Core), Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz
Command run:
$ java -jar target/migration-utils-4.4.1-SNAPSHOT-driver.jar --migration-type=FEDORA_OCFL --source-type=akubra --datastreams-dir=/fedora3-prod/fedora/datastreams --objects-dir=/fedora3-prod/fedora/objects --target-dir=/fedora-migration-test --layout=pairtree --index-dir=/var/tmp/datastream-index
Number of objects | Execution time | Average seconds per object | OCFL repository size | Source layout | Migration-utils commit | Notes |
---|---|---|---|---|---|---|
1000 | Datastream index: 1h17m; OCFL repo: 4h36m | 16.3 sec | 184GB | Akubra | | with param --pid-file=1000pids.txt; datastream index cleared after run |
10,000 | Datastream index: 1h5m; OCFL repo: 11h48m | 4.3 sec | 688GB | Akubra | 81586bf | with param --pid-file=10000pids.txt |
100,000 | Datastream index: 1h9m; OCFL repo: 3d20h16m | 3.3 sec | 6.8TB | Akubra | | with param --pid-file=100000pids.txt; datastream index cleared after run |
All 561,000 | Datastream index: 1h10m | 3.2 sec | 39TB | Akubra | | all pids |
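
The average-seconds-per-object figures appear to be the OCFL repo execution time divided by the number of objects migrated. The following is a minimal Python sketch of that arithmetic, assuming durations are written in the `3d20h16m` style used in the table. The helper names `parse_duration` and `seconds_per_object` are hypothetical and are not part of migration-utils. Results land close to the table's averages, though not always exactly on them.

```python
import re

# Seconds in each unit used by the durations in the table (d, h, m).
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60}


def parse_duration(text: str) -> int:
    """Convert a duration such as '3d20h16m' or '1h5m' to seconds."""
    if not re.fullmatch(r"(\d+[dhm])+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    parts = re.findall(r"(\d+)([dhm])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def seconds_per_object(objects: int, duration: str) -> float:
    """Average migration time per object for one test run."""
    return parse_duration(duration) / objects


# OCFL repo execution times as reported for the sampled runs.
runs = {1000: "4h36m", 10000: "11h48m", 100000: "3d20h16m"}

for objects, duration in runs.items():
    average = seconds_per_object(objects, duration)
    print(f"{objects:>7} objects: {average:.1f} s/object")
```

The per-object cost falls as the run grows, which is consistent with a fixed startup cost being spread over more objects.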