
...

  1. Storage environment: for this test (and for our real migration), we are migrating from one CIFS-mounted remote filesystem to another.
  2. Speed: the tool migrates approximately 1,700 objects/hr. At this rate, migrating the entire repository will take approximately 10 days.
  3. Datastream index: takes about 1h10m to build and occupies 327MB of disk space.
  4. CPU usage: about 15%.
  5. Source layout: Akubra hash storage, using the pattern "#/##/##" for both datastreams and objects. OCFL storage layout: Pairtree. Once the OCFL storage profile specification is finalized and incorporated into migration-utils, we will be able to define the OCFL layout the same way we can specify the Akubra filesystem layout.
  6. Average seconds per object is the elapsed time between processing the first object (after the datastream index has been generated) and processing the last object, divided by the number of objects migrated.
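The average described in item 6 can be recomputed from the durations in the results table. Below is a minimal bash sketch, assuming durations are written with d/h/m/s units only (e.g. 3d20h16m); the function names are hypothetical helpers, not part of migration-utils.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of migration-utils) for recomputing
# "average seconds per object" from a duration string in the results table.
# Assumption: durations use only d/h/m/s units, e.g. "3d20h16m" or "1h10m".

# Convert a duration such as "3d20h16m" to whole seconds.
duration_to_seconds() {
  local rest="$1" total=0 num unit
  while [[ $rest =~ ^([0-9]+)([dhms])(.*)$ ]]; do
    num=${BASH_REMATCH[1]}
    unit=${BASH_REMATCH[2]}
    rest=${BASH_REMATCH[3]}
    # 10# forces base 10 so values like "08" are not read as octal.
    case $unit in
      d) total=$(( total + 10#$num * 86400 )) ;;
      h) total=$(( total + 10#$num * 3600 )) ;;
      m) total=$(( total + 10#$num * 60 )) ;;
      s) total=$(( total + 10#$num )) ;;
    esac
  done
  # Anything left over (spaces, unknown units) means the input was not understood.
  if [[ -n $rest ]]; then
    echo "unparseable duration: $1" >&2
    return 1
  fi
  echo "$total"
}

# Average seconds per object = elapsed seconds / objects migrated, one decimal.
avg_seconds_per_object() {
  local secs
  secs=$(duration_to_seconds "$1") || return 1
  awk -v s="$secs" -v n="$2" 'BEGIN { printf "%.1f\n", s / n }'
}
```

For example, `avg_seconds_per_object 3d20h16m 100000` prints 3.3 and `avg_seconds_per_object 20d21h12m 561000` prints 3.2, matching the 100,000-object and full-repository runs in the results table.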

Issues

Migration Tests

UW Digital Collections Center Production Repository

Fedora 3: approximately 561,000 objects (559GB), mostly books, pages, and still images, with some audio, video, and PDF resources. Approximately 36 million datastreams (10.3TB). Content objects have one binary datastream and five XML metadata datastreams; container objects have about five XML metadata datastreams. All datastreams are either inline or managed (no external or redirect datastreams).

Fedora 3.8.1. Migration run on a desktop workstation VM with 4 cores and 8 GB RAM: CentOS Linux release 8.2.2004 (Core), Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz.

Command run:

UW Madison migration-utils command line:

```bash
$ java -jar target/migration-utils-4.4.1-SNAPSHOT-driver.jar --migration-type=FEDORA_OCFL --source-type=akubra --datastreams-dir=/fedora3-prod/fedora/datastreams --objects-dir=/fedora3-prod/fedora/objects --target-dir=/fedora-migration-test --layout=pairtree --index-dir=/var/tmp/datastream-index
```
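The subset runs in the results table add `--pid-file` to the command above and clear the datastream index after each run. A minimal bash sketch of that procedure follows; `run_subset` is a hypothetical name, not part of migration-utils, and the `JAR` and `INDEX_DIR` environment overrides are assumptions that default to the paths in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for one subset run: migrate only the PIDs in a pid
# file, report wall-clock time, then delete the datastream index
# ("datastream index cleared after run"). JAR and INDEX_DIR default to the
# paths in the command above and can be overridden from the environment.
JAR=${JAR:-target/migration-utils-4.4.1-SNAPSHOT-driver.jar}
INDEX_DIR=${INDEX_DIR:-/var/tmp/datastream-index}

# run_subset PID_FILE TARGET_DIR
run_subset() {
  local pid_file="$1" target_dir="$2" start status
  start=$(date +%s)
  java -jar "$JAR" \
    --migration-type=FEDORA_OCFL \
    --source-type=akubra \
    --datastreams-dir=/fedora3-prod/fedora/datastreams \
    --objects-dir=/fedora3-prod/fedora/objects \
    --target-dir="$target_dir" \
    --layout=pairtree \
    --index-dir="$INDEX_DIR" \
    --pid-file="$pid_file"
  status=$?
  echo "elapsed: $(( $(date +%s) - start ))s, exit status $status" >&2
  # Remove the index so each run starts without a prebuilt one.
  [[ -n $INDEX_DIR ]] && rm -rf -- "$INDEX_DIR"
  return "$status"
}
```

For example, `run_subset 1000pids.txt /fedora-migration-test` mirrors the setup of the 1,000-object row. The elapsed time printed here covers the whole java process, including the index build, so it is not the same figure as the "OCFL repo" time in the table.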


| Number of objects | Datastream index build | OCFL repo migration | Avg. seconds per object | OCFL repository size | Migration tool version | Notes |
|---|---|---|---|---|---|---|
| 1,000 | 1h17m | 4h36m | 2.9 | 133GB | 81586bf | Run with --pid-file=1000pids.txt; datastream index cleared after run. |
| 10,000 | 1h5m | 11h48m | 4.3 | 147GB | 81586bf | Run with --pid-file=10000pids.txt; datastream index cleared after run. Most objects in this batch are XML docs. |
| 100,000 | 1h9m | 3d20h16m | 3.3 | 1.6TB | 4a9f19c | Run with --pid-file=100000pids.txt; datastream index cleared after run. |
| All (561,000) | 1h10m | 20d21h12m | 3.2 | 9TB | 43b7bae | All PIDs. |
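The table does not say how the pid files for the subset runs were produced. One possible approach is sketched below in bash. It rests on assumptions not taken from the migration-utils documentation: each Akubra object file is named with the URL-encoded object URI (e.g. info%3Afedora%2Fdemo%3A1 for PID demo:1), the pid file lists one PID per line, and "first N" simply means the first N file names in sorted order. `url_decode` and `make_pid_file` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for building a pid file such as 1000pids.txt.
# Assumptions (not from the migration-utils docs):
#   * Akubra object files are named with the URL-encoded object URI,
#     e.g. info%3Afedora%2Fdemo%3A1 for PID demo:1;
#   * the pid file lists one PID per line;
#   * "first N" means the first N file names in sorted order.
# Uses GNU find's -printf, which is available on CentOS.

# Decode %XX escapes: sed turns each % into \x, then printf %b expands \xXX.
url_decode() {
  printf '%b' "$(printf '%s' "$1" | sed 's/%/\\x/g')"
}

# make_pid_file OBJECTS_DIR COUNT OUTPUT_FILE
make_pid_file() {
  local objects_dir="$1" count="$2" out="$3" name uri
  find "$objects_dir" -type f -printf '%f\n' | sort | head -n "$count" |
    while IFS= read -r name; do
      uri=$(url_decode "$name")
      # Strip the info:fedora/ prefix to leave the bare PID.
      printf '%s\n' "${uri#info:fedora/}"
    done > "$out"
}
```

For example, `make_pid_file /fedora3-prod/fedora/objects 1000 1000pids.txt` would write up to 1,000 PIDs that could then be passed as `--pid-file=1000pids.txt`. Sorting by file name is only one way to pick a subset; a random sample (e.g. with `shuf -n`) may be more representative of the repository's mix of content types.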