
...

  • Limited-count mode: [dspace]/bin/dspace checker -c Checks a specific number of bitstreams. The -c option is followed by an integer: the number of bitstreams to check. Example: [dspace]/bin/dspace checker -c 10 (the checker will check 10 bitstreams). This is particularly useful for verifying that the checker is executing properly. By default, the Checksum Checker checks a single bitstream, as if the option were -c 1.
  • Duration mode: [dspace]/bin/dspace checker -d Runs the checker for a specific period of time, given as a number followed by one of the time units below. Example: [dspace]/bin/dspace checker -d 2h (the checker will run for 2 hours).

    Unit  Meaning
    s     Seconds
    m     Minutes
    h     Hours
    d     Days
    w     Weeks
    y     Years

    The checker keeps starting new bitstream checks until the specified duration has elapsed, so the actual run time will be slightly longer than the duration given. Bear this in mind when scheduling checks.

  • Specific Bitstream mode: [dspace]/bin/dspace checker -b Checks only the bitstreams with the given internal bitstream IDs. Example: [dspace]/bin/dspace checker -b 112 113 4567 (the checker will only check bitstream IDs 112, 113 and 4567).
  • Specific Handle mode: [dspace]/bin/dspace checker -a Checks only the bitstreams within the Community, Collection, or Item identified by the handle. Example: [dspace]/bin/dspace checker -a 123456/999 (the checker will only check this handle). If the handle belongs to a Collection or Community, the checker runs through the entire Collection or Community.
  • Looping mode: [dspace]/bin/dspace checker -l or [dspace]/bin/dspace checker -L There are two looping modes. The lowercase 'el' (-l) checks every bitstream in the repository once. This is recommended for smaller repositories that can loop through all their content in a few hours at most. The uppercase 'L' (-L) loops through the repository continuously; this is not recommended for most repository systems. For large repositories that cannot be completely checked in a couple of hours, we recommend running the checker from cron with the -d option (see Cron Jobs).
  • Pruning mode: [dspace]/bin/dspace checker -p The Checksum Checker stores the result of every check in the checksum_history table. When the -p option is used, successful checksum matches that are eight weeks old or older are deleted by default; unsuccessful results are retained indefinitely. Without this option, the retention settings are ignored and the table may grow rather large.
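
When fitting a -d run into a cron window, it can help to convert the duration argument into seconds. The shell function below is a minimal, hypothetical sketch (duration_to_seconds is not part of DSpace) using the unit letters listed under Duration mode. It assumes the argument is a whole number followed by a single unit letter and that a year is 365 days; the checker's own conversion may differ. Remember that the real run will last slightly longer than the computed value.

```shell
#!/bin/sh
# Hypothetical helper, not part of DSpace: convert a checker -d argument
# such as "2h" into seconds. Assumes a whole number plus one unit letter,
# and 1 y = 365 days (the checker's own conversion may differ).
duration_to_seconds() {
  value=${1%?}        # everything except the last character
  unit=${1#"$value"}  # the last character: the unit letter
  case $value in
    ''|*[!0-9]*) echo "invalid duration: $1" >&2; return 1 ;;
  esac
  case $unit in
    s) mult=1 ;;
    m) mult=60 ;;
    h) mult=3600 ;;
    d) mult=86400 ;;
    w) mult=604800 ;;
    y) mult=31536000 ;;
    *) echo "unknown unit in: $1" >&2; return 1 ;;
  esac
  echo $((value * mult))
}

# Example: seconds budgeted for "checker -d 2h"
duration_to_seconds 2h   # prints 7200
```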

...

A query like the following lists today's checks whose result was neither CHECKSUM_MATCH nor BITSTREAM_MARKED_DELETED (PostgreSQL):

```sql
SELECT *
FROM checksum_history
WHERE date_trunc('day', process_start_date) = CURRENT_DATE
AND result != 'CHECKSUM_MATCH'
AND result != 'BITSTREAM_MARKED_DELETED';
```

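A more detailed query, which also reports each bitstream's format, asset store path, and the handles of its item and owning collection: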

```sql
SELECT
    ch.process_start_date,
    ch.process_end_date,
    ch.result,
    ch.checksum_expected,
    ch.checksum_calculated,
    b.bitstream_id,
    bfr.short_description,
    b.store_number,
    -- asset store path: three 2-character directory levels taken from internal_id
    substring(b.internal_id for 2) || '/' || substring(b.internal_id from 3 for 2) || '/' || substring(b.internal_id from 5 for 2) || '/' || b.internal_id AS bitstream_path,
    hi.handle AS item_handle,
    hc.handle AS collection_handle
FROM checksum_history ch
JOIN bitstream b
ON ch.bitstream_id = b.bitstream_id
JOIN bitstreamformatregistry bfr
ON b.bitstream_format_id = bfr.bitstream_format_id
-- LEFT JOINs keep bitstreams that do not belong to an item (e.g. logos)
LEFT JOIN bundle2bitstream bb
ON b.bitstream_id = bb.bitstream_id
LEFT JOIN item2bundle ib
ON bb.bundle_id = ib.bundle_id
LEFT JOIN item i
ON ib.item_id = i.item_id
LEFT JOIN handle hi
ON i.item_id = hi.resource_id
AND hi.resource_type_id = 2 -- 2 = Item
LEFT JOIN handle hc
ON i.owning_collection = hc.resource_id
AND hc.resource_type_id = 3 -- 3 = Collection
WHERE ch.result != 'CHECKSUM_MATCH'
AND date_trunc('day', ch.process_start_date) = CURRENT_DATE
ORDER BY ch.check_id DESC;
```
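
The bitstream_path column in the detailed query builds the asset store path from internal_id: the first three pairs of characters become nested directories, followed by the full internal_id. The shell function below is a hypothetical helper (bitstream_path is not part of DSpace) that mirrors that substring expression, which may be handy when locating a failed file on disk. The result is assumed to be relative to the asset store directory selected by store_number; check this against your own storage configuration.

```shell
#!/bin/sh
# Hypothetical helper, not part of DSpace: mirrors the SQL expression
#   substring(id for 2) || '/' || substring(id from 3 for 2) || '/'
#   || substring(id from 5 for 2) || '/' || id
# The output is assumed to be relative to the asset store for store_number.
bitstream_path() {
  id=$1
  a=$(printf '%s' "$id" | cut -c1-2)   # characters 1-2
  b=$(printf '%s' "$id" | cut -c3-4)   # characters 3-4
  c=$(printf '%s' "$id" | cut -c5-6)   # characters 5-6
  printf '%s/%s/%s/%s\n' "$a" "$b" "$c" "$id"
}

# Example with an illustrative internal_id
bitstream_path 12345678901234567890   # prints 12/34/56/12345678901234567890
```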

...