Files:

Instructions:

  1. DSpace now comes with a Checksum Checker script (dspace/bin/checker) which can be scheduled to verify the checksum of every item within DSpace. Because DSpace calculates and records the checksum of every file submitted to it, this script can determine whether a file has changed since it was submitted (either manually or through some sort of corruption or virus). The earlier you identify that a file has changed, the more likely you are to be able to recover it (assuming the change was not wanted).
  2. There are several configuration options for the Checksum Checker which appear in the following section of dspace.cfg:
    #### Checksum Checker Settings ####
    The options you should pay the most attention to are those regarding the checksum retention history (shown below). These two options specify how long the record of a single checksum verification is kept within your DSpace database. More information on each follows:
    # check history retention
    checker.retention.default=10y 
    checker.retention.CHECKSUM_MATCH=8w 
  3. The checker.retention.CHECKSUM_MATCH option specifies the timeframe after which a successful "match" will be removed from your DSpace database (defaults to 8 weeks). This means that after 8 weeks, all successful matches are automatically deleted from your database (in order to keep that database table from growing too large).
  4. If you changed any option in dspace.cfg, you will need to restart Tomcat (see Quick Restart in Rebuild+DSpace) for the changes to take effect.
  5. The Checksum Checker script (dspace/bin/checker) also has several command line options to be aware of:
  6. You should schedule the Checksum Checker to run automatically, based on how frequently you back up your DSpace instance (and how long you keep those backups). The size of your repository is also a factor. For very large repositories, you may need to schedule it to run for an hour each evening (e.g. with the -d 1h option) to ensure it makes it through your entire repository within a week or so. Smaller repositories can likely get by with just running it weekly.
    * For Linux or Mac OSX, you can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace (this schedule runs every Sunday at 4:00 AM):
      0 4 * * 0 dspace/bin/checker -d2h -p
  7. Optionally, you may choose to receive automated emails listing the Checksum Checker's results. There is no shell script for this functionality, but it's still a rather easy change. Just make sure to schedule it to run after the checker has completed its processing (otherwise the email may not contain all the results).
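
The record-then-verify idea behind step 1 can be illustrated with a small shell sketch. This is not DSpace's own code: the function names record_checksum and verify_checksum are hypothetical, MD5 (via md5sum from GNU coreutils) is chosen purely for illustration, and the CHECKSUM_NO_MATCH label is an assumed counterpart to the CHECKSUM_MATCH result named in dspace.cfg.

```shell
#!/bin/sh
# Illustrative sketch only (not DSpace code): record a checksum once,
# then recompute it later and compare it with the recorded value.

# record_checksum FILE
# Prints the MD5 digest of FILE, i.e. the value that would be stored
# at submission time. Assumes md5sum (GNU coreutils) is installed.
record_checksum() {
    md5sum "$1" | awk '{print $1}'
}

# verify_checksum FILE EXPECTED
# Recomputes the digest of FILE and compares it with EXPECTED.
# Prints CHECKSUM_MATCH if they agree, otherwise CHECKSUM_NO_MATCH
# (the latter label is an assumption made for this example).
verify_checksum() {
    actual=$(record_checksum "$1")
    if [ "$actual" = "$2" ]; then
        echo "CHECKSUM_MATCH"
    else
        echo "CHECKSUM_NO_MATCH"
    fi
}

# Usage: store the digest at "submission", then verify before and after a change.
demo_file=$(mktemp)
printf 'original content\n' > "$demo_file"
stored=$(record_checksum "$demo_file")

verify_checksum "$demo_file" "$stored"   # file is unchanged

printf 'unexpected change\n' >> "$demo_file"
verify_checksum "$demo_file" "$stored"   # file was modified after recording

rm -f "$demo_file"
```

The sketch only detects that a file changed. As the steps above note, recovering the original still depends on having a backup that predates the change, which is why the check should run more often than backups expire.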