Old Release

This documentation relates to an old version of DSpace, version 4.x.

This DSpace release is end-of-life and is no longer supported.

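Several DSpace features require that a script be run regularly (via cron, or similar). These include sitemap generation, OAI-PMH and Discovery index updates, usage statistics cleanup, subscription emails, media filtering, and checksum checking.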

These regularly scheduled tasks should be set up via either cron (for Linux/Mac OS X) or Windows Task Scheduler (for Windows).

Recommended Cron Settings

If you are on Linux or Mac OS X, you should add these cron settings under the OS account that runs DSpace (or Tomcat). For example, log in as that user and type the following to edit the user's crontab:

crontab -e
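
Before editing, it can be useful to keep a copy of the existing crontab. The following is a minimal sketch, not part of DSpace: the helper name `backup_crontab` and the backup directory are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper (not part of DSpace): copy the current user's crontab
# to a timestamped file in the given directory and print that file's path.
backup_crontab() {
    backup_dir="$1"
    mkdir -p "$backup_dir" || return 1
    backup_file="$backup_dir/crontab.$(date +%Y%m%d%H%M%S).bak"
    # "crontab -l" exits non-zero when the user has no crontab yet;
    # in that case leave an empty backup file rather than failing.
    crontab -l > "$backup_file" 2>/dev/null || : > "$backup_file"
    printf '%s\n' "$backup_file"
}

# Example (directory name is an assumption):
#   backup_crontab "$HOME/crontab-backups" && crontab -e
```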

 

While every DSpace installation is unique, in order to get the most out of DSpace, we highly recommend enabling these basic cron settings (the settings are described below):

## SAMPLE CRONJOB FOR A PRODUCTION DSPACE
## You obviously may wish to tweak this for your own installation, but this should give you an idea of what you likely wish to schedule via cron.
##

#-----------------
# GLOBAL VARIABLES
#-----------------
# Full path of your local DSpace Installation (e.g. /home/dspace or /dspace or similar)
# MAKE SURE TO CHANGE THIS VALUE!!!
DSPACE = [dspace]

# Shell to use
SHELL=/bin/sh

# Add all major 'bin' directories to path
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# Set JAVA_OPTS with defaults for DSpace Cron Jobs.
# Only provides 512MB of memory by default (which should be enough for most sites). But, feel free to increase as needed to give more memory.
JAVA_OPTS="-Xmx512M -Xms512M -Dfile.encoding=UTF-8"

#--------------
# HOURLY TASKS (Recommended to be run multiple times per day, if possible)
# At a minimum these tasks should be run daily.
#--------------

# Regenerate DSpace Sitemaps every 8 hours (12AM, 8AM, 4PM).
# Sitemaps ensure that your content is more findable in Google, Google Scholar, and other major search engines.
0 0,8,16 * * * $DSPACE/bin/dspace generate-sitemaps > /dev/null

#----------------
# DAILY TASKS (Recommended to be run once per day. Feel free to tweak the scheduled times below.)
#----------------

# Update the OAI-PMH index with the newest content (and re-optimize that index) at midnight every day
# (This is only necessary if you are running OAI-PMH. It ensures new content is available via OAI-PMH.
# It also ensures the OAI-PMH index is optimized for better performance)
0 0 * * * $DSPACE/bin/dspace oai import -o > /dev/null

# Clean and Update the Discovery indexes at midnight every day
# (This ensures that any deleted documents are cleaned from the Discovery search/browse index)
0 0 * * * $DSPACE/bin/dspace index-discovery > /dev/null

# Re-Optimize the Discovery indexes at 00:30 every day
# (This ensures that the Discovery Solr Index is re-optimized for better performance)
30 0 * * * $DSPACE/bin/dspace index-discovery -o > /dev/null

# Cleanup Web Spiders from DSpace Statistics Solr Index at 01:00 every day
# (This removes any known web spiders from your usage statistics)
0 1 * * * $DSPACE/bin/dspace stats-util -m -i -f

# Re-Optimize DSpace Statistics Solr Index at 01:30 every day
# (This ensures that the Statistics Solr Index is re-optimized for better performance)
30 1 * * * $DSPACE/bin/dspace stats-util -o

# Send out subscription e-mails at 02:00 every day
# (This sends an email to any users who have "subscribed" to a Collection, notifying them of newly added content.)
0 2 * * * $DSPACE/bin/dspace sub-daily

# Run the media filter at 03:00 every day.
# (This task ensures that thumbnails are generated for newly added images,
# and also ensures full text search is available for newly added PDF/Word/PPT/HTML documents)
0 3 * * * $DSPACE/bin/dspace filter-media

#----------------
# WEEKLY TASKS (Recommended to be run once per week, but can be run more or less frequently, based on your needs)
#----------------
# Run the checksum checker at 04:00 every Sunday
# (This re-verifies the checksums of all files stored in DSpace. If any files have been changed/corrupted, the checksums will differ.)
0 4 * * 0 $DSPACE/bin/dspace checker -l -p
# NOTE: LARGER SITES MAY WISH TO USE DIFFERENT OPTIONS. The above "-l" option tells DSpace to check *everything*. 
# If your site is very large, you may need to only check a portion of your content per week. 
# The below task would instead check all the content it can within *one hour*. The next week it would start again where it left off.
#0 4 * * 0 $DSPACE/bin/dspace checker -d 1h -p
  
# Mail the results of the checksum checker (see above) to the admin.email at 05:00 every Sunday
# (This ensures the system administrator is notified whether any checksums were found to be different.)
0 5 * * 0 $DSPACE/bin/dspace checker-emailer -c
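
Several jobs in the sample above discard their output (`> /dev/null`), while the output of the others is left to cron's default handling. A site that would rather keep a timestamped record could wrap each command. The following is only a sketch under stated assumptions: the function name `run_logged` and the log file location are hypothetical and not part of DSpace.

```shell
# Hypothetical wrapper (not part of DSpace).
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# Appends a START line, the command's stdout and stderr, and an END line
# with the exit status to LOGFILE, then returns the command's exit status.
run_logged() {
    log_file="$1"
    shift
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"
    "$@" >> "$log_file" 2>&1
    status=$?
    printf '%s END exit=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log_file"
    return "$status"
}
```

Saved in a script that ends with `run_logged "$@"`, this could be called from a crontab entry in place of the bare command, passing the log file first and the `$DSPACE/bin/dspace` command with its arguments after it.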

 

 

 

