Some users will want to get a test version up and running as fast as possible, so below is an unsupported outline of getting DSpace to run quickly in a Unix-based environment.
Only experienced Unix admins should attempt the following without consulting the detailed Installation Instructions.
useradd -m dspace
gunzip -c dspace-1.x-src-release.tar.gz | tar -xf -
createuser -U postgres -d -A -P dspace
createdb -U dspace -E UNICODE dspace
cd [dspace-source]/dspace/config
vi dspace.cfg
mkdir [dspace]
chown dspace [dspace]
su - dspace
cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant fresh_install
cp -r [dspace]/webapps/* [tomcat]/webapps
/etc/init.d/tomcat start
[dspace]/bin/dspace create-administrator
The list below describes the third-party components and tools you'll need to run a DSpace server. These are just guidelines. Since DSpace is built on open source, standards-based tools, there are numerous other possibilities and setups.
Also, please note that the configuration and installation guidelines relating to a particular tool below are here for convenience. You should refer to the documentation for each individual component for complete and up-to-date details. Many of the tools are updated on a frequent basis, and the guidelines below may become out of date.
DSpace now requires Oracle Java 6 or greater, because it uses language features introduced in Java 5 and 6 that make coding easier and cleaner.
Java can be downloaded from the following location: http://java.sun.com/javase/downloads/index.jsp
Only Oracle's Java has been tested with each release and is known to work correctly. Other flavors of Java may pose problems.
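To confirm which Java a shell session will pick up, a check along the following lines can help. This is a minimal sketch: it assumes the first line of `java -version` carries the version in double quotes (for example `java version "1.6.0_24"`), and the helper name `java_major` is illustrative only. It checks the version number, not the vendor.

```shell
# Extract the major version from a Java version string.
# "1.6.0_24" -> 6 (old 1.x numbering), "11.0.2" -> 11.
# java_major is a hypothetical helper, not part of DSpace or the JDK.
java_major() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
    esac
    echo "${v%%[._-]*}"       # keep everything before the first separator
}

# Report on the installed Java, if one is on the PATH.
if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the quoted field.
    installed=$(java -version 2>&1 | awk -F'"' 'NR == 1 {print $2}')
    if [ "$(java_major "$installed")" -ge 6 ] 2>/dev/null; then
        echo "Java $installed meets the Java 6 minimum"
    else
        echo "Java $installed is too old or could not be parsed"
    fi
fi
```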
DSpace 1.7.x does not build properly when using Maven 2.0.x or Maven 3.x. This is a known issue. The quick fix is to use Maven 2.2.x. More information can be found in JIRA issue DS-788.
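Because the wrong Maven series fails the build, it can be worth checking the installed version first. The sketch below is one way to do that; `maven_version_ok` is a made-up helper name, and the parsing assumes that `mvn -version` prints a line beginning with "Apache Maven" followed by the version number.

```shell
# Succeed only for Maven 2.2.x, the series recommended for DSpace 1.7.x.
# maven_version_ok is a hypothetical helper, not part of DSpace or Maven.
maven_version_ok() {
    case "$1" in
        2.2|2.2.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Check the installed Maven, if there is one on the PATH.
if command -v mvn >/dev/null 2>&1; then
    # Assumes a line such as "Apache Maven 2.2.1 (...)".
    installed=$(mvn -version 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')
    if maven_version_ok "$installed"; then
        echo "Maven $installed is in the 2.2.x series"
    else
        echo "Maven '$installed' is not 2.2.x; see DS-788"
    fi
fi
```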
Maven is necessary in the first stage of the build process to assemble the installation package for your DSpace instance. It gives you the flexibility to customize DSpace using the existing Maven projects found in the [dspace-source]/dspace/modules directory or by adding in your own Maven project to build the installation package for DSpace, and apply any custom interface "overlay" changes.
Maven can be downloaded from the following location: http://maven.apache.org/download.html
You can configure a proxy to use for some or all of your HTTP requests in Maven 2.0. The username and password are only required if your proxy requires basic authentication. Note that later releases may support storing your passwords in a secured keystore; in the meantime, please ensure your settings.xml file (usually ${user.home}/.m2/settings.xml) is secured with permissions appropriate for your operating system.
Example:
<settings>
  .
  .
  <proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>proxy.somewhere.com</host>
      <port>8080</port>
      <username>proxyuser</username>
      <password>somepassword</password>
      <nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
    </proxy>
  </proxies>
  .
  .
</settings>
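Since the proxy password sits in settings.xml as plain text, restricting the file to its owner is a sensible precaution on Unix-like systems. The following is a minimal sketch assuming the default per-user location; `secure_file` is an illustrative helper name, and the right permissions for your site may differ.

```shell
# Make a file readable and writable by its owner only.
# secure_file is a hypothetical helper name used for illustration.
secure_file() {
    chmod 600 "$1"
}

# Assumes the default per-user Maven settings location.
settings="${HOME}/.m2/settings.xml"
if [ -f "$settings" ]; then
    secure_file "$settings"
    ls -l "$settings"   # the mode column should now read -rw-------
fi
```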
Apache Ant is still required for the second stage of the build process. It is used once the installation package has been constructed in [dspace-source]/dspace/target/dspace-<version>-build.dir and still uses some of the familiar ant build targets found in the 1.4.x build process.
Ant can be downloaded from the following location: http://ant.apache.org
<!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
<Connector port="8080"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true"
           URIEncoding="UTF-8"/>
With the advent of a new Apache Maven 2 based build architecture (first introduced in DSpace 1.5.x), you now have two options for installing and managing your local installation of DSpace. If you've used DSpace 1.4.x, please recognize that the initial build procedure has changed to allow for more customization. You will find the later 'Ant based' stages of the installation procedure familiar. Maven is used to resolve the dependencies of DSpace online from the 'Maven Central Repository' server.
It is important to note that the strategies are identical in terms of the list of procedures required to complete the build process, the only difference being that the Source Release includes "more modules" that will be built given their presence in the distribution package.
Before beginning an installation, it is important to get a general understanding of the DSpace directories and the names by which they are generally referred. (Please use the directory names below when asking for help on the DSpace Mailing Lists, as it will help everyone better understand which directory you are referring to.)
DSpace uses three separate directory trees. Although you don't need to know all the details of them in order to install DSpace, you do need to know they exist and also know how they're referred to in this document:
[dspace]. This is the location where DSpace is installed and running. It is the location that gets defined in the dspace.cfg as "dspace.dir". It is where all the DSpace configuration files, command line scripts, documentation and webapps will be installed to.
[dspace-source]. This is the location where the DSpace release distribution has been unzipped into. It usually has the name of the archive that you expanded, such as dspace-<version>-release or dspace-<version>-src-release. It is the directory where all of your "build" commands will be run.
The web deployment directory. This is [dspace]/webapps by default. However, if you are using Tomcat, you may decide to copy your DSpace web applications from [dspace]/webapps/ to [tomcat]/webapps/ (with [tomcat] being wherever you installed Tomcat, also known as $CATALINA_HOME).
The [dspace-source] and [dspace] directories are always separate!
This method gets you up and running with DSpace quickly and easily. It is identical in both the Default Release and Source Release distributions.
useradd -m dspace
unzip dspace-1.7-release.zip
gunzip -c dspace-1.7-release.tar.gz | tar -xf -
bunzip2 -c dspace-1.7-release.tar.bz | tar -xf -
Create a dspace database, owned by the dspace PostgreSQL user (you are still logged in as 'root'):
createuser -U postgres -d -A -P dspace
createdb -U dspace -E UNICODE dspace
mvn install:install-file -Dfile=ojdbc6.jar -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=11.2.0.2.0 -Dpackaging=jar -DgeneratePom=true
db.name = oracle
db.url = jdbc:oracle:thin:@//host:port/dspace
db.driver = oracle.jdbc.OracleDriver
Edit [dspace-source]/dspace/config/dspace.cfg; in particular you'll need to set these properties:
dspace.dir - must be set to the [dspace] (installation) directory.
dspace.url - complete URL of this server's DSpace home page.
dspace.hostname - fully-qualified domain name of web server.
dspace.name - "Proper" name of your server, e.g. "My Digital Library".
db.password - the database password you entered in the previous step.
mail.server - fully-qualified domain name of your outgoing mail server.
mail.from.address - the "From:" address to put on email sent by DSpace.
feedback.recipient - mailbox for feedback mail.
mail.admin - mailbox for DSpace site administrator.
alert.recipient - mailbox for server errors/alerts (not essential but very useful!)
registration.notify - mailbox for emails when new users register (optional)
You can interpolate the value of one configuration variable in the value of another one. For example, to set feedback.recipient to the same value as mail.admin, the line would look like:
feedback.recipient = ${mail.admin}
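A property left unset in dspace.cfg tends to surface only later, during the build or at runtime, so a quick check of the required properties listed above can save time. The following is a rough sketch rather than a DSpace tool: the helper name `missing_props` is hypothetical, it covers only the nine properties not marked optional, and it assumes each one sits on its own line in `key = value` form.

```shell
# Print each required property that has no "key = value" line in the file.
# missing_props is a hypothetical helper, not shipped with DSpace.
missing_props() {
    cfg="$1"
    for key in dspace.dir dspace.url dspace.hostname dspace.name \
               db.password mail.server mail.from.address \
               feedback.recipient mail.admin; do
        # Escape the dots so they match literally in the regular expression.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*[^[:space:]]" "$cfg"; then
            echo "$key"
        fi
    done
}
```

Calling `missing_props` with the path to your dspace.cfg prints one property name per line for anything still unset, and prints nothing when all nine have values.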
Create the directory for the DSpace installation ([dspace]). As root (or a user with appropriate permissions), run:
mkdir [dspace]
chown dspace [dspace]
Build the DSpace installation package from the [dspace-source]/dspace directory:
cd [dspace-source]/dspace/
mvn package
Without any extra arguments, the DSpace installation package is initialized for PostgreSQL. If you want to use Oracle instead, you should build the DSpace installation package as follows:
mvn -Ddb.name=oracle package
Install DSpace to [dspace]:
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant fresh_install
To see a complete list of build targets, run:
ant help
cp -R [dspace]/webapps/* [tomcat]/webapps
(This will copy all the web applications to Tomcat.)
cp -R [dspace]/webapps/jspui [tomcat]/webapps
(This will copy only the jspui web application to Tomcat.)
In the <Host> section of your [tomcat]/conf/server.xml you could add lines similar to the following (but replace [dspace] with your installation location):
<!-- Define the default virtual host
     Note: XML Schema validation will not work with Xerces 2.2.
-->
<Host name="localhost" appBase="[dspace]/webapps"
      ....
[dspace]/bin/dspace create-administrator
http://dspace.myu.edu:8080/jspui
http://dspace.myu.edu:8080/xmlui
http://dspace.myu.edu:8080/oai/request?verb=Identify (Should return an XML-based response)
In order to set up some communities and collections, you'll need to log in as your DSpace Administrator (which you created with create-administrator above) and access the administration UI in either the JSP or XML user interface.
The above installation steps are sufficient to set up a test server to play around with, but there are a few other steps and options you should probably consider before deploying a DSpace production site.
A couple of DSpace features require that a script is run regularly: the e-mail subscription feature, which alerts users of new items being deposited, and the new 'media filter' tool, which generates thumbnails of images and extracts the full text of documents for indexing.
To set these up, you just need to run the following command as the dspace UNIX user:
crontab -e
Then add the following lines:
# Send out subscription e-mails at 01:00 every day
0 1 * * * [dspace]/bin/dspace sub-daily
# Run the media filter at 02:00 every day
0 2 * * * [dspace]/bin/dspace filter-media
# Run the checksum checker at 03:00
0 3 * * * [dspace]/bin/dspace checker -lp
# Mail the results to the sysadmin at 04:00
0 4 * * * [dspace]/bin/dspace checker-emailer -c
Naturally you should change the frequencies to suit your environment.
PostgreSQL also benefits from regular 'vacuuming', which optimizes the indexes and clears out any deleted data. Become the postgres UNIX user, run crontab -e and add (for example):
# Clean up the database nightly at 4.20am
20 4 * * * vacuumdb --analyze dspace > /dev/null 2>&1
In order that statistical reports are generated regularly and thus kept up to date you should set up the following cron jobs:
# Run stat analysis
0 1 * * * [dspace]/bin/dspace stat-general
0 1 * * * [dspace]/bin/dspace stat-monthly
0 2 * * * [dspace]/bin/dspace stat-report-general
0 2 * * * [dspace]/bin/dspace stat-report-monthly
Obviously, you should choose execution times which are most useful to you, and you should ensure that the report scripts run a short while after the analysis scripts to give them time to complete (a run of around 8 months worth of logs can take around 25 seconds to complete); the resulting reports will let you know how long analysis took and you can adjust your cron times accordingly.
In order to deploy a multilingual version of DSpace you have to configure two parameters in [dspace-source]/dspace/config/dspace.cfg:
default.locale = en
webui.supported.locales = en, de
The locales may take the form language, language_country, or language_country_variant (for example en or en_US).
Depending on the languages you wish to support, you have to make sure that all the i18n-related files are available. See the Configuring Multilingual Support section for the JSPUI, or the Multilingual Support for XMLUI section, in the configuration documentation.
If your DSpace is configured to have users log in with a username and password (as opposed to, say, client Web certificates), then you should consider using HTTPS. Whenever a user logs in with the Web form (e.g. dspace.myuni.edu/dspace/password-login) their DSpace password is exposed in plain text on the network. This is a very serious security risk since network traffic monitoring is very common, especially at universities. If the risk seems minor, then consider that your DSpace administrators also log in this way and they have ultimate control over the archive.
The solution is to use HTTPS (HTTP over SSL, i.e. Secure Socket Layer, an encrypted transport), which protects your passwords against being captured. You can configure DSpace to require SSL on all "authenticated" transactions so it only accepts passwords on SSL connections.
The following sections show how to set up the most commonly-used Java Servlet containers to support HTTP over SSL.
$JAVA_HOME/bin/keytool -import -noprompt -v -storepass changeit -keystore $CATALINA_BASE/conf/keystore -alias tomcat -file myserver.pem

$JAVA_HOME/bin/keytool -import -noprompt -storepass changeit -trustcacerts -keystore $CATALINA_BASE/conf/keystore -alias ServerCA -file ca.pem

$JAVA_HOME/bin/keytool -import -noprompt -storepass changeit -trustcacerts -keystore $CATALINA_BASE/conf/keystore -alias client1 -file client1.pem

<!-- Set clientAuth="true" ONLY if using client X.509 certs for authentication! -->
<Connector port="8443"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" disableUploadTimeout="true"
           acceptCount="100" debug="0"
           scheme="https" secure="true" sslProtocol="TLS"
           keystoreFile="conf/keystore" keystorePass="changeit"
           clientAuth="true"
           truststoreFile="conf/keystore" truststorePass="changeit" />

<Connector port="8080"
           maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
           enableLookups="false" redirectPort="8443"
           acceptCount="100" debug="0" />

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA -keysize 1024 \
    -keystore $CATALINA_BASE/conf/keystore -storepass changeit -validity 365 \
    -dname 'CN=dspace.myuni.edu, OU=MIT Libraries, O=Massachusetts Institute of Technology, L=Cambridge, S=MA, C=US'

$JAVA_HOME/bin/keytool -keystore $CATALINA_BASE/conf/keystore -storepass changeit \
    -certreq -alias tomcat -v -file tomcat.csr

$JAVA_HOME/bin/keytool -keystore $CATALINA_BASE/conf/keystore -storepass changeit \
    -import -alias mitCA -trustcacerts -file mitCA.pem

$JAVA_HOME/bin/keytool -keystore $CATALINA_BASE/conf/keystore -storepass changeit \
    -import -alias tomcat -trustcacerts -file signed-cert.pem

$JAVA_HOME/bin/keytool -genkey -alias tomcat -keyalg RSA -keystore $CATALINA_BASE/conf/keystore -storepass changeit

$JAVA_HOME/bin/keytool -import -noprompt -storepass changeit -trustcacerts -keystore $CATALINA_BASE/conf/keystore -alias client1 -file client1.pem
If you choose Apache HTTPD as your primary HTTP server, you can have it forward requests to the Tomcat servlet container via Apache Jakarta Tomcat Connector. This can be configured to work over SSL as well. First, you must configure Apache for SSL; for Apache 2.0 see Apache SSL/TLS Encryption for information about using mod_ssl.
If you are using X.509 Client Certificates for authentication: add these configuration options to the appropriate httpd configuration file, e.g. ssl.conf, and be sure they are in force for the virtual host and namespace locations dedicated to DSpace:
## SSLVerifyClient can be "optional" or "require"
SSLVerifyClient optional
SSLVerifyDepth 10
SSLCACertificateFile path-to-your-client-CA-certificate
SSLOptions StdEnvVars ExportCertData
Now consult the Apache Jakarta Tomcat Connector documentation to configure the mod_jk (note: NOT mod_jk2) module. Select the AJP 1.3 connector protocol. Also follow the instructions there to configure your Tomcat server to respond to AJP.
To use SSL on Apache HTTPD with mod_webapp consult the DSpace 1.3.2 documentation. Apache have deprecated the mod_webapp connector and recommend using mod_jk.
To use Jetty's HTTPS support consult the documentation for the relevant tool.
First a few facts to clear up some common misconceptions:
A Handle server runs as a separate process that receives TCP requests from other Handle servers, and issues resolution requests to a global server or servers if a Handle entered locally does not correspond to some local content. The Handle protocol is based on TCP, so it will need to be installed on a server that can broadcast and receive TCP on port 2641.
[dspace]/bin/dspace make-handle-config [dspace]/handle-server
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
[dspace]/bin/start-handle-server
If you need to update the handle prefix on items created before the CNRI registration process you can run the [dspace]/bin/dspace update-handle-prefix script. You may need to do this if you loaded items prior to CNRI registration (e.g. setting up a demonstration system prior to migrating it to production). The script takes the current and new prefix as parameters. For example:
[dspace]/bin/dspace update-handle-prefix 123456789 1303
This script will change any handles currently assigned prefix 123456789 to prefix 1303, so for example handle 123456789/23 will be updated to 1303/23 in the database.
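To make the effect concrete, the change applied to each handle can be mimicked in a few lines of shell. This is only an illustration of the string rewrite described above; it is not how the DSpace script is implemented and it does not touch the database. The name `rewrite_handle` is made up for the example.

```shell
# Illustration only: swap the prefix of a single handle string.
# Usage: rewrite_handle OLD_PREFIX NEW_PREFIX HANDLE
rewrite_handle() {
    old="$1"; new="$2"; handle="$3"
    case "$handle" in
        "$old"/*)
            rest=${handle#"$old"/}   # the part after "OLD_PREFIX/"
            echo "$new/$rest"
            ;;
        *)
            echo "$handle"           # handles with other prefixes are left alone
            ;;
    esac
}

rewrite_handle 123456789 1303 123456789/23   # prints 1303/23
rewrite_handle 123456789 1303 10.1000/23     # prints 10.1000/23
```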
To help web crawlers index the content within your repository, you can make use of sitemaps. There are currently two forms of sitemaps included in DSpace: Google sitemaps and HTML sitemaps.
Sitemaps allow DSpace to expose its content without the crawlers having to index every page. HTML sitemaps provide a list of all items, collections and communities in HTML format, whilst Google sitemaps provide the same information in gzipped XML format.
To generate the sitemaps, you need to run [dspace]/bin/dspace generate-sitemaps. This creates the sitemaps in [dspace]/sitemaps/.
The sitemaps can be accessed from the following URLs:
When running [dspace]/bin/dspace generate-sitemaps the script informs Google that the sitemaps have been updated. For this update to register correctly, you must first register your Google sitemap index page (/dspace/sitemap) with Google at http://www.google.com/webmasters/sitemaps/. If your DSpace server requires the use of an HTTP proxy to connect to the Internet, ensure that you have set http.proxy.host and http.proxy.port in [dspace]/config/dspace.cfg.
The URL for pinging Google, and in future, other search engines, is configured in [dspace]/config/dspace.cfg using the sitemap.engineurls setting, where you can provide a comma-separated list of URLs to 'ping'.
You can generate the sitemaps automatically every day using an additional cron job:
# Generate sitemaps
0 6 * * * [dspace]/bin/dspace generate-sitemaps
DSpace uses Apache Solr as the engine underlying its statistics. There is no need to download any separate software; everything necessary is included. For details of all the configuration property keys, refer to 5.2.35 DSpace Statistic Configuration.
solr.log.server = ${dspace.baseUrl}/solr/statistics
solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat
solr.spiderips.urls = http://iplists.com/google.txt, \
                      http://iplists.com/inktomi.txt, \
                      http://iplists.com/lycos.txt, \
                      http://iplists.com/infoseek.txt, \
                      http://iplists.com/altavista.txt, \
                      http://iplists.com/excite.txt, \
                      http://iplists.com/misc.txt, \
                      http://iplists.com/non_engines.txt

useProxies = true

cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
cp -R [dspace]/webapps/* [tomcat]/webapps
If you are installing DSpace on Windows, you will still need to install all the same Prerequisite Software, as listed above.
dspace.dir
config.template.log4j.properties
config.template.log4j-handle-plugin.properties
config.template.oaicat.properties
assetstore.dir
log.dir
upload.temp.dir
report.dir
handle.dir

mvn package
mvn -Ddb.name=oracle package
ant fresh_install
ant help
[dspace]\bin\dspace create-administrator

<!-- DEFINE A CONTEXT PATH FOR DSpace JSP User Interface -->
<Context path="/jspui" docBase="[dspace]\webapps\jspui" debug="0"
         reloadable="true" cachingAllowed="false" allowLinking="true"/>
<!-- DEFINE A CONTEXT PATH FOR DSpace OAI User Interface -->
<Context path="/oai" docBase="[dspace]\webapps\oai" debug="0"
         reloadable="true" cachingAllowed="false" allowLinking="true"/>
The administrator needs to check the installation to make sure all components are working. Here is a list of checks to be performed. In brackets after each item is the associated component or components that might be the issue needing resolution.
In any software project of the scale of DSpace, there will be bugs. Sometimes, a stable version of DSpace includes known bugs. We do not always wait until every known bug is fixed before a release. If the software is sufficiently stable and an improvement on the previous release, and the bugs are minor and have known workarounds, we release it to enable the community to take advantage of those improvements.
The known bugs in a release are documented in the KNOWN_BUGS file in the source package.
Please see the DSpace bug tracker for further information on current bugs, and to find out if the bug has subsequently been fixed. This is also where you can report any further bugs you find.
In an ideal world everyone would follow the above steps and have a fully functioning DSpace. Of course, in the real world it doesn't always seem to work out that way. This section lists common problems that people encounter when installing DSpace, and likely causes and fixes. This is likely to grow over time as we learn about users' experiences.
ant fresh_install: There are two common errors that occur.
[java] 2004-03-25 15:17:07,730 INFO org.dspace.storage.rdbms.InitializeDatabase @ Initializing Database
[java] 2004-03-25 15:17:08,816 FATAL org.dspace.storage.rdbms.InitializeDatabase @ Caught exception:
[java] org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
[java] at org.postgresql.jdbc1.AbstractJdbc1Connection.openConnection(AbstractJdbc1Connection.java:204)
[java] at org.postgresql.Driver.connect(Driver.java:139)

psql -U dspace -W -h localhost

[java] 2004-03-25 16:37:16,757 INFO org.dspace.storage.rdbms.InitializeDatabase @ Initializing Database
[java] 2004-03-25 16:37:17,139 WARN org.dspace.storage.rdbms.DatabaseManager @ Exception initializing DB pool
[java] java.lang.ClassNotFoundException: org.postgresql.Driver
[java] at java.net.URLClassLoader$1.run(URLClassLoader.java:198)
[java] at java.security.AccessController.doPrivileged(Native Method)
[java] at java.net.URLClassLoader.findClass(URLClassLoader.java:186)
Run ps -ef | grep java and look for Tomcat's Java processes. If they stay around after running Tomcat's shutdown.sh script, try running kill on them (or kill -9 if necessary), then starting Tomcat again.
ps -ef | grep postgres
dspace 16325 1997 0 Feb 14 ? 0:00 postgres: dspace dspace 127.0.0.1 idle in transaction
dspace 16325 1997 0 Feb 14 ? 0:00 postgres: dspace dspace 127.0.0.1 SELECT
Try running kill on the process, then stopping and restarting Tomcat.