Solr in DSpace
What is Solr: http://lucene.apache.org/solr/features.html
DSpace uses Solr as part of Discovery, as an index that speeds up access to content metadata and to data about DSpace usage (for statistics). It also provides faceting and filtering of search results. If Discovery is enabled, the DSpace search field accepts Solr search syntax.
Discovery is an optional part of DSpace since 1.7 (with big improvements and configuration format changes in 1.8). When enabled, Discovery replaces DSpace Search and Browse and provides Solr-based statistics.
Do I need to read this page?
To gain the benefits of faceting and filtering in XMLUI, all you need to do is enable Discovery. The rest of this page describes some advanced uses of Solr, for example if you want to query Solr directly for theme customization or read DSpace metadata from outside DSpace.
Connecting to Solr
By default, the DSpace Solr server is configured to listen only on localhost port 8080 (unless you specified another port in the Tomcat configuration and the [dspace]/config/modules/discovery.cfg config file). That means that you cannot connect from another machine to the DSpace server port 8080 and request a Solr URL - you'll get an HTTP 403 error. This configuration was chosen for security reasons - the Solr index contains some data that is not accessible via public DSpace interfaces, and some of that data might be sensitive.
While you could make Solr publicly accessible by changing this default configuration (if you want to do so, search for LocalHostRestrictionFilter), this is not recommended. Instead, use one of the following simple means to bypass this restriction temporarily. All of them make Solr accessible only to the machine you're connecting from, and only for as long as the connection is open.
- OpenSSH client - port forwarding
connect to the DSpace server and forward its port 8080 to port 1234 on localhost (the machine you're connecting from)
makes mydspace.edu:8080 accessible via localhost:1234 (type http://localhost:1234 in the browser address bar)
exit ssh to terminate port forwarding
ssh -L 1234:127.0.0.1:8080 mydspace.edu
run with the -N and -f flags if you want ssh to go to the background; kill the ssh process to terminate port forwarding
ssh -N -f -L 1234:127.0.0.1:8080 mydspace.edu
- PuTTY client - port forwarding
The same with PuTTY: Connection - SSH - Tunnels
Source port: 1234
Destination: localhost:8080
select Local and Auto, then click Add
- OpenSSH client - SOCKS proxy
connect to the DSpace server and run a SOCKS proxy server on localhost port 1234; configure the browser to use localhost:1234 as a SOCKS proxy
all browser requests now originate from the DSpace server (the source IP is the DSpace server's IP) - the DSpace server is the proxy server
type http://localhost:8080 in the browser address bar - localhost here is the DSpace server
ssh -D 1234 mydspace.edu
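Before sending requests through a forwarded port, it can help to confirm that the tunnel is actually up. The following is a minimal sketch, assuming the port-forwarding setup above with local port 1234; the function name tunnel_is_open is made up for this illustration and is not part of DSpace or Solr.

```python
import socket


def tunnel_is_open(host: str = "127.0.0.1", port: int = 1234, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The default port 1234 is the local end of the ssh -L example;
    it is an assumption, so pass whatever port you forwarded.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling tunnel_is_open() after starting ssh -L should return True while the ssh session is alive. Note that it only checks that something accepts connections on the port, not that the listener is Solr.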
Accessing Solr
Solr cores
DSpace contains a so-called multicore installation of Solr. That means that there are multiple Solr indexes and configurations sharing one Solr codebase. If you're familiar with Apache HTTPD, it is analogous to multiple virtual hosts running on one Apache server (each with its own configuration and web pages), except that individual Solr cores are distinguished by URL path (as opposed to the virtual host's IP:port).
The two Solr instances in DSpace Discovery are called "search" and "statistics". search contains data about communities, collections, items and bitstreams. statistics contains data about searches, accessing users, IPs etc. The two instances are accessible at the following URLs (relative to the DSpace server):
http://localhost:8080/solr/search/
http://localhost:8080/solr/statistics/
Solr admin interface
Both Solr cores have separate administration interfaces, which let you view their respective schemas and configurations, set up logging and submit queries. The schema browser is very useful for listing the fields (and their types) included in each index, and it also shows the most common values of individual fields with their frequencies.
http://localhost:8080/solr/search/admin/
http://localhost:8080/solr/statistics/admin/
Solr queries
The base URL of the default Solr search handler is as follows:
http://localhost:8080/solr/search/select
http://localhost:8080/solr/statistics/select
Using the knowledge of particular fields from Solr Admin and Solr syntax (SolrQuerySyntax, CommonQueryParameters) you can make your own search requests.
You can also watch the Tomcat log file to see the queries generated by XMLUI in real time:
tail -f /var/log/tomcat6/catalina.out
(depending on your OS, Tomcat installation method and logging settings, the path may be different)
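As an illustration of making your own search requests, here is a minimal sketch that builds a request URL for either core and, optionally, fetches it. It assumes the default base URL from this page and the /select path used by the examples on this page; the helper names select_url and fetch are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Default from this page; change the port if you are going through a tunnel.
SOLR_BASE = "http://localhost:8080/solr"


def select_url(core: str, query: str, base: str = SOLR_BASE, **params: str) -> str:
    """Build a /select URL for a core ("search" or "statistics").

    Extra keyword arguments become Solr request parameters (rows, fl, wt, ...).
    urlencode percent-encodes reserved characters, e.g. ":" becomes "%3A".
    """
    all_params = {"q": query, **params}
    return f"{base}/{core}/select?{urlencode(all_params)}"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text (needs a reachable Solr)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, select_url("search", "search.resourcetype:2", rows="5") produces a URL that returns at most five items. The result can be pasted into a browser or passed to fetch.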
Solr responses
By default, Solr responses are returned in XML format. However, Solr can provide several other output formats, including JSON and CSV. Discovery uses the javabin format. The output format is selected with the wt request parameter (e.g. &wt=json). For more information, see Response Writers, QueryResponseWriters.
An interesting option is to specify an XSLT stylesheet that transforms the XML response (server-side) to any format you choose, typically HTML. Append &wt=xslt&tr=example.xsl to the Solr request URL. The .xsl files must be placed in the [dspace]/solr/search/conf/xslt/ directory.
For more information, see XsltResponseWriter.
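For scripting, the JSON format (&wt=json) is often the easiest to consume. The sketch below shows how the matching documents might be pulled out of such a response. The sample body is hypothetical and made up for illustration (it is not output from a real DSpace instance), and the helper names are assumptions.

```python
import json

# Hypothetical response body, shaped like Solr's standard JSON output.
SAMPLE_BODY = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"handle": "123456789/1"},
      {"handle": "123456789/2"}
    ]
  }
}
"""


def num_found(body: str) -> int:
    """Total number of matches, regardless of how many rows were returned."""
    return json.loads(body)["response"]["numFound"]


def docs(body: str) -> list:
    """The returned documents, each a dict of field name to value."""
    return json.loads(body)["response"]["docs"]


handles = [doc["handle"] for doc in docs(SAMPLE_BODY)]
```

With the sample above, handles contains the two made-up handle values. Against a real index, check the schema browser first to see which fields are actually stored.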
Examples
Date of last deposited item
To get all items (search.resourcetype:2) sorted by date accessioned (dc.date.accessioned_dt) from newest to oldest (desc; %20 is just a URL-encoded space character):
http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc
Note:
search.resourcetype:2 - items
search.resourcetype:3 - collections
search.resourcetype:4 - communities
To get only the first (newest) item (rows=1) with all but the date accessioned field filtered out (fl=dc.date.accessioned) and without the Solr response header (omitHeader=true):
http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc&rows=1&fl=dc.date.accessioned&omitHeader=true
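The same request can be scripted. This is a minimal sketch, assuming the JSON response format (wt=json is added to the parameters shown above) and assuming that dc.date.accessioned may come back either as a single value or as a list; the function names and the sample body are hypothetical.

```python
import json
from urllib.parse import urlencode


def last_deposit_url(base: str = "http://localhost:8080/solr") -> str:
    """URL for the newest item, returning only its dc.date.accessioned field."""
    params = {
        "q": "search.resourcetype:2",           # items only
        "sort": "dc.date.accessioned_dt desc",  # newest first
        "rows": "1",
        "fl": "dc.date.accessioned",
        "omitHeader": "true",
        "wt": "json",                           # assumption: JSON instead of the default XML
    }
    return f"{base}/search/select?{urlencode(params)}"


def last_deposit_date(body: str):
    """Extract the date from a JSON response body; None if there are no items."""
    found = json.loads(body)["response"]["docs"]
    if not found:
        return None
    value = found[0].get("dc.date.accessioned")
    # Multi-valued fields arrive as lists; take the first value in that case.
    if isinstance(value, list):
        return value[0] if value else None
    return value


# Hypothetical body for illustration only.
SAMPLE_BODY = '{"response": {"numFound": 1, "start": 0, "docs": [{"dc.date.accessioned": ["2012-01-31T10:00:00Z"]}]}}'
```

Here urlencode writes the space in the sort parameter as "+" rather than "%20"; both are valid encodings of a space in a query string.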
Top downloaded items by a specific user
http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
Note:
facet.field=epersonid - groups the results by epersonid, which is the user ID
type:0 - restricts the query to bitstreams only
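To process the facet counts in a script, request JSON (wt=json instead of wt=standard) and read the facet_counts section. The sketch below assumes Solr's default JSON layout for facets, a flat list that alternates value and count; if your Solr is configured with a different json.nl setting, the layout will differ. The sample body and the function name are hypothetical.

```python
import json

# Hypothetical body for illustration only: user 12 has 40 hits, user 3 has 7.
SAMPLE_BODY = """
{
  "response": {"numFound": 47, "start": 0, "docs": []},
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {"epersonid": ["12", 40, "3", 7]}
  }
}
"""


def facet_pairs(body: str, field: str = "epersonid") -> list:
    """Turn the flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions hold their counts.
    return list(zip(flat[0::2], flat[1::2]))


pairs = facet_pairs(SAMPLE_BODY)
```

Each pair is one epersonid with the number of matching statistics records for that user.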
Guidepost
Other pages on this wiki describing Solr and Discovery.
- Discovery: official DSpace 1.8 documentation
- DSpace Discovery: Discovery proposal & purpose, intro video, Discovery 1.8 changes & configuration
- Discovery Configuration: configuration of Discovery in DSpace 1.7
- DSpace Discovery HowTo: Discovery screenshots (before Discovery was included in DSpace), most content obsolete (pre-1.7.0)
See also: