Solr in DSpace

What is Solr: http://lucene.apache.org/solr/features.html

...

Note that to get data from Solr, you don't technically need to enable the Discovery aspect, but you do need to populate the index. The statistics core is populated automatically in DSpace 1.6+. To populate the search core (DSpace 1.7+), you need to run [dspace]/bin/dspace index-discovery (you will probably want to schedule it in cron to run periodically, too). In DSpace versions older than 4.x, the command was called [dspace]/bin/dspace update-discovery-index. There should be no reason to access the oai core (DSpace 3.0+), because it contains the same information as the search core, but if you want to populate it, run [dspace]/bin/dspace oai import.

...

Warning

Before you try to follow the advice below to bypass the localhost restriction, please note:

  • Exposing the Solr interface means that any restricted metadata, such as dc.description.provenance, and non-anonymized usage statistics (client IPs, user agent strings) will be accessible.
  • Exposing the Solr interface also means that it will be exposed for write access. There is no easy way to expose only read access.
  • Never expose Solr to the internet. If you're exposing it to an IP within your network, add it as an exception to the LocalHostRestrictionFilter. If you have to expose Solr to a public IP, use an SSH tunnel or a VPN for the connection.

...


Bypassing localhost restriction temporarily

...

  1. turn off the localhost filter in Tomcat
  2. replace it with a RemoteAddrValve and allow an enumerated set of IP addresses or subnets (in the following example, the 127.0.0.1 and 123.123.123.123 IPs and the 111.222.233.* subnet would be allowed):

    Change your server.xml or, alternatively, your context fragment (i.e. conf/Catalina/localhost/solr.xml) like this:

    Code Block
    <Context path="/solr" reloadable="true">
            <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.0\.0\.1|123\.123\.123\.123|111\.222\.233\.\d+"/>
            <Parameter name="LocalHostRestrictionFilter.localhost" value="false" override="false" />
    </Context>
    

    Do not forget to include localhost (i.e. 127.0.0.1) in the allowed list; otherwise Discovery, OAI 2.0 and other components that depend on Solr won't work.

(see also DS-1260)
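Before restarting Tomcat, it can help to check which client addresses an allow expression actually admits. The following Python sketch approximates the valve's behaviour; it assumes (to our understanding of Tomcat 7+) that the expression has to match the entire remote address, which `re.fullmatch` emulates. `ALLOW` and `is_allowed` are hypothetical names, and the addresses tried are made up.

```python
import re

# Allow expression for the addresses listed in step 2. Dots are escaped so
# they only match a literal ".", and \d+ stands for the last octet of the
# 111.222.233.* subnet.
ALLOW = r"127\.0\.0\.1|123\.123\.123\.123|111\.222\.233\.\d+"


def is_allowed(address: str, pattern: str = ALLOW) -> bool:
    """Approximate RemoteAddrValve: the pattern has to match the whole address."""
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    for addr in ["127.0.0.1", "111.222.233.42", "111.222.234.1", "10.0.0.5"]:
        print(addr, "allowed" if is_allowed(addr) else "denied")
```

Unescaped dots match any character: `is_allowed("127x0x0x1", r"127.0.0.1")` returns `True`, which is why the expression escapes every dot.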

Instructions specific to Tomcat 6 and older


Note that the syntax of the "allow" attribute changed in Tomcat 7 to a single regular expression. In Tomcat 6 and older, it was a comma-separated list of regular expressions, so this worked in Tomcat 6 but does not work in Tomcat 7+:

Code Block
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="111.222.233.*, 123.123.123.123, 127.0.0.1"/>

See also: Tomcat 6 documentation: Remote Address Filter 
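When migrating a Tomcat 6 configuration, the comma-separated list can be folded into one alternation. This is a hypothetical helper, not part of Tomcat or DSpace; it assumes each entry is an IPv4 address in which `*` stands for one whole octet. The result is stricter than what Tomcat 6 actually did, because the Tomcat 6 entries were themselves regular expressions with unescaped dots.

```python
import re


def tomcat6_to_tomcat7(allow_list: str) -> str:
    """Turn a Tomcat 6 comma-separated allow list into one Tomcat 7+ regex.

    Assumption: every entry is an IPv4 address in which "*" stands for one
    whole octet, as in "111.222.233.*". Literal parts are escaped so that a
    dot only matches a dot.
    """
    alternatives = []
    for entry in allow_list.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled commas
        octets = [r"\d+" if part == "*" else re.escape(part) for part in entry.split(".")]
        alternatives.append(r"\.".join(octets))
    return "|".join(alternatives)


if __name__ == "__main__":
    print(tomcat6_to_tomcat7("111.222.233.*, 123.123.123.123, 127.0.0.1"))
```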

Accessing Solr

Solr cores

DSpace contains a so-called multicore installation of Solr. That means that there are multiple Solr indexes and configurations sharing one Solr codebase. If you're familiar with Apache HTTPD, it is analogous to multiple virtual hosts running on one Apache server (separate configuration and webpages), except that individual Solr cores are accessed via different URLs (rather than via different host names or IP:port combinations).

...

Using the knowledge of particular fields from Solr Admin and Solr syntax (SolrQuerySyntax, CommonQueryParameters) you can make your own search requests. You can also read a brief tutorial to learn the query syntax quickly.
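A browser hides URL encoding, but a script has to encode query parameters itself. A minimal Python sketch for building such requests, assuming the `http://localhost:8080/solr` address used throughout this page; `SOLR_BASE` and `select_url` are illustrative names, not part of DSpace.

```python
from urllib.parse import urlencode

# Assumption: the Solr webapp is reachable at the address used on this page.
SOLR_BASE = "http://localhost:8080/solr"


def select_url(core: str, params: dict) -> str:
    """Build a /select URL for one Solr core (search, statistics, oai, ...).

    urlencode() percent-encodes the values, e.g. the space in a sort clause
    becomes "+" and ":" becomes "%3A"; Solr decodes both.
    """
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("search", {
        "q": "search.resourcetype:2",
        "sort": "dc.date.accessioned_dt desc",
        "rows": 1,
    }))
```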
You can also watch the Solr log file (in older DSpace versions, these messages went to Tomcat's catalina.out) to see the queries generated by XMLUI in real time:

Code Block
tail -f /dspace/log/solr.log

(depending on your OS, Tomcat installation method and logging settings, the path may be different)

...

By default, Solr responses are returned in XML format. However, Solr can provide several other output formats including JSON and CSV. Discovery uses the javabin format. The Solr request parameter is wt (e.g. &wt=json). For more information, see Response Writers, QueryResponseWriters.
An interesting option is to specify an XSLT stylesheet that can transform the XML response (server-side) to any format you choose, typically HTML. Append &wt=xslt&tr=example.xsl to the Solr request URL. The .xsl files must be provided in the [dspace]/solr/search/conf/xslt/ directory.
For more information, see XsltResponseWriter.
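Requesting `wt=json` makes responses easy to consume from scripts. A minimal Python sketch, assuming the JSON writer (one of Solr's standard response writers) and the Solr address used on this page; `fetch_json` and `num_found` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a Solr response that was requested with wt=json and decode it."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def num_found(response: dict) -> int:
    """Total number of matching documents; independent of the rows parameter."""
    return response["response"]["numFound"]


if __name__ == "__main__":
    # rows=0: only the count is needed, not the documents themselves.
    url = "http://localhost:8080/solr/search/select?q=search.resourcetype:2&rows=0&wt=json"
    print(num_found(fetch_json(url)))
```

Decoding JSON with the `json` module only builds dictionaries, lists and scalar values, so it avoids the object-instantiation concerns that come with PHP's `unserialize()`.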

PHP example

Code Block
languagephp
<?php
// Base URL of the Solr search core
$solr_baseurl_dspace = "http://localhost:8080/solr/search/query?";
$solr_query = "test";
$solr_URL_dspace = $solr_baseurl_dspace."wt=phps&q=".urlencode($solr_query." AND withdrawn:false"); // use withdrawn:false with DSpace newer than 1.8
// Give up if Solr does not answer within 10 seconds
$response_dspace = file_get_contents($solr_URL_dspace, false, stream_context_create(array('http' => array('timeout' => 10))));
if ($response_dspace === false) {
    die("Solr request failed\n");
}
$result_dspace = unserialize($response_dspace);
// numFound is the total number of matching documents
$num_dspace = $result_dspace['response']['numFound'];
echo $num_dspace;

Warning

Keep in mind that although using the phps writer may be faster, it's not recommended for untrusted user data (see PHP unserialize() notes).

...

Examples

Date of last deposited item

To get all items (search.resourcetype:2) sorted by date accessioned (dc.date.accessioned_dt) in order from newest to oldest (desc; %20 is just a URL-encoded space character):

Code Block
http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc

Note:

  • search.resourcetype:2 for items
  • search.resourcetype:3 for communities
  • search.resourcetype:4 for collections

To get only the first (newest) item (rows=1) with all but the date accessioned field filtered out (fl=dc.date.accessioned) and without the Solr response header (omitHeader=true):

Code Block
http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc&rows=1&fl=dc.date.accessioned&omitHeader=true

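The same query can be built and evaluated from Python. A sketch under two assumptions: the JSON writer is used (`wt=json`), and `dc.date.accessioned` may come back either as a single value or as a list, since multi-valued Solr fields are serialized as JSON arrays. The function names are hypothetical.

```python
from urllib.parse import urlencode


def last_accessioned_url(core_url: str = "http://localhost:8080/solr/search") -> str:
    """Query for the newest item: one row, accession date only, no header."""
    params = {
        "q": "search.resourcetype:2",           # 2 = items
        "sort": "dc.date.accessioned_dt desc",  # newest first
        "rows": 1,
        "fl": "dc.date.accessioned",
        "omitHeader": "true",
        "wt": "json",
    }
    return f"{core_url}/select?{urlencode(params)}"


def last_accessioned(response: dict):
    """Accession date of the newest item, or None if the index has no items."""
    docs = response["response"]["docs"]
    if not docs:
        return None
    value = docs[0].get("dc.date.accessioned")
    # Multi-valued fields are returned as JSON arrays; take the first value.
    return value[0] if isinstance(value, list) else value
```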

Top downloaded items by a specific user

Code Block
http://localhost:8080/solr/statistics/select?indent=on&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0

Note:

  • facet.field=epersonid groups the results by epersonid, which is the user ID
  • type:0 restricts the results to bitstreams
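With `&wt=json` added, facet counts such as the per-`epersonid` counts above (or the `statistics_type` breakdown below) come back as one flat list that alternates values and counts. A hypothetical Python helper that pairs them up, assuming Solr's default `json.nl=flat` layout:

```python
def facet_pairs(response: dict, field: str) -> list:
    """Convert Solr's flat facet list into (value, count) pairs.

    With wt=json, facet_fields come back as [value1, count1, value2, count2, ...]
    (the default json.nl=flat layout), sorted by count, highest first.
    """
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Made-up sample in the shape Solr returns for facet.field=epersonid.
    sample = {"facet_counts": {"facet_fields": {"epersonid": ["12", 40, "7", 3]}}}
    for eperson, downloads in facet_pairs(sample, "epersonid"):
        print(eperson, downloads)
```

Adding `facet.mincount=1` to the query leaves out values with a zero count.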

Number of items in a specific community

Here the community is specified by its "community_id", the identifier from the "community" table in the database. The result is the "numFound" attribute of the "result" element. This example returns the number of items (search.resourcetype:2) in the community with community_id=85 (location.comm:85):

Code Block
http://localhost:8080/solr/search/select/?q=location.comm:85+AND+search.resourcetype:2&start=0&rows=0&indent=on

Breakdown of submitted items per month

Show a breakdown of items (search.resourcetype:2) submitted (facet.date=dc.date.accessioned_dt) per month (facet.date.gap=+1MONTH, URL-encoded as %2B1MONTH) in the year 2016 (facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z):

Code Block
http://localhost:8080/solr/search/select?indent=on&rows=0&facet=true&facet.date=dc.date.accessioned_dt&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=search.resourcetype:2
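For `facet.date` queries such as the one above, Solr's JSON response (`wt=json`) puts the per-period counts in `facet_counts.facet_dates.<field>`, next to metadata keys such as `gap`, `start` and `end`. A Python sketch that separates the two, assuming that layout; the helper name is hypothetical, and the same function should apply to the statistics `time` field used in the examples below.

```python
# Keys that Solr puts next to the per-period counts in a facet.date result.
META_KEYS = {"gap", "start", "end", "before", "after", "between"}


def date_facet_counts(response: dict, field: str) -> dict:
    """Per-period counts (period start -> count) from a facet.date response."""
    buckets = response["facet_counts"]["facet_dates"][field]
    return {period: count for period, count in buckets.items() if period not in META_KEYS}


if __name__ == "__main__":
    # Made-up sample in the shape of a facet.date response.
    sample = {"facet_counts": {"facet_dates": {"dc.date.accessioned_dt": {
        "2016-01-01T00:00:00Z": 5,
        "2016-02-01T00:00:00Z": 0,
        "gap": "+1MONTH",
        "start": "2016-01-01T00:00:00Z",
        "end": "2017-01-01T00:00:00Z",
    }}}}
    for period, count in date_facet_counts(sample, "dc.date.accessioned_dt").items():
        print(period[:7], count)
```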

Statistics breakdown per event type

Starting from DSpace 3, there is a statistics_type field in the statistics core that contains the "usage event type". Currently, the available types are view, search, search_result and workflow. Here's how to get an event breakdown by type, excluding robots (isBot:false):


Code Block
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.field=statistics_type&q=isBot:false

Statistics: breakdown of downloads per month

Show breakdown of bitstream (type:0) downloads per month in the year 2016, excluding robots (isBot:false):

Code Block
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.date=time&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=type:0+AND+isBot:false

Statistics: number of downloads (item views) for a specific item per month

Show bitstream (type:0) downloads per month in the year 2016, excluding robots (isBot:false), for a specific item (2163 in the example):

Code Block
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.date=time&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=type:0+AND+owningItem:2163&fq=-isBot:true&fq=-(bundleName:[*+TO+*]+-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)
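Queries with several `fq` parameters are easy to garble by hand, and a literal `+` in a URL means a space, which is why the gap is written as `%2B1MONTH`. A Python sketch that builds the same kind of request from a list of pairs (so `fq` can repeat) and lets `urlencode` do the escaping. It writes `AND` explicitly between `type:0` and `owningItem`, so the result does not depend on the core's default operator; the function name is hypothetical.

```python
from urllib.parse import urlencode


def item_downloads_per_month_url(item_id: int, year: int,
                                 core_url: str = "http://localhost:8080/solr/statistics") -> str:
    """Monthly bitstream downloads of one item in a given year."""
    params = [
        # Explicit AND, so the result does not depend on the default operator.
        ("q", f"type:0 AND owningItem:{item_id}"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.date", "time"),
        ("facet.date.start", f"{year}-01-01T00:00:00Z"),
        ("facet.date.end", f"{year + 1}-01-01T00:00:00Z"),
        ("facet.date.gap", "+1MONTH"),  # urlencode turns "+" into %2B
        # fq may be repeated; a list of pairs keeps every occurrence.
        ("fq", "-isBot:true"),                                         # no robots
        ("fq", "-(bundleName:[* TO *] -bundleName:ORIGINAL)"),         # drop other bundles
        ("fq", "-(statistics_type:[* TO *] -statistics_type:view)"),   # drop non-view events
        ("wt", "json"),
    ]
    return f"{core_url}/select?{urlencode(params)}"
```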

Statistics: number of total downloads in a given time span

Show the total repository-wide bitstream (type:0) downloads, excluding robots (isBot:false), for a specific time span (September 1, 2017 through September 1, 2018). No faceting is needed to get a total count; the result is the "numFound" attribute of the "result" element:

Code Block
http://localhost:8080/solr/statistics/select?indent=on&rows=0&q=time:[2017-09-01T00:00:00Z+TO+2018-09-01T00:00:00Z]+AND+type:0+AND+isBot:false
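To compute such totals for arbitrary periods, the range can be generated from datetime values. A Python sketch, assuming timezone-aware datetimes (the Solr date format used here is UTC with a trailing `Z`); the helper names are hypothetical. The square brackets make both ends of the range inclusive, as in the query above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a Solr date (UTC, trailing Z)."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def total_downloads_query(start: datetime, end: datetime) -> str:
    """Bitstream downloads by non-robots between start and end (both inclusive)."""
    return f"time:[{solr_date(start)} TO {solr_date(end)}] AND type:0 AND isBot:false"


def total_downloads_url(start: datetime, end: datetime,
                        core_url: str = "http://localhost:8080/solr/statistics") -> str:
    # rows=0: only numFound is needed, so no documents are returned.
    params = {"q": total_downloads_query(start, end), "rows": 0, "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"
```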

Querying Solr from XMLUI

Since Solr returns its responses in XML, it's possible and easy to call custom Solr queries from XMLUI, process the XML response with XSLT and display the results in human-readable form on the HTML page.
There are two ways to do that: synchronously in Cocoon, or asynchronously using AJAX (JavaScript) after the page is loaded. Solr queries are usually very fast, so only synchronous calls will be shown here.

You can include another XML document to be processed by XSLT using the document() function. The parameter to this function is a string with the path to the XML document to process. This can be either a static .xml file stored on the server filesystem or a URL, which will be fetched at the time of processing. For Solr, the latter is what we need. Furthermore, we need to distinguish the templates that process this external XML document from those that process the input XML document. We'll do this using the mode attribute, defining a different processing mode for each query.

Code Block
languagexml
<xsl:apply-templates select="document('http://localhost:8080/solr/search/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true')"
mode="solr-response"/>

...

  1. Add the confman namespace and add "confman" to exclude-result-prefixes. (For an explanation, see how to Call Java methods from XSLT (Manakin).)

    Code Block
    languagexml
    <xsl:stylesheet
    ...
        xmlns:confman="org.dspace.core.ConfigurationManager"
        exclude-result-prefixes="... confman">
    


  2. Add this simple template to process the Solr query result. More complex date formatting can be done easily in XSLT 2.0 (see the XSLT 2.0 spec), but Cocoon still uses XSLT 1.0 (see DS-995). It is also possible to call Java functions to format dates.

    Code Block
    languagexml
    <xsl:template match="/response/result/doc/date" mode="lastItem">
        Last item was imported: <xsl:value-of select="substring(text(), 1, 10)"/>
    </xsl:template>
    


  3. Add the following code to the place where you want the resulting text to appear:

    Code Block
    languagexml
    <xsl:variable name="solr-search-url" select="confman:getProperty('discovery', 'search.server')"/>
    <xsl:apply-templates select="document(concat($solr-search-url, '/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true'))"
    mode="lastItem"/>
    

    For example, to add it after the list of Recent items in Mirage, override its template like this:

    Code Block
    languagexml
    <xsl:template match="dri:referenceSet[@type = 'summaryList' and @n='site-last-submitted']" priority="2">
        <xsl:apply-templates select="dri:head"/>
        <!-- Here we decide whether we have a hierarchical list or a flat one -->
        <xsl:choose>
            <xsl:when test="descendant-or-self::dri:referenceSet/@rend='hierarchy' or ancestor::dri:referenceSet/@rend='hierarchy'">
                <ul>
                    <xsl:apply-templates select="*[not(name()='head')]" mode="summaryList"/>
                </ul>
            </xsl:when>
            <xsl:otherwise>
                <ul class="ds-artifact-list">
                    <xsl:apply-templates select="*[not(name()='head')]" mode="summaryList"/>
                </ul>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:variable name="solr-search-url" select="confman:getProperty('discovery', 'search.server')"/>
        <xsl:apply-templates select="document(concat($solr-search-url, '/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true'))"
    mode="lastItem"/>
    </xsl:template>
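The `lastItem` template in step 2 matches `/response/result/doc/date` and keeps the first ten characters of its text. To check what that path selects without going through Cocoon, here is a Python sketch of the same lookup on a hand-written sample response. It assumes, as that XPath does, that the date field is single-valued (a multi-valued field would appear inside an `<arr>` element instead); `last_item_date` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def last_item_date(xml_text: str):
    """Return YYYY-MM-DD from the first <date> of a Solr XML response, or None."""
    root = ET.fromstring(xml_text)          # the <response> element
    node = root.find("./result/doc/date")   # same path as the XSLT match
    if node is None or node.text is None:
        return None
    return node.text[:10]                   # like substring(text(), 1, 10)


if __name__ == "__main__":
    # Hand-written sample in the shape of a Solr XML response with omitHeader=true.
    sample = (
        '<response><result name="response" numFound="3" start="0">'
        '<doc><date name="dc.date.accessioned_dt">2016-05-02T10:11:12Z</date></doc>'
        "</result></response>"
    )
    print(last_item_date(sample))
```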


...

Example query (not tested):

Code Block
http://localhost:8080/solr/search/select/?q=*:*&fq={!join from=owningItem to=search.resourceid fromIndex=statistics}title:"Testing title"
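The join above uses Solr's local parameters syntax (`{!join ...}`), whose braces, `!`, spaces and quotes must be percent-encoded when the request is sent from a script rather than typed into a browser. A Python sketch that only builds the URL for that untested query; it does not verify that the join itself returns anything useful, and `join_query_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def join_query_url(title: str, core_url: str = "http://localhost:8080/solr/search") -> str:
    """Encode the (untested) cross-core join query shown above."""
    # Escape backslashes and double quotes so the title stays one phrase.
    phrase = title.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": "*:*",
        # {!join ...} local params; the braces, "!" and spaces need encoding.
        "fq": "{!join from=owningItem to=search.resourceid fromIndex=statistics}"
              f'title:"{phrase}"',
    }
    return f"{core_url}/select/?{urlencode(params)}"
```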

"AND" search as default

Up to and including DSpace 5 (see DS-2809), Discovery uses the "OR" operator as default if you don't specify an operator between your query keywords. So searching for "John Doe" will also return entries like "Jane Doe" and "John Connor". If you want to change that, you have to edit the schema.xml file of the Solr search core:

...

Deleting the data in your index would normally be followed by running [dspace]/bin/dspace index-discovery (in DSpace versions older than 4.x, the command was called [dspace]/bin/dspace update-discovery-index), although you can also use its -b parameter instead to reindex everything. If for whatever reason you need to delete the data in your index, here's how you can do it:

...

...

Set up Solritas (VelocityResponseWriter)

Solritas is a generic search interface on top of a Solr index. It can be useful if you want to explore the contents of a Solr index (core) using facets.

...


It should also be possible to use it in other versions of DSpace (starting from 1.6), but these use different versions of Solr, so modify the procedure accordingly (and expect other caveats):

  • DSpace 6: Solr 4.10.2
  • DSpace 5: Solr 4.10.2
  • DSpace 4: Solr 4.4.0
  • DSpace 3: Solr 3.5.0
  • DSpace 1.8: Solr 3.3.0
  • DSpace 1.7: Solr 1.4.1
  • DSpace 1.6: Solr 1.3.0

Note: In older versions, you may need to specify the queryResponseWriter class as org.apache.solr.request.VelocityResponseWriter (I haven't tested it, though)

...

  • Discovery: official DSpace 3.x documentation
  • DSpace Discovery: Discovery proposal & purpose, intro video, Discovery 1.8 changes & configuration
  • DSpace Discovery HowTo: Discovery screenshots (before Discovery was included in DSpace), most content obsolete (pre-1.7.0)

See also:

  • Solr Tutorial
  • ajax-solr, a JavaScript library for creating user interfaces to Solr.