Solr in DSpace

What is Solr: http://lucene.apache.org/solr/features.html

...

While you could make Solr publicly accessible by changing this default configuration, this is not recommended, because Solr indexes may contain some data you might consider private. Instead, use one of the following simple means to bypass this restriction temporarily. All of them make Solr accessible only to the machine you're connecting from, for as long as the connection is open.

  1. OpenSSH client - port forwarding
    connect to the DSpace server and forward its port 8080 to port 1234 on localhost (the machine you are connecting from)

    Code Block
    ssh -L 1234:127.0.0.1:8080 mydspace.edu

    makes mydspace.edu:8080 accessible via localhost:1234 (type http://localhost:1234 in browser address bar); also opens ssh shell
    exit ssh to terminate port forwarding
    Alternatively:

    Code Block
    ssh -N -f -L 1234:127.0.0.1:8080 mydspace.edu

    run with -N and -f flags if you want ssh to go to background
    kill the ssh process to terminate port forwarding

  2. PuTTY client - port forwarding
    The same with PuTTY:

    Code Block
    
    Connection - SSH - Tunnels
    Source port: 1234
    Destination: localhost:8080
    Local
    Auto
    Add
    
  3. OpenSSH client - SOCKS proxy
    connect to the DSpace server and run a SOCKS proxy server on localhost port 1234; configure your browser to use localhost:1234 as a SOCKS proxy and remove "localhost" and "127.0.0.1" from the addresses that bypass this proxy
    all browser requests now originate from the DSpace server (the source IP is the DSpace server's IP) - the DSpace server is the proxy server
    type http://localhost:8080 in the browser address bar - localhost here is the DSpace server

    Code Block
    ssh -D 1234 mydspace.edu

...

  1. turn off the localhost filter in Tomcat
  2. replace it with a RemoteAddrValve and allow an enumerated set of IP addresses or subnets

    Code Block
    
    <!-- Change your server.xml or alternatively your context fragment (i.e. conf/Catalina/localhost/solr.xml) like this: -->
    <Context path="/solr" reloadable="true">
            <Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="111.222.233.*, 123.123.123.123, 127.0.0.1"/>
            <Parameter name="LocalHostRestrictionFilter.localhost" value="false" override="false" />
    </Context>
    

    Do not forget to include localhost (i.e. 127.0.0.1) in the allowed list, otherwise Discovery, OAI 2.0 and other things that depend on Solr won't work.

(see also DS-1260)

Accessing Solr

...

The two Solr instances in DSpace Discovery are called "search" and "statistics". The "search" instance contains data about communities, collections, items and bitstreams; "statistics" contains data about searches, accessing users, IPs etc. The two instances are accessible at the following URLs (relative to the DSpace server):

Code Block

http://localhost:8080/solr/search/
http://localhost:8080/solr/statistics/

...

Both Solr cores have separate administration interfaces which let you view their respective schemas and configurations, set up logging and submit queries. The schema browser is very useful for listing the fields (and their types) included in each index, and it even shows an overview of the most common values of individual fields with their frequency.

Code Block

http://localhost:8080/solr/search/admin/
http://localhost:8080/solr/statistics/admin/

...

The base URL of the default Solr search handler is as follows:

Code Block

http://localhost:8080/solr/search/select
http://localhost:8080/solr/statistics/select

Using your knowledge of particular fields from Solr Admin and of Solr syntax (SolrQuerySyntax, CommonQueryParameters), you can make your own search requests.
You can also watch the Tomcat log file to see the queries generated by XMLUI in real time:

Code Block

tail -f /var/log/tomcat6/catalina.out

...

To get all items (search.resourcetype:2) sorted by date accessioned (dc.date.accessioned_dt) in order from newest to oldest (desc; %20 is just a URL-encoded space character):

Code Block

http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc

...

To get only the first (newest) item (rows=1) with all but the date accessioned field filtered out (fl=dc.date.accessioned) and without the Solr response header (omitHeader=true):

Code Block

http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc&rows=1&fl=dc.date.accessioned&omitHeader=true
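The two query URLs above can also be assembled programmatically, which avoids encoding the space in the sort parameter by hand. The following is a minimal Python sketch, assuming the Solr search core is reachable at the base URL used on this page; the helper name build_select_url is made up for illustration and is not part of DSpace or Solr.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the Solr "search" core as used on this page.
SOLR_SEARCH = "http://localhost:8080/solr/search"


def build_select_url(base, **params):
    """Return a /select URL with URL-encoded query parameters.

    quote (instead of the default quote_plus) encodes a space as %20,
    and safe=":" keeps the field:value separator readable.
    """
    query = urlencode(params, quote_via=quote, safe=":")
    return f"{base}/select?{query}"


# All items, newest first.
all_items = build_select_url(
    SOLR_SEARCH,
    q="search.resourcetype:2",
    sort="dc.date.accessioned_dt desc",
)

# Only the newest item, one field, no response header.
newest_item = build_select_url(
    SOLR_SEARCH,
    q="search.resourcetype:2",
    sort="dc.date.accessioned_dt desc",
    rows=1,
    fl="dc.date.accessioned",
    omitHeader="true",
)

print(all_items)
print(newest_item)
```

The printed URLs can be pasted into a browser (through one of the tunnels described earlier) or fetched with any HTTP client.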

Top downloaded items by a specific user

Code Block

http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
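Because the query above facets on epersonid, the interesting part of the response is the facet counts rather than the returned documents. The following Python sketch shows one way to read them, assuming the response was requested as JSON (wt=json instead of wt=standard), in which case Solr by default returns each facet field as a flat list that alternates value and count. The sample response and the helper name facet_pairs are made up for illustration.

```python
import json

# Made-up sample of a Solr JSON response (wt=json); real IDs and counts will differ.
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 42, "start": 0, "docs": []},
  "facet_counts": {
    "facet_fields": {
      "epersonid": ["17", 30, "3", 12]
    }
  }
}
"""


def facet_pairs(response, field):
    """Pair up the flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


data = json.loads(SAMPLE_RESPONSE)
for eperson_id, count in facet_pairs(data, "epersonid"):
    print(f"eperson {eperson_id}: {count} matching statistics records")
```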

...

You can include another XML document to be processed by XSLT using the document() function. The parameter to this function is a string with the path to the XML document to process. This can be either a static .xml file stored on the server filesystem or a URL, which will be fetched at time of processing. For Solr, the latter is what we need. Furthermore, we need to distinguish templates for processing this external XML document as opposed to the input XML document. We'll do this using the mode attribute and define a different processing mode for each query.

Code Block
XML

<xsl:apply-templates select="document('http://localhost:8080/solr/search/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true')"
mode="solr-response"/>

Now we need to define a template with the same mode that matches elements contained in the Solr response XML:

Code Block
XML

<xsl:template match="/response/result/doc/date" mode="solr-response">
    Last item was imported: <xsl:value-of select="text()"/>
</xsl:template>
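To check what the match pattern /response/result/doc/date selects before wiring it into a theme, the same path can be tried outside Cocoon. The following Python sketch does this against a made-up sample of the Solr XML response; the sample values and the helper name last_accessioned are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up sample of a Solr XML response with omitHeader=true.
SAMPLE_RESPONSE = """\
<response>
  <result name="response" numFound="120" start="0">
    <doc>
      <date name="dc.date.accessioned_dt">2012-05-14T09:30:00Z</date>
    </doc>
  </result>
</response>
"""


def last_accessioned(xml_text):
    """Return the text of the first /response/result/doc/date element, or None."""
    root = ET.fromstring(xml_text)  # root is the <response> element
    date = root.find("./result/doc/date")
    return date.text if date is not None else None


print("Last item was imported:", last_accessioned(SAMPLE_RESPONSE))
```

In practice, the XML text would come from fetching the select URL rather than from a hard-coded string.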

...

For description of the query parameters, see above.

  1. Add the confman namespace and add "confman" to exclude-result-prefixes.

    Code Block
    XML
    
    <xsl:stylesheet
    ...
        xmlns:confman="org.dspace.core.ConfigurationManager"
        exclude-result-prefixes="... confman">
    
  2. Add this simple template to process the Solr result. More complex date formatting can be done easily in XSLT 2.0 (see XSLT 2.0 spec); however, Cocoon still uses XSLT 1.0 (see DS-995). It is currently also possible to call Java functions to do date formatting.

    Code Block
    XML
    
    <xsl:template match="/response/result/doc/date" mode="lastItem">
        Last item was imported: <xsl:value-of select="substring(text(), 1, 10)"/>
    </xsl:template>
    
  3. Add the following code to the place where you want the resulting text to appear:

    Code Block
    XML
    
    <xsl:variable name="solr-search-url" select="confman:getProperty('discovery', 'search.server')"/>
    <xsl:apply-templates select="document(concat($solr-search-url, '/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true'))"
    mode="lastItem"/>
    

    For example, to add it after the list of Recent items in Mirage, override its template like this:

    Code Block
    XML
    
    <xsl:template match="dri:referenceSet[@type = 'summaryList' and @n='site-last-submitted']" priority="2">
        <xsl:apply-templates select="dri:head"/>
        <!-- Here we decide whether we have a hierarchical list or a flat one -->
        <xsl:choose>
            <xsl:when test="descendant-or-self::dri:referenceSet/@rend='hierarchy' or ancestor::dri:referenceSet/@rend='hierarchy'">
                <ul>
                    <xsl:apply-templates select="*[not(name()='head')]" mode="summaryList"/>
                </ul>
            </xsl:when>
            <xsl:otherwise>
                <ul class="ds-artifact-list">
                    <xsl:apply-templates select="*[not(name()='head')]" mode="summaryList"/>
                </ul>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:variable name="solr-search-url" select="confman:getProperty('discovery', 'search.server')"/>
        <xsl:apply-templates select="document(concat($solr-search-url, '/select?q=search.resourcetype:2&amp;sort=dc.date.accessioned_dt%20desc&amp;rows=1&amp;fl=dc.date.accessioned_dt&amp;omitHeader=true'))"
    mode="lastItem"/>
    </xsl:template>
    

"AND" search as default

DSpace Discovery uses the "OR" operator by default when you don't specify an operator between your keywords. So searching for "John Doe" will also return entries like "Jane Doe" and "John Connor". To change that, edit the schema.xml file of the Solr search core:

In [dspace]/solr/search/conf/schema.xml, find this line:

Code Block
XML
<solrQueryParser defaultOperator="OR"/>

and change it to

Code Block
XML
<solrQueryParser defaultOperator="AND"/>

Then restart your servlet container (Tomcat).
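To illustrate what the default operator changes, here is a toy Python sketch of the two behaviours. It is not how Solr matches documents internally (there is no analysis, ranking or field handling); it only models the boolean logic, and the function and sample entries are made up for illustration.

```python
def matches(query, text, default_operator="OR"):
    """Toy matcher: does text contain any (OR) or all (AND) of the query words?"""
    words = query.lower().split()
    tokens = set(text.lower().split())
    hits = [word in tokens for word in words]
    return all(hits) if default_operator == "AND" else any(hits)


entries = ["John Doe", "Jane Doe", "John Connor", "Sarah Connor"]

or_results = [e for e in entries if matches("John Doe", e, "OR")]
and_results = [e for e in entries if matches("John Doe", e, "AND")]

print(or_results)   # ['John Doe', 'Jane Doe', 'John Connor']
print(and_results)  # ['John Doe']
```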

Warning

It's not officially recommended to change this setting. Some unrelated Discovery features might stop working if you do this. I haven't noticed anything wrong, but you might. If something breaks, make sure to notify us and we'll try to fix it or remove this tip.

Guidepost

Other pages on this wiki describing Solr and Discovery.

...