DSpace Discovery

...

In a faceted search, a user can modify the list of displayed search results by specifying additional "filters" that are applied to that list. In DSpace, a filter is a "contains" condition applied to a specific facet. In the example below, a user started with the search term "approach", which yielded 15 results. After applying the filter "economics" on the facet "Subject", only 6 results remain.
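The narrowing step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Discovery implements it: the records, the `search` and `apply_filter` helpers and the field names are all hypothetical, and the real filtering happens inside the Solr index.

```python
# Hypothetical in-memory records; titles and subjects are invented for illustration.
records = [
    {"title": "A market approach to pricing", "subject": ["economics", "trade"]},
    {"title": "An approach to wetland restoration", "subject": ["ecology"]},
    {"title": "Approach shots in golf", "subject": ["sport"]},
    {"title": "The approach of behavioural economics", "subject": ["Economics", "psychology"]},
    {"title": "Water quality in rivers", "subject": ["ecology"]},
]


def search(items, term):
    """Return the items whose title contains the search term (case-insensitive)."""
    term = term.lower()
    return [r for r in items if term in r["title"].lower()]


def apply_filter(items, facet, value):
    """Keep the items where any value of the facet contains `value` (case-insensitive)."""
    value = value.lower()
    return [r for r in items if any(value in v.lower() for v in r.get(facet, []))]


results = search(records, "approach")                      # initial result list
filtered = apply_filter(results, "subject", "economics")   # narrowed by one filter

print(len(results), "results before filtering,", len(filtered), "after")
```

The filter never widens the result list: it is always applied to the results of the previous step.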

Another example would be the standard search operation [wetland + "dc.author=Mitsch, William J" + dc.subject="water quality"]. With filtered search, a user can start by searching for [wetland], and then filter the results by the other attributes, author and subject.
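As a rough sketch of how such a filtered search could be expressed against a Solr core, the snippet below builds a request URL with one `q` parameter and one `fq` (filter query) parameter per filter. `q`, `fq` and the `/select` handler are standard Solr request conventions; the base URL is the example value used elsewhere on this page, and the field names `dc.author` and `dc.subject` are assumptions taken from the example above, not necessarily the names used in the Discovery index. The helper `build_select_url` is hypothetical.

```python
from urllib.parse import urlencode

# Example value from this page; adjust to your own installation.
SOLR_SEARCH_URL = "http://localhost:8080/solr/search"


def build_select_url(base_url, query, filters):
    """Build a Solr select URL: the main query goes in `q`, each filter in its own `fq`."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return base_url + "/select?" + urlencode(params)


# Start with the plain search term, then add the author and subject filters.
url = build_select_url(
    SOLR_SEARCH_URL,
    "wetland",
    ['dc.author:"Mitsch, William J"', 'dc.subject:"water quality"'],
)
print(url)
```

Keeping each filter in its own `fq` parameter mirrors the way a user adds filters one at a time.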

Discovery Features

  • Configurable sidebar browse facets that can display contents from any metadata field
    • Dynamically generated timespans for dates
  • Customizable recent submissions display on the repository homepage, collection and community pages
  • Auto-complete on search terms

...

  • Configuration moved from dspace.cfg into config/modules/discovery.cfg and config/spring/discovery/spring-dspace-addon-discovery-configuration-services.xml
  • Individual communities and collections can have their own Discovery configuration.
  • Tokenization for Auto-complete values (see SearchFilter)

  • Alphanumeric sorting for sidebar facets
  • Option to exclude specific metadata fields from indexing
  • Grouping of multiple metadata fields under the same SidebarFacet

Enabling Discovery

As with any upgrade procedure, it is highly recommended that you back up your existing data thoroughly. Although upgrades in versions of Solr/Lucene do tend to be forward compatible for the data stored in the Lucene index, it is always a best practice to back up your [dspace]/solr/statistics cores to ensure no data is lost.

  1. Enable the Discovery Aspects in the XMLUI by changing the following settings in config/xmlui.xconf
    1. Comment out: SearchArtifacts
    2. Uncomment: Discovery
      Code Block
      XML
      <xmlui>
          <aspects>
              <!--
                  @deprecated: the Artifact Browser has been divided into ViewArtifacts,
                  BrowseArtifacts, SearchArtifacts
                  <aspect name="Artifact Browser" path="resource://aspects/ArtifactBrowser/" />
              -->
              <aspect name="Displaying Artifacts" path="resource://aspects/ViewArtifacts/" />
              <aspect name="Browsing Artifacts" path="resource://aspects/BrowseArtifacts/" />
              <!--<aspect name="Searching Artifacts" path="resource://aspects/SearchArtifacts/" />-->
              <aspect name="Administration" path="resource://aspects/Administrative/" />
              <aspect name="E-Person" path="resource://aspects/EPerson/" />
              <aspect name="Submission and Workflow" path="resource://aspects/Submission/" />
              <aspect name="Statistics" path="resource://aspects/Statistics/" />

              <!--
                  To enable Discovery, uncomment this Aspect that will enable it
                  within your existing XMLUI
                  Also make sure to comment the SearchArtifacts aspect
                  as leaving it on together with discovery will cause UI overlap issues-->
              <aspect name="Discovery" path="resource://aspects/Discovery/" />

              <!--
                  This aspect tests the various possible DRI features,
                  it helps a theme developer create themes
              -->
              <!-- <aspect name="XML Tests" path="resource://aspects/XMLTest/"/> -->
          </aspects>
      </xmlui>
      
  2. In config/dspace.cfg, enable the Discovery Indexing Consumer, which updates the Discovery indexes whenever content changes through the XMLUI, JSPUI, SWORD, or LNI
    1. Add discovery to the list of event.dispatcher.default.consumers
      Code Block
      # default synchronous dispatcher (same behavior as traditional DSpace)
      event.dispatcher.default.class = org.dspace.event.BasicDispatcher
      #event.dispatcher.default.consumers = search, browse, eperson, harvester
      event.dispatcher.default.consumers = search, browse, discovery, eperson, harvester
      
    2. Change recent.submissions.count to zero
      Code Block
      #Set the recent submissions count to 0 so that Discovery can use its own recent submissions;
      # not doing this when Discovery is enabled will cause UI overlap issues
      #How many recent submissions should be displayed at any one time
      #recent.submissions.count = 5
      recent.submissions.count = 0
      
  3. Check that the port is correct for solr.search.server in config/modules/discovery.cfg
    1. If all of your traffic runs over port 80, then you need to remove the port from the URL
      Code Block
      ##### Search Indexing #####
      solr.search.server = http://localhost/solr/search
      
  4. From the command line, navigate to the dspace directory and run the command below to index the content of your DSpace instance into Discovery.
    Code Block
    ./bin/dspace update-discovery-index
    
    NOTE: This step may take some time if you have a large number of items in your repository.

  5. Verify that the Sidebar Facets now appear on your DSpace homepage. Note that they are only visible when you have items in your repository.

Configuration files

The configuration for Discovery is located in two separate files.

  • General settings: The discovery.cfg file located in the [dspace.dir]/config/modules directory.
  • User Interface Configuration: The spring-dspace-addon-discovery-configuration-services.xml file located in the [dspace.dir]/config/spring/discovery directory.

General Discovery settings (config/modules/discovery.cfg)

The discovery.cfg file is located in the [dspace.dir]/config/modules directory and contains the following properties:

Property:

search.server

Example Value:

search.server=http://localhost:8080/solr/search

Informational Note:

Discovery relies on a SOLR index for storage and retrieval of its information. This parameter determines the location of the SOLR index.

Property:

search.default.sort.order

Example Value:

search.default.sort.order=DESC

Informational Note:

The default sort order for relevance when searching in discovery. This parameter can either be descending (DESC) or ascending (ASC). End-users can change this sort order from the user interface.

Property:

index.ignore

Example Value:

index.ignore=dc.description.provenance,dc.language

Informational Note:

By default, Discovery will include all of the DSpace metadata in its search index. In cases where specific metadata is confidential, repository managers can exclude those fields from the index by adding them to this comma-separated list.
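To make the effect of index.ignore concrete, here is a minimal Python sketch that parses the comma-separated value and drops the listed fields before "indexing". The helper names and the sample item are hypothetical, and the sketch assumes exact field-name matching; how Discovery itself matches field names may differ.

```python
def parse_ignore_list(value):
    """Split a comma-separated property value into a set of metadata field names."""
    return {field.strip() for field in value.split(",") if field.strip()}


def indexable_metadata(metadata, ignored):
    """Return only the metadata fields that are not on the ignore list (exact match)."""
    return {field: values for field, values in metadata.items() if field not in ignored}


# Example value from the table above.
ignored = parse_ignore_list("dc.description.provenance,dc.language")

# Hypothetical item metadata, invented for illustration.
item = {
    "dc.title": ["Wetland restoration"],
    "dc.description.provenance": ["Submitted by a staff member"],
    "dc.language": ["en"],
    "dc.subject": ["water quality"],
}

indexed = indexable_metadata(item, ignored)
print(sorted(indexed))
```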

...

The spring-dspace-addon-discovery-configuration-services.xml file is located in the [dspace.dir]/config/spring/discovery directory.

...

Code Block
langxml
<bean id="sortTitle" class="org.dspace.discovery.configuration.DiscoverySortFieldConfiguration">
    <property name="metadataField" value="dc.title"/>
    <property name="type" value="text"/>
</bean>

...

  • metadataField (Required): The metadata field indicating the sort values
  • type (optional): The type of the sort option, either date or text. If none is defined, text is used.

DiscoveryConfiguration

...

The DiscoveryConfiguration groups the configurations for sidebar facets, search filters, search sort options and recent submissions. If you want to show the same sidebar facets and use the same search filters, sort options and recent submissions everywhere in your repository, you only need one DiscoveryConfiguration, and you can simply edit the defaultConfiguration.

...

  • The list of applicable sidebarFacets
  • The list of applicable searchFilters
  • The list of applicable searchSortFields
  • Any default filter queries (optional)
  • The configuration for the Recent submissions display

Configuring lists of sidebarFacets and searchFilters

...

Below is an example of how one of these lists can be configured. It's important that each bean reference corresponds exactly to the name of a previously defined facet, filter or sort option.

Code Block
langxml
<property name="sidebarFacets">
    <list>
        <ref bean="sidebarFacetAuthor" />
        <ref bean="sidebarFacetSubject" />
        <ref bean="sidebarFacetDateIssued" />
    </list>
</property>

Configuring and customizing search sort fields

The search sort field configuration block contains the available sort fields and the possibility to configure a default sort field and sort order.
Below is an example of the sort configuration.

Code Block
langxml
<property name="searchSortConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoverySortConfiguration">
        <!--<property name="defaultSort" ref="sortDateIssued"/>-->
        <!--DefaultSortOrder can either be desc or asc (desc is default)-->
        <property name="defaultSortOrder" value="desc"/>
        <property name="sortFields">
            <list>
                <ref bean="sortTitle" />
                <ref bean="sortDateIssued" />
            </list>
        </property>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are discussed below.

  • defaultSort (optional): The default field on which the search results will be sorted, this must be a reference to an existing search sort field bean. If none is given relevance will be the default. Sorting according to the internal relevance algorithm is always available, even though it's not explicitly mentioned in the sortFields section.
  • defaultSortOrder (optional): The default sort order can either be asc or desc.
  • sortFields (mandatory): The list of available sort options, each element in this list must link to an existing sort field configuration bean.
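The interplay of these properties can be illustrated with a small Python sketch: when no default sort field is given, results fall back to relevance, and the order defaults to descending. The `sort_results` helper, the `score` key standing in for the relevance score, and the sample results are all hypothetical; Discovery delegates the actual sorting to Solr.

```python
def sort_results(results, default_sort=None, default_sort_order="desc"):
    """Sort by the given field, or by relevance ("score") when no field is configured."""
    key = default_sort if default_sort is not None else "score"
    return sorted(results, key=lambda r: r[key], reverse=(default_sort_order == "desc"))


# Hypothetical search results, invented for illustration.
results = [
    {"dc.title": "Beta", "dc.date.issued": "2011-06-01", "score": 0.9},
    {"dc.title": "Alpha", "dc.date.issued": "2009-01-15", "score": 0.1},
    {"dc.title": "Gamma", "dc.date.issued": "2012-03-20", "score": 0.4},
]

by_relevance = [r["dc.title"] for r in sort_results(results)]
by_title_asc = [r["dc.title"] for r in sort_results(results, "dc.title", "asc")]
by_date_desc = [r["dc.title"] for r in sort_results(results, "dc.date.issued")]

print(by_relevance, by_title_asc, by_date_desc)
```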

Adding default filter queries (OPTIONAL)

Default filter queries are applied on all search operations and sidebar facet clicks. One useful application of default filter queries is ensuring that all returned results are items: communities and collections that would otherwise be returned by the search operation are filtered out.
Similar to the lists above, the default filter queries are defined as a list. They are optional.

Code Block
langxml
<property name="defaultFilterQueries">
    <list>
        <value>query1</value>
        <value>query2</value>
    </list>
</property>

This property contains a simple list which in turn contains the queries. Some examples of possible queries:

  • search.resourcetype:2
  • dc.subject:test
  • dc.contributor.author: "Van de Velde, Kevin"
  • ...
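The following Python sketch shows the idea of a default filter query being applied to every search, whether or not the user adds filters of their own. It is a simplification under stated assumptions: the documents and the `matches` and `search` helpers are hypothetical, filter queries are treated as exact `field:value` matches rather than parsed as real Solr syntax, and resource type 2 is taken to mean "item" as in the first example query above.

```python
# Always applied, in addition to whatever the user selects.
DEFAULT_FILTER_QUERIES = ["search.resourcetype:2"]


def matches(doc, filter_query):
    """Check a simplified 'field:value' filter against a document (exact match only)."""
    field, _, value = filter_query.partition(":")
    value = value.strip().strip('"')
    stored = doc.get(field.strip())
    if stored is None:
        return False
    values = stored if isinstance(stored, list) else [stored]
    return value in [str(v) for v in values]


def search(docs, user_filters=()):
    """Return the documents that satisfy the default filters plus any user filters."""
    active = list(DEFAULT_FILTER_QUERIES) + list(user_filters)
    return [d for d in docs if all(matches(d, fq) for fq in active)]


# Hypothetical index contents: two items (type 2) and one collection (type 3).
docs = [
    {"handle": "123456789/1", "search.resourcetype": 2, "dc.subject": ["test"]},
    {"handle": "123456789/2", "search.resourcetype": 3},
    {"handle": "123456789/3", "search.resourcetype": 2, "dc.subject": ["other"]},
]

items_only = [d["handle"] for d in search(docs)]
items_on_test = [d["handle"] for d in search(docs, ["dc.subject:test"])]
print(items_only, items_on_test)
```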

Customizing the Recent Submissions display

The recent submissions configuration element contains all the configuration settings to display the list of recently submitted items on the home page or community/collection page. Because the recent submission configuration is in the discovery configuration block, it is possible to show 10 recently submitted items on the home page but 5 on the community/collection pages.

Below is an example configuration of the recent submissions.

Code Block
langxml
<property name="recentSubmissionConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryRecentSubmissionsConfiguration">
        <property name="metadataSortField" value="dc.date.accessioned"/>
        <property name="type" value="date"/>
        <property name="max" value="5"/>
    </bean>
</property>

The property name & the bean class are mandatory. The property field names are discussed below.

  • metadataSortField (mandatory): The metadata field to sort on to retrieve the recent submissions
  • max (mandatory): The maximum number of results to be displayed as recent submissions
  • type (optional): The type of the metadata sort field, either date or text. If none is defined, text is used.
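A short Python sketch of what these three settings amount to: sort on the configured metadata field, newest first, and keep at most `max` results. The `recent_submissions` helper and the sample items are hypothetical, and the sketch assumes ISO 8601 timestamps, which sort correctly as plain strings.

```python
def recent_submissions(items, metadata_sort_field="dc.date.accessioned", max_results=5):
    """Return up to `max_results` items, newest first, based on the sort field."""
    dated = [i for i in items if metadata_sort_field in i]
    newest_first = sorted(dated, key=lambda i: i[metadata_sort_field], reverse=True)
    return newest_first[:max_results]


# Seven hypothetical items accessioned on consecutive days.
items = [
    {"handle": f"123456789/{n}", "dc.date.accessioned": f"2013-01-{n:02d}T00:00:00Z"}
    for n in range(1, 8)
]

recent = [i["handle"] for i in recent_submissions(items)]
print(recent)
```

Calling the helper with a different `max_results` per page type corresponds to the scenario described above, where the home page shows 10 items and community/collection pages show 5.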

Discovery SOLR Index Maintenance

Command used:

[dspace]/bin/dspace update-discovery-index [-cbhf[r <item handle>]]

Java class:

org.dspace.discovery.IndexClient

Arguments (short and long forms):

Description

 

called without any options, will update/clean an existing index

-b

(re)build index, wiping out current one if it exists

-c

clean existing index removing any documents that no longer exist in the db

-f

if updating existing index, force each handle to be reindexed even if uptodate

-h

print this help message

-o

optimize search core

-r <item handle>

remove an Item, Collection or Community from index based on its handle

Routine Discovery SOLR Index Maintenance

It is strongly recommended to run maintenance on the Discovery SOLR index daily (from crontab or your system's scheduler), to prevent your servlet container from running out of memory:

[dspace]/bin/dspace update-discovery-index -o

Advanced SOLR Configuration

Discovery is built as an application layer on top of the open source enterprise search server SOLR. Therefore, SOLR configuration can be applied to the SOLR cores that are shipped with DSpace.
The DSpace SOLR instance itself now runs two cores: one for collecting DSpace Solr-based "statistics", the other for Discovery Solr-based "search".

Code Block
solr
├── search
│   ├── conf
│   │   ├── admin-extra.html
│   │   ├── elevate.xml
│   │   ├── protwords.txt
│   │   ├── schema.xml
│   │   ├── scripts.conf
│   │   ├── solrconfig.xml
│   │   ├── spellings.txt
│   │   ├── stopwords.txt
│   │   ├── synonyms.txt
│   │   └── xslt
│   │       ├── DRI.xsl
│   │       ├── example.xsl
│   │       ├── example_atom.xsl
│   │       ├── example_rss.xsl
│   │       └── luke.xsl
│   └── conf2
├── solr.xml
└── statistics
    └── conf
        ├── admin-extra.html
        ├── elevate.xml
        ├── protwords.txt
        ├── schema.xml
        ├── scripts.conf
        ├── solrconfig.xml
        ├── spellings.txt
        ├── stopwords.txt
        ├── synonyms.txt
        └── xslt
            ├── example.xsl
            ├── example_atom.xsl
            ├── example_rss.xsl
            └── luke.xsl

Discovery 1.7 Tips & Tricks Web Seminar

This webinar was broadcast on June 1st, 2011, and its contents relate to DSpace 1.7.

http://www.ustream.tv/recorded/15095992

Topics in this webinar include:

...
