...
Another example: Using the standard search, a user would search for something like [wetland + "dc.author=Mitsch, William J" + dc.subject="water quality"]. With filtered search, they can start by searching for [wetland] and then filter the results by the other attributes, author and subject.
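To make the difference concrete, here is a minimal Java sketch of how a filtered search can be represented: one broad query plus one filter per attribute, in the style of Solr's separate q and fq parameters. This is not DSpace code; the class FilteredSearchSketch, its Request type and the index field names author and subject are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not DSpace code): a filtered search kept as one broad
// query plus separate field filters, in the style of Solr's q/fq parameters.
// The index field names "author" and "subject" are assumptions.
class FilteredSearchSketch {

    static final class Request {
        final String query;
        final List<String> filters = new ArrayList<>();

        Request(String query) {
            this.query = query;
        }

        // Adds a field:"value" filter; quoting keeps multi-word values together.
        Request filter(String field, String value) {
            String escaped = value.replace("\"", "\\\"");
            filters.add(field + ":\"" + escaped + "\"");
            return this;
        }
    }

    public static void main(String[] args) {
        // Start broad with [wetland], then narrow by author and subject.
        Request request = new Request("wetland")
                .filter("author", "Mitsch, William J")
                .filter("subject", "water quality");
        System.out.println("q  = " + request.query);
        for (String fq : request.filters) {
            System.out.println("fq = " + fq);
        }
    }
}
```

Because each filter is kept apart from the free-text query, a user can add or drop the author and subject constraints without retyping the original search.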
Discovery Changelist
DSpace 4.0
Info: Starting from DSpace 4.0, Discovery is the default search and browse solution for DSpace.
General improvements:
- Browse interfaces now also use the Discovery index (rather than the legacy Lucene index)
- "Did you mean" spell check aid for search
DSpace 3.0
Info: Starting from DSpace 3.0, Discovery is also supported in JSPUI.
...
- Hierarchical sidebar facets
- Improved & more intuitive user interface
- Access Rights Awareness (enabled by default). Access-restricted or embargoed content is hidden from anonymous search/browse.
- Authority control & variants awareness (homonyms are shown separately in a facet if they have different authority IDs). All variant forms recognized by the authority framework are indexed. See Authority Framework
XMLUI-only:
...
- Auto-complete functionality has been removed in XMLUI from search queries due to performance issues. JSPUI still supports auto-complete functionality without performance issues.
...
DSpace 1.8
- Configuration moved from dspace.cfg into config/modules/discovery.cfg and config/spring/api/discovery.xml
- Individual communities and collections can have their own Discovery configuration.
- Tokenization for Auto-complete values (see SearchFilter)
- Alphanumeric sorting for sidebar facets
- Possibility to exclude specific metadata fields from indexing.
- Grouping of multiple metadata fields under the same SidebarFacet
DSpace 1.7
- Sidebar browse facets that can be configured to use contents from any metadata field
- Dynamically generated timespans for dates
- Customizable "recent submissions" view on the repository homepage, collection and community pages
- Hit highlighting & search snippets
Enabling Discovery
Because Discovery was adopted as the default infrastructure for search and browse in DSpace 4, no manual steps are required to enable Discovery. If you want to enable Discovery on older versions of DSpace, please refer to the DSpace documentation for that particular version.
Configuration files
The configuration for Discovery is located in two separate files:
- General settings: the discovery.cfg file, located in the [dspace-install-dir]/config/modules directory.
- User Interface Configuration: the discovery.xml file, located in the [dspace-install-dir]/config/spring/api/ directory.
General Discovery settings (config/modules/discovery.cfg)
The discovery.cfg file is located in the [dspace-install-dir]/config/modules directory and contains the following properties:
Property: | search.server |
Example Value: |
Informational Note: | Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index. |
Property: | index.ignore |
Example Value: |
Informational Note: | By default, Discovery includes all of the DSpace metadata in its search index. If specific metadata is confidential, repository managers can exclude those fields from the index by adding them to this comma-separated list. |
Property: | index.authority.ignore[.field] |
Example Value: |
Informational Note: | By default, Discovery uses the authority information in the metadata to disambiguate homonyms. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The value can be set per field (<schema>.<element>.<qualifier>); the property without a field sets the default value. |
Property: | index.authority.ignore-prefered[.field] |
Example Value: |
Informational Note: | By default, Discovery uses the authority information in the metadata to query the authority for the preferred label. Setting this property to true makes the indexing process behave as if the metadata did not include authority information (i.e. the preferred form is the one recorded in the metadata value). The value can be set per field (<schema>.<element>.<qualifier>); the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance. |
Property: | index.authority.ignore-variants[.field] |
Example Value: |
Informational Note: | By default, Discovery uses the authority information in the metadata to query the authority for variants. Setting this property to true makes the indexing process behave as if the metadata did not include authority information. The value can be set per field (<schema>.<element>.<qualifier>); the property without a field sets the default value. If the authority is a remote service, disabling this feature can greatly improve performance. |
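Two rules from the table above can be sketched in plain Java with java.util.Properties: index.ignore is a comma-separated list of fields kept out of the index, and an index.authority.* property with a [.field] suffix overrides the same property without one. This is a hypothetical illustration, not DSpace's configuration code; the class DiscoveryCfgSketch, its methods, the field local.internal.note and the assumption that an unset flag means false are all invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical reader for discovery.cfg-style properties (not DSpace code).
class DiscoveryCfgSketch {
    private final Properties props = new Properties();

    DiscoveryCfgSketch(String cfgText) {
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail.
            throw new UncheckedIOException(e);
        }
    }

    // Fields listed in the comma-separated index.ignore value are not indexed.
    boolean isIndexed(String field) {
        Set<String> ignored = Arrays.stream(props.getProperty("index.ignore", "").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        return !ignored.contains(field);
    }

    // Per-field property wins, then the property without a field,
    // then false (assumed default: authority information is used).
    boolean authorityFlag(String property, String field) {
        String value = props.getProperty(property + "." + field, props.getProperty(property));
        return Boolean.parseBoolean(value == null ? "false" : value.trim());
    }

    public static void main(String[] args) {
        DiscoveryCfgSketch cfg = new DiscoveryCfgSketch(String.join("\n",
                "index.ignore = dc.description.provenance, local.internal.note",
                "index.authority.ignore-variants = true",
                "index.authority.ignore-variants.dc.contributor.author = false"));
        System.out.println("dc.title indexed: " + cfg.isIndexed("dc.title"));
        System.out.println("provenance indexed: " + cfg.isIndexed("dc.description.provenance"));
        System.out.println("ignore-variants for author: "
                + cfg.authorityFlag("index.authority.ignore-variants", "dc.contributor.author"));
        System.out.println("ignore-variants for subject: "
                + cfg.authorityFlag("index.authority.ignore-variants", "dc.subject"));
    }
}
```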
Modifying the Discovery User Interface (config/spring/api/discovery.xml)
The discovery.xml file is located in the [dspace-install-dir]/config/spring/api directory.
Structure Summary
This file is in XML format; you should be familiar with XML before editing it. The configuration is organized into beans, grouped by the purpose of the properties they contain.
The purpose of a bean can be derived from its class. Here is a short summary of the classes you will encounter throughout the file and what the corresponding bean properties are used for.
Download the configuration file and review it together with the following parameters.
Class: | DiscoveryConfigurationService |
Purpose: | Defines the mapping between separate Discovery configurations and individual collections/communities |
Default: | All communities, collections and the homepage (key=default) are mapped to defaultConfiguration |
Class: | DiscoveryConfiguration |
Purpose: | Groups configurations for sidebar facets, search filters, search sort options and recent submissions |
Default: | There is one configuration by default called defaultConfiguration |
Class: | DiscoverySearchFilter |
Purpose: | Defines that specific metadata fields should be enabled as a search filter |
Default: | dc.title, dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued are defined as search filters |
Class: | DiscoverySearchFilterFacet |
Purpose: | Defines which metadata fields should be offered as contextual sidebar browse options. Each of these facets must also be defined as a search filter. |
Default: | dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued |
Class: | HierarchicalSidebarFacetConfiguration |
Purpose: | Defines which metadata fields contain hierarchical data and should be offered as a contextual sidebar option |
Class: | DiscoverySortConfiguration |
Purpose: | Further specifies the sort options to which a DiscoveryConfiguration refers |
Default: | dc.title and dc.date.issued are defined as alternatives for sorting, other than Relevance (hard-coded) |
Class: | DiscoveryHitHighlightingConfiguration |
Purpose: | Defines which metadata fields can contain hit highlighting & search snippets |
Default: | dc.title, dc.contributor.author, dc.subject, dc.description.abstract & full text from text files. |
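The mapping performed by DiscoveryConfigurationService can be pictured as a lookup table with a fallback to the default entry. The following Java sketch is hypothetical: it assumes keys are community/collection handles plus the homepage key default, and the class ConfigMappingSketch, the handle 123456789/42 and the configuration name theses are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the configuration mapping (not the DSpace class).
class ConfigMappingSketch {
    private final Map<String, String> map = new HashMap<>();

    ConfigMappingSketch() {
        // Homepage and any unmapped scope use the default configuration.
        map.put("default", "defaultConfiguration");
    }

    // Gives one community or collection (by handle) its own configuration.
    void assign(String handle, String configurationName) {
        map.put(handle, configurationName);
    }

    // A null handle stands for the homepage; unknown handles fall back to "default".
    String configurationFor(String handle) {
        String key = (handle == null) ? "default" : handle;
        return map.getOrDefault(key, map.get("default"));
    }

    public static void main(String[] args) {
        ConfigMappingSketch service = new ConfigMappingSketch();
        service.assign("123456789/42", "theses");
        System.out.println(service.configurationFor("123456789/42"));
        System.out.println(service.configurationFor("123456789/7"));
        System.out.println(service.configurationFor(null));
    }
}
```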
...
- search.resourcetype:2
- dc.subject:test
- dc.contributor.author: "Van de Velde, Kevin"
- ...
Access Rights Awareness
By default, when searching and browsing using Discovery, you will only see items that you have access to, so your search/browse results may differ depending on whether you are logged into DSpace.
The Access Rights Awareness feature ensures that anonymous users (and search engines) are not able to access information (both files and metadata) about embargoed or private items. It also gives you more direct control over who can see individual items within your DSpace.
How does Access Rights Awareness work?
Access Rights Awareness checks the "READ" access on the Item.
If the "Anonymous" group has "READ" access on the Item, then anonymous/public users will be able to view that Item's metadata and locate that Item via DSpace's search/browse system. In addition, search engines will also be able to index that Item's metadata. However, even with Anonymous READ set at the Item-level, you may still choose to access-restrict the downloading/viewing of files within the Item. To do so, you would restrict "READ" access on individual Bitstream(s) attached to the Item.
If the "Anonymous" group does NOT have "READ" access on the Item, then anonymous users will never see that Item appear within their search/browse results (essentially the Item is "invisible" to them). In addition, that Item will be invisible to search engines, so it will never be indexed by them. However, any users who have been given READ access will be able to find/locate the item after logging into DSpace. For example, if a "Staff" group was provided "READ" access on the Item, then members of that "Staff" group would be able to locate the item via search/browse after logging into DSpace.
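The item-level and bitstream-level rules above amount to two independent READ checks. This minimal Java sketch is not DSpace's authorization code; it assumes a user is described by the set of group names they belong to (with logged-in users also counted as members of Anonymous) and that each object lists the groups holding READ on it. The class ReadAccessSketch and the group names are illustrative.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model (not DSpace's authorization code): each object lists the
// groups holding READ on it; a user is the set of groups they belong to.
// Assumption: logged-in users are also counted as members of "Anonymous".
class ReadAccessSketch {

    // True when the user belongs to at least one group with READ on the object.
    static boolean canRead(Set<String> readGroups, Set<String> userGroups) {
        return !Collections.disjoint(readGroups, userGroups);
    }

    public static void main(String[] args) {
        Set<String> itemRead = Set.of("Anonymous");   // Item: metadata findable by everyone
        Set<String> bitstreamRead = Set.of("Staff");  // Bitstream: file restricted to Staff
        Set<String> anonymousUser = Set.of("Anonymous");
        Set<String> staffUser = Set.of("Anonymous", "Staff");

        // Item-level READ decides whether the item appears in search/browse results.
        System.out.println("anonymous sees item in results: " + canRead(itemRead, anonymousUser));
        // Bitstream-level READ separately decides whether the file can be downloaded.
        System.out.println("anonymous can download file: " + canRead(bitstreamRead, anonymousUser));
        System.out.println("staff can download file: " + canRead(bitstreamRead, staffUser));
    }
}
```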
How can I disable Access Rights Awareness?
If you prefer to allow all access-restricted or embargoed Items to be findable within your DSpace, you can choose to turn off Access Rights Awareness. However, please be aware that this means that restricting "READ" access on an Item will not really do anything – the Item metadata will be available to the public no matter what group(s) were given READ access on that Item.
This feature can be switched off by going to the [dspace.dir]/config/spring/api/discovery.xml file & commenting out the bean & the alias shown below.
```xml
<bean class="org.dspace.discovery.SolrServiceResourceRestrictionPlugin" id="solrServiceResourceIndexPlugin"/>
<alias name="solrServiceResourceIndexPlugin" alias="org.dspace.discovery.SolrServiceResourceRestrictionPlugin"/>
```
Note: The Browse Engine only supports Access Rights Awareness if the Solr/Discovery backend is enabled (see Defining the Storage of the Browse Data). However, it is enabled by default in DSpace 3.x and above.
Access Rights Awareness - technical details
The DSpaceObject class has an updateLastModified() method, which is triggered each time an authorization policy changes. This method is only implemented in the Item class, where it updates the last_modified timestamp and fires a modify event. This ensures that the Discovery consumer is called and the item is reindexed. Since this feature can be switched off, it is implemented as a separate plugin: the SolrServiceResourceRestrictionPlugin. Whenever a DSpace object is reindexed, all of its read rights are stored in the read field. Groups and users are distinguished by a 'g' prefix for groups and an 'e' prefix for epersons.
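The 'g'/'e' prefix scheme can be sketched as follows: at index time each READ policy becomes one value in the read field, and at search time the current user's identities are combined into a single filter. This Java sketch is hypothetical; the numeric ids, the use of group 0 for Anonymous in the example, the OR query syntax and the class ReadFieldSketch are assumptions, not the SolrServiceResourceRestrictionPlugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the read-field encoding (not the DSpace plugin).
class ReadFieldSketch {

    // Index time: one "g<id>" value per group and one "e<id>" value per eperson with READ.
    static List<String> readFieldValues(List<Integer> groupIds, List<Integer> epersonIds) {
        List<String> values = new ArrayList<>();
        for (int g : groupIds) {
            values.add("g" + g);
        }
        for (int e : epersonIds) {
            values.add("e" + e);
        }
        return values;
    }

    // Query time: a document matches if any of the user's identities is in its read field.
    // Assumes the user has at least one group (e.g. Anonymous); epersonId is null when not logged in.
    static String readFilter(Integer epersonId, List<Integer> userGroupIds) {
        List<String> terms = new ArrayList<>();
        for (int g : userGroupIds) {
            terms.add("g" + g);
        }
        if (epersonId != null) {
            terms.add("e" + epersonId);
        }
        return "read:(" + String.join(" OR ", terms) + ")";
    }

    public static void main(String[] args) {
        // Index time: group 0 (Anonymous in this example) and eperson 12 hold READ.
        System.out.println("read = " + readFieldValues(List.of(0), List.of(12)));
        // Query time: logged-in eperson 12, member of groups 0 and 5.
        System.out.println("fq = " + readFilter(12, List.of(0, 5)));
    }
}
```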
...
```xml
<property name="moreLikeThisConfiguration">
    <bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
        <property name="similarityMetadataFields">
            <list>
                <value>dc.title</value>
                <value>dc.contributor.author</value>
                <value>dc.creator</value>
                <value>dc.subject</value>
            </list>
        </property>
        <!--The minimum number of matching terms across the metadata fields above before an item is found as related -->
        <property name="minTermFrequency" value="5"/>
        <!--The maximum number of related items displayed-->
        <property name="max" value="3"/>
        <!--The minimum word length below which words will be ignored-->
        <property name="minWordLength" value="5"/>
    </bean>
</property>
```
The property name & the bean class are mandatory. The property field names are discussed below.
- similarityMetadataFields: the metadata fields checked for similarity
- minTermFrequency: The minimum number of matching terms across the metadata fields above before an item is found as related
- max: The maximum number of related items displayed
- minWordLength: The minimum word length below which words will be ignored
"More like this" technical details
The org.dspace.discovery.SearchService object has a getRelatedItems() method. This method requires an item and the more-like-this configuration bean from above. It is implemented in org.dspace.discovery.SolrServiceImpl, which uses the item as a query and passes the bean configuration to Solr through the default Solr more-like-this parameters (https://cwiki.apache.org/confluence/display/solr/MoreLikeThis). The result is a list of items, or an empty list if none are found. Rendering of this list is handled by the org.dspace.app.xmlui.aspect.discovery.RelatedItems class.
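To show how the bean values could reach Solr, here is a hypothetical Java sketch that builds a parameter map with Solr's standard MoreLikeThis parameter names (mlt, mlt.fl, mlt.mintf, mlt.count, mlt.minwl). The one-to-one mapping of each bean property onto these names and the class MoreLikeThisSketch are assumptions. Note that Solr defines mlt.mintf as the frequency below which terms in the source document are ignored, which is not exactly the wording used for minTermFrequency above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: moreLikeThisConfiguration values as Solr MLT parameters.
// Parameter names are Solr's standard ones; the mapping itself is assumed.
class MoreLikeThisSketch {

    static Map<String, String> mltParams(List<String> similarityMetadataFields,
                                         int minTermFrequency, int max, int minWordLength) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("mlt", "true");                                        // enable MoreLikeThis
        params.put("mlt.fl", String.join(",", similarityMetadataFields)); // fields compared for similarity
        params.put("mlt.mintf", Integer.toString(minTermFrequency));      // term frequency threshold
        params.put("mlt.count", Integer.toString(max));                   // related items per result
        params.put("mlt.minwl", Integer.toString(minWordLength));         // ignore shorter words
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = mltParams(
                List.of("dc.title", "dc.contributor.author", "dc.creator", "dc.subject"), 5, 3, 5);
        params.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```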
"Did you mean" spellcheck aid for search configuration
DSpace 4 introduces the use of SOLR's SpellCheckComponent as an aid for search. When a user's search does not return any hits, the user is presented with a suggestion for an alternative search query.
The feature currently requires only one line of configuration in discovery.xml. Changing the value from true to false disables the feature.
```xml
<property name="spellCheckEnabled" value="true" />
```
"Did you mean" spellcheck aid for search technical details
Similar to the More like this configuration, Solr's spell check component is used with its default configuration values. Any of these values can be overridden in the solrconfig.xml file located in dspace/solr/search/conf/. The following links provide more information about the Solr SpellCheckComponent:
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
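Solr's spell checker is configurable and more sophisticated, but the basic idea behind a "did you mean" suggestion can be shown with a toy Java sketch: when a query term has no hits, propose the closest indexed term by Levenshtein edit distance. This is a simplified stand-in, not Solr's or DSpace's implementation; the class DidYouMeanSketch, the term list and the maxDistance threshold are hypothetical.

```java
import java.util.List;

// Toy "did you mean" suggester (a simplified stand-in, not Solr's SpellCheckComponent).
class DidYouMeanSketch {

    // Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;  // reuse arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns null if the term is already indexed or no term is within maxDistance.
    static String suggest(String term, List<String> indexedTerms, int maxDistance) {
        if (indexedTerms.contains(term)) {
            return null;
        }
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : indexedTerms) {
            int d = distance(term, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> indexedTerms = List.of("wetland", "water", "quality");
        String suggestion = suggest("wetlnd", indexedTerms, 2);
        if (suggestion != null) {
            System.out.println("Did you mean: " + suggestion + "?");
        }
    }
}
```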
Discovery Solr Index Maintenance
Command used: |
|
Java class: | org.dspace.discovery.IndexClient |
Arguments (short and long forms): | Description |
| called without any options, will update/clean an existing index |
| (re)build index, wiping out current one if it exists |
| clean existing index, removing any documents that no longer exist in the database |
| if updating existing index, force each handle to be reindexed even if up to date |
| print this help message |
| optimize search core |
| remove an Item, Collection or Community from index based on its handle |
-s | Rebuild the spellchecker, can be combined with -b and -f. |
Routine Discovery Solr Index Maintenance
...