
Warning
Please note that as of DSpace 4.0, the Solr-based Discovery search is enabled by default in both JSPUI and XMLUI. If you want to customize the search behavior in a standard DSpace installation, you should refer to the Discovery documentation.

Configuring Lucene Search Indexes

Search indexes can be configured and customized easily in the dspace.cfg file. This allows institutions to choose which DSpace metadata fields are indexed by Lucene.

Property:	`search.dir`
Example Value:	`search.dir = ${dspace.dir}/search`
Informational Note:	Where to put the search index files
Property:	`search.max-clauses`
Example Value:	`search.max-clauses = 2048`
Informational Note:	Setting a higher value for search.max-clauses enables prefix searches to work on larger repositories.
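Depending on the Lucene version and rewrite method, a prefix query such as `experiment*` may be expanded into a Boolean query with one clause per matching index term, and the query fails once that count exceeds the clause limit. The Python sketch below only illustrates that mechanism; `expand_prefix` and the term list are hypothetical, and the `TooManyClauses` class is a stand-in named after Lucene's `BooleanQuery.TooManyClauses`, not the real class.

```python
class TooManyClauses(Exception):
    """Raised when a query expands to more clauses than allowed
    (named after Lucene's BooleanQuery.TooManyClauses; this class is illustrative)."""


def expand_prefix(prefix, index_terms, max_clauses):
    """Expand a prefix query such as 'experiment*' into one clause per matching term.

    Hypothetical helper: it mimics the clause-count check, not Lucene's code.
    """
    clauses = sorted(term for term in index_terms if term.startswith(prefix))
    if len(clauses) > max_clauses:
        raise TooManyClauses(f"{len(clauses)} clauses exceed the limit of {max_clauses}")
    return clauses


terms = ["experiment", "experimental", "experimenting", "experiments", "expert"]
print(expand_prefix("experiment", terms, max_clauses=2048))
# ['experiment', 'experimental', 'experimenting', 'experiments']

try:
    expand_prefix("experiment", terms, max_clauses=3)
except TooManyClauses as err:
    print(err)  # 4 clauses exceed the limit of 3
```

The larger the repository, the more distinct terms share a common prefix, which is why a bigger limit lets prefix searches keep working.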
Property:	`search.index.delay`
Example Value:	`search.index.delay = 5000`
Informational Note:	It is possible to create a 'delayed index flusher'. If a web application pushes multiple index update requests in quick succession (e.g. a barrage of SWORD deposits, or multiple quick edits in the user interface), this will combine them into a single index update. Set the property key to the number of milliseconds to wait for an update. The example value will hold a Lucene update in a queue for up to 5 seconds. After 5 seconds, all waiting updates will be written to the Lucene index.
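The batching idea behind the delayed index flusher can be sketched as a queue that starts a timer on its first update and writes everything queued so far as one batch once the delay has elapsed. This is a hypothetical illustration, not DSpace's implementation: `DelayedIndexFlusher`, `submit` and `flush_if_due` are invented names, and a real flusher would also need a background timer so that a lone update is written without a further call.

```python
import time


class DelayedIndexFlusher:
    """Queue index updates and write them as one batch after `delay_ms`.

    Hypothetical sketch of the 'delayed index flusher' idea; not DSpace code.
    """

    def __init__(self, delay_ms, write_batch, clock=time.monotonic):
        self.delay_s = delay_ms / 1000.0
        self.write_batch = write_batch  # callback that applies a list of updates
        self.clock = clock              # injectable clock, handy for testing
        self.pending = []
        self.first_queued_at = None

    def submit(self, update):
        """Queue an update; the delay window starts when the queue was empty."""
        if not self.pending:
            self.first_queued_at = self.clock()
        self.pending.append(update)
        self.flush_if_due()

    def flush_if_due(self):
        """Write all queued updates as a single batch once the delay has passed."""
        if self.pending and self.clock() - self.first_queued_at >= self.delay_s:
            batch, self.pending = self.pending, []
            self.write_batch(batch)


# Simulated clock (in seconds) so the example is deterministic.
now = [0.0]
batches = []
flusher = DelayedIndexFlusher(5000, batches.append, clock=lambda: now[0])
flusher.submit("item-1")
flusher.submit("item-2")
now[0] = 3.0
flusher.submit("item-3")   # still inside the 5-second window
now[0] = 5.0
flusher.flush_if_due()     # window elapsed: one combined write
print(batches)  # [['item-1', 'item-2', 'item-3']]
```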
Property:	`search.analyzer`
Example Value:	`search.analyzer = org.dspace.search.DSAnalyzer`
Informational Note:	Which Lucene Analyzer implementation to use. If this is omitted or commented out, the standard DSpace analyzer (designed for English) is used by default. This standard DSpace analyzer removes common stopwords, lowercases all words and performs stemming (removing common word endings, like "ing", "s", etc).
Property:	`search.analyzer`
Example Value:	`search.analyzer = org.dspace.search.DSNonStemmingAnalyzer`
Informational Note:	Instead of the standard DSpace Analyzer (DSAnalyzer), use an analyzer which doesn't "stem" words/terms. When using this analyzer, a search for "wellness" will always return items matching "wellness" and not "well". However, a search for "experiments" will similarly only return objects matching "experiments" and not "experiment" or "experimenting". When using this analyzer, you may still use wildcard searches like "experiment*" to match the beginning of words.
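To make the stemming trade-off concrete, the toy sketch below compares a stemming and a non-stemming analyzer and supports a trailing `*` wildcard. Everything here is hypothetical: the stopword list and the suffix-stripping rules are deliberately naive stand-ins, and the real DSAnalyzer uses a proper stemming algorithm that behaves differently in detail.

```python
STOPWORDS = {"a", "an", "and", "of", "the"}  # tiny illustrative list, not DSpace's
SUFFIXES = ("ness", "ing", "s")              # naive, hypothetical stemming rules


def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords (shared by both analyzers)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def stem(word):
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stemming_analyzer(text):
    return [stem(w) for w in tokenize(text)]


def non_stemming_analyzer(text):
    return tokenize(text)


def query_matches(query, document, analyzer):
    """True if every query term matches the document's analyzed terms.

    A trailing '*' is a prefix match against the indexed terms (not analyzed).
    """
    doc_terms = set(analyzer(document))
    for term in query.lower().split():
        if term.endswith("*"):
            if not any(t.startswith(term[:-1]) for t in doc_terms):
                return False
        elif not set(analyzer(term)) <= doc_terms:
            return False
    return True


print(query_matches("wellness", "Staying well", stemming_analyzer))          # True
print(query_matches("wellness", "Staying well", non_stemming_analyzer))      # False
print(query_matches("experiment*", "An experiment", non_stemming_analyzer))  # True
```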
Property:	`search.analyzer`
Example Value:	`search.analyzer = org.apache.lucene.analysis.cn.ChineseAnalyzer`
Informational Note:	Instead of the standard English analyzer, the Chinese analyzer is used.
Property:	`search.operator`
Example Value:	`search.operator = OR`
Informational Note:	Boolean search operator to use. The currently supported values are OR and AND. If this configuration item is missing or commented out, OR is used. AND requires all the search terms to be present. OR requires one or more search terms to be present.
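The difference between the two operator values can be shown with a small matching function. This is a hypothetical helper (`matches_operator` is not part of DSpace) that treats a document as a set of already-analyzed terms.

```python
def matches_operator(query_terms, document_terms, operator="OR"):
    """Apply search.operator semantics to a list of query terms.

    OR (the default): at least one query term must be present.
    AND: every query term must be present.
    Hypothetical helper for illustration only.
    """
    doc = set(document_terms)
    present = [term in doc for term in query_terms]
    op = operator.upper()
    if op == "AND":
        return all(present)
    if op == "OR":
        return any(present)
    raise ValueError(f"unsupported operator: {operator!r}")


doc = ["lucene", "search", "index"]
print(matches_operator(["lucene", "solr"], doc, "OR"))   # True
print(matches_operator(["lucene", "solr"], doc, "AND"))  # False
```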
Property:	`search.maxfieldlength`
Example Value:	`search.maxfieldlength = 10000`
Informational Note:	This is the maximum number of terms indexed for a single field in Lucene. The default is 10,000 words, which is often not enough for full-text indexing. If you change this, you will need to re-index for the change to take effect on previously added items. Use -1 for unlimited (Integer.MAX_VALUE).
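The effect of the field-length limit is that terms past the limit are silently dropped, so words late in a long full-text document cannot be found. The sketch below imitates that truncation with a hypothetical `index_field` function and a whitespace tokenizer; it is not Lucene code.

```python
def index_field(text, max_field_length=10000):
    """Return the terms that would be indexed for one field.

    Terms beyond max_field_length are dropped; -1 means unlimited.
    Hypothetical sketch of the truncation behaviour, not Lucene code.
    """
    terms = text.split()
    if max_field_length == -1:
        return terms
    return terms[:max_field_length]


full_text = " ".join(f"word{i}" for i in range(25000))
print(len(index_field(full_text)))                       # 10000
print(len(index_field(full_text, max_field_length=-1)))  # 25000
print("word15000" in index_field(full_text))             # False: never searchable
```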
Property:	`search.index.<n>`
Example Value:	`search.index.1 = author:dc.contributor.*`
Informational Note:	This property determines which metadata fields are indexed for search. For example, if you do not include the title field here, a search for a word in the title will not match the titles of your items.

For example, the following entries appear in the default DSpace installation:
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = id:dc.identifier.*
search.index.12 = language:dc.language.iso

The format of each entry is `search.index.<id> = <search index name>:<schema>.<metadata field>[:<index type>]`, where:

`<id>`	is an incremental number to distinguish each search index entry
`<search index name>`	is the identifier for the search field this index will correspond to
`<schema>`	is the schema used. Dublin Core (DC) is the default. Others are possible.
`<metadata field>`	is the DSpace metadata field to be indexed.
`<index type>`	(optional) specifies how to manipulate the values before indexing, for example `search.index.12 = language:dc.language.iso:inputform`. Possible values are:
`text`	the default; no special treatment. Metadata values are passed to Lucene as text.
`timestamp`	the values are interpreted as dates with second granularity. An additional index, postfixed with `.year`, is created with year granularity.
`date`	the values are interpreted as dates with day granularity. An additional index, postfixed with `.year`, is created with year granularity.
`inputform`	in addition to the values stored in the metadata, the displayed form of each value as derivable from the input form (in any of the available languages) is stored.
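The entry format above is regular enough to parse mechanically. The sketch below is a hypothetical parser (not DSpace's own code) that splits an entry into its parts, defaults the index type to `text`, and reports the extra `.year` index created for `timestamp` and `date` types. The `dateissued` entry used in the example is invented for illustration and is not part of the default configuration.

```python
import re
from typing import NamedTuple, Optional

# search.index.<id> = <name>:<schema>.<element>[.<qualifier>][:<index type>]
ENTRY = re.compile(
    r"^search\.index\.(?P<id>\d+)\s*=\s*"
    r"(?P<name>[^:\s]+):"
    r"(?P<schema>[^.\s]+)\.(?P<element>[^.:\s]+)(?:\.(?P<qualifier>[^:\s]+))?"
    r"(?::(?P<type>text|timestamp|date|inputform))?\s*$"
)


class IndexEntry(NamedTuple):
    entry_id: int
    index_name: str
    schema: str
    element: str
    qualifier: Optional[str]  # '*' means any qualifier, None means unqualified
    index_type: str

    def extra_indexes(self):
        """timestamp and date entries also get a '<name>.year' index."""
        if self.index_type in ("timestamp", "date"):
            return [f"{self.index_name}.year"]
        return []


def parse_entry(line):
    """Parse one search.index.<id> line; raise ValueError if it does not fit the format."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"not a search.index entry: {line!r}")
    return IndexEntry(
        entry_id=int(m["id"]),
        index_name=m["name"],
        schema=m["schema"],
        element=m["element"],
        qualifier=m["qualifier"],
        index_type=m["type"] or "text",  # 'text' is the default type
    )


print(parse_entry("search.index.12 = language:dc.language.iso:inputform"))
print(parse_entry("search.index.20 = dateissued:dc.date.issued:date").extra_indexes())
# ['dateissued.year']
```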

In the example above, search.index.1, search.index.2 and search.index.6 are configured as the author search field. The author index is created by Lucene indexing all dc.contributor.*, dc.creator.* and dc.description.statementofresponsibility metadata fields.
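Grouping the default entries by index name shows which fields feed each search index. The sketch below is a hypothetical helper (`fields_by_index` is not a DSpace function); it only accepts numbered `search.index.<n>` keys, so other properties sharing the prefix, such as `search.index.delay`, are skipped, and any trailing `:<index type>` is dropped from the field.

```python
import re
from collections import defaultdict

# Only numbered entries (search.index.<n>) define indexes.
ENTRY_KEY = re.compile(r"^search\.index\.\d+\s*=\s*(?P<value>\S.*)$")


def fields_by_index(config_text):
    """Map each search index name to the metadata fields that feed it, in config order."""
    groups = defaultdict(list)
    for line in config_text.splitlines():
        match = ENTRY_KEY.match(line.strip())
        if match is None:
            continue  # blank lines and other properties, e.g. search.index.delay
        name, field = match["value"].split(":", 1)
        groups[name.strip()].append(field.split(":")[0].strip())  # drop any :<index type>
    return dict(groups)


DEFAULT_ENTRIES = """
search.index.delay = 5000
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = id:dc.identifier.*
search.index.12 = language:dc.language.iso
"""

print(fields_by_index(DEFAULT_ENTRIES)["author"])
# ['dc.contributor.*', 'dc.creator.*', 'dc.description.statementofresponsibility']
```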

After changing the configuration, run `[dspace]/bin/dspace index-init` to regenerate the indexes.

Although the indexes are created, this change only affects the search results; it has no effect on the search components of the user interface.

In the above examples, notice the asterisk (*). The metadata field (at least for Dublin Core) is made up of the "element" and the "qualifier". The asterisk is used as a wildcard. So, for example, keyword:dc.subject.* will index all subjects regardless of whether the term resides in a qualified field (subject versus subject.lcsh). One could customize the search to index only LCSH (Library of Congress Subject Headings) by using the entry keyword:dc.subject.lcsh instead of keyword:dc.subject.*
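The wildcard rule can be expressed as a small matching function. This is a sketch under stated assumptions: `field_matches` is hypothetical, the wildcard is assumed to appear only in the qualifier position, `*` is assumed to match unqualified fields as well as any qualifier, and a pattern without a qualifier is assumed to match only the unqualified field.

```python
def field_matches(pattern, field):
    """Return True if a metadata field such as 'dc.subject.lcsh' is covered by a
    configured pattern such as 'dc.subject.*'.

    Hypothetical helper; assumes '*' only appears in the qualifier position.
    """
    p_schema, p_element, *p_rest = pattern.split(".")
    f_schema, f_element, *f_rest = field.split(".")
    if (p_schema, p_element) != (f_schema, f_element):
        return False
    p_qualifier = p_rest[0] if p_rest else None
    f_qualifier = f_rest[0] if f_rest else None
    if p_qualifier == "*":
        return True  # any qualifier, including none
    return p_qualifier == f_qualifier


for field in ["dc.subject", "dc.subject.lcsh", "dc.subject.mesh"]:
    print(field, field_matches("dc.subject.*", field), field_matches("dc.subject.lcsh", field))
# dc.subject True False
# dc.subject.lcsh True True
# dc.subject.mesh True False
```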

Authority Control Note:

Although DSIndexer automatically builds a separate index for the authority keys of any index that contains authority-controlled metadata fields, the "Advanced Search" UIs do not allow direct access to it. Perhaps it will be added in the future. Fortunately, the OpenSearch API lets you submit a query directly to the Lucene search engine, and this query may include the authority-controlled indexes.

Customize the advanced search form

Because the previous configuration applies only to the indexing and querying phases, you will need to customize the user interface to reflect the changes, for example, to add a new search category to the Advanced Search form.

The XMLUI requires manual coding of the involved templates, whereas the JSPUI provides specific configuration to set which indexes are shown in the advanced search dropdown. The configuration parameters are listed below:

Property:	`jspui.search.index.display.<n>`
Example Value:	`jspui.search.index.display.1 = ANY`
Informational Note:	Sets which index appears at position <n> of the index dropdown in the advanced search form. The value must match one of the defined indexes.
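The numbered display properties can be thought of as an ordered list of index names. The sketch below is hypothetical (`dropdown_indexes` is not JSPUI code): it orders entries numerically by `<n>`, so `display.10` sorts after `display.2`, and rejects names that are not defined indexes. Treating `ANY` as always valid is an assumption based on the example value.

```python
import re

DISPLAY_KEY = re.compile(r"^jspui\.search\.index\.display\.(\d+)$")


def dropdown_indexes(props, defined_indexes):
    """Return the advanced-search dropdown entries ordered by <n>.

    props: dict of configuration key -> value.
    defined_indexes: index names defined via search.index.<n>.
    Assumption: 'ANY' is always allowed. Hypothetical sketch, not JSPUI code.
    """
    entries = []
    for key, value in props.items():
        m = DISPLAY_KEY.match(key)
        if m:
            entries.append((int(m.group(1)), value.strip()))
    entries.sort()  # numeric order of <n>
    result = []
    for n, name in entries:
        if name != "ANY" and name not in defined_indexes:
            raise ValueError(f"jspui.search.index.display.{n} = {name}: unknown index")
        result.append(name)
    return result


props = {
    "jspui.search.index.display.1": "ANY",
    "jspui.search.index.display.3": "title",
    "jspui.search.index.display.2": "author",
    "search.index.1": "author:dc.contributor.*",  # not a display key; ignored
}
print(dropdown_indexes(props, {"author", "title"}))  # ['ANY', 'author', 'title']
```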
