Solr in DSpace
What is Solr: http://lucene.apache.org/solr/features.html
...
Please note that to get data from Solr, you don't technically need to enable the Discovery aspect, but you do need to populate the index. The statistics core is populated automatically in DSpace 1.6+. To populate the search core (DSpace 1.7+), you need to run [dspace]/bin/dspace index-discovery (you will probably want to schedule it in cron to run periodically, too). In DSpace versions older than 4.x, the command was called [dspace]/bin/dspace update-discovery-index. There should be no reason to access the oai core (DSpace 3.0), because it contains the same information as the search core, but if you want to populate it, run [dspace]/bin/dspace oai import.
...
Warning |
---|
Before you try to follow the advice below to bypass the localhost restriction, please note:
|
...
Bypassing localhost restriction temporarily
...
Code Block |
---|
http://localhost:8080/solr/search/select?q=search.resourcetype:2&sort=dc.date.accessioned_dt%20desc |
Note:
search.resourcetype:2 | items |
search.resourcetype:3 | communities |
search.resourcetype:4 | collections |
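As a minimal sketch, a query URL like the one above can also be assembled in Python rather than typed by hand, which takes care of the URL encoding. The base URL is taken from the examples on this page; the function name `newest_items_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at the same base URL as in the examples above.
SOLR_BASE = "http://localhost:8080/solr"

def newest_items_url(rows=10):
    """Build a select URL for items sorted by accession date, newest first."""
    params = {
        "q": "search.resourcetype:2",           # items only
        "sort": "dc.date.accessioned_dt desc",  # newest first
        "rows": rows,
    }
    # urlencode escapes the colon and turns the space into "+"
    return f"{SOLR_BASE}/search/select?{urlencode(params)}"

print(newest_items_url(rows=1))
```

Note that the encoded URL looks slightly different from the hand-written one (for example, the colon becomes %3A), but Solr decodes both to the same query.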
To get only the first (newest) item (rows=1) with all but the date accessioned field filtered out (fl=dc.date.accessioned) and without the Solr response header (omitHeader=true):
...
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
|
Note:
facet.field=epersonid | You want to group by epersonid, which is the user id |
type:0 | Interested in bitstreams only |
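If you process the facet counts in a script, it may be easier to request JSON (wt=json) instead of the XML returned by the URL above; that switch is an assumption of this sketch, not part of the original example. In its default JSON layout, Solr returns each facet field as a flat list that alternates value and count. The sample data below is hypothetical.

```python
def facet_counts(response, field):
    """Return {value: count} for one facet field of a Solr JSON response."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Default JSON layout alternates value, count, value, count, ...
    return dict(zip(flat[0::2], flat[1::2]))

# Hypothetical response fragment; real values depend on your repository.
sample = {"facet_counts": {"facet_fields": {"epersonid": ["12", 340, "7", 25]}}}
print(facet_counts(sample, "epersonid"))
```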
Number of items in a specific community
Community here is specified by its "community_id" - the identifier from the "community" table in the database. The result is the "numFound" attribute of the "result" element. This example returns the number of items (search.resourcetype:2) in the community with community_id=85 (location.comm:85):
Code Block |
---|
http://localhost:8080/solr/search/select/?q=location.comm:85+AND+search.resourcetype:2&start=0&rows=0&indent=on |
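To read the count programmatically, the "numFound" attribute can be pulled out of the XML response with a few lines of Python. This is only a sketch: the response string below is a hypothetical, trimmed example, and the count in it is made up.

```python
import xml.etree.ElementTree as ET

def num_found(xml_text):
    """Return the numFound attribute of the <result> element as an int."""
    root = ET.fromstring(xml_text)
    result = root.find("result")  # <result> is a direct child of <response>
    if result is None:
        raise ValueError("no <result> element in Solr response")
    return int(result.get("numFound"))

# Hypothetical, trimmed response; the count is made up for illustration.
sample = '<response><result name="response" numFound="42" start="0"/></response>'
print(num_found(sample))
```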
Breakdown of submitted items per month
Show a breakdown of items (search.resourcetype:2) submitted (facet.date=dc.date.accessioned_dt) per month (facet.date.gap=+1MONTH) in the year 2016 (facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z):
Code Block |
---|
http://localhost:8080/solr/search/select?indent=on&rows=0&facet=true&facet.date=dc.date.accessioned_dt&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=search.resourcetype:2 |
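The date facet parameters can be generated for any year with a small helper. The sketch below (the function name `monthly_facet_params` is hypothetical) also shows why the gap appears as %2B1MONTH in the URL: a literal "+" has to be percent-encoded, otherwise it would be decoded as a space.

```python
from urllib.parse import urlencode

def monthly_facet_params(year, date_field="dc.date.accessioned_dt"):
    """Return Solr parameters for a per-month item breakdown of one year."""
    return {
        "q": "search.resourcetype:2",  # items only
        "rows": 0,                     # only the facet counts are needed
        "facet": "true",
        "facet.date": date_field,
        "facet.date.start": f"{year}-01-01T00:00:00Z",
        "facet.date.end": f"{year + 1}-01-01T00:00:00Z",
        "facet.date.gap": "+1MONTH",   # encoded as %2B1MONTH by urlencode
    }

query = urlencode(monthly_facet_params(2016))
print("http://localhost:8080/solr/search/select?" + query)
```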
Statistics breakdown per event type
Starting from DSpace 3, there is a statistics_type field in the statistics core that contains the "usage event type". Currently, the available types are search, view, search_result and workflow. Here's how to get event breakdown by type, excluding robots (isBot:false):
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.field=statistics_type&q=isBot:false |
Statistics: breakdown of downloads per month
Show a breakdown of bitstream (type:0) downloads per month in the year 2016, excluding robots (isBot:false):
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.date=time&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=type:0+AND+isBot:false |
Statistics: number of downloads (item views) for a specific item per month
Show bitstream (type:0) downloads per month in the year 2016, excluding robots (isBot:false), for a specific item (2163 in the example):
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&rows=0&facet=true&facet.date=time&facet.date.start=2016-01-01T00:00:00Z&facet.date.end=2017-01-01T00:00:00Z&facet.date.gap=%2B1MONTH&q=type:0+AND+owningItem:2163&fq=-isBot:true&fq=-(bundleName:[*+TO+*]+-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view) |
Statistics: number of total downloads in a given time span
Show the total repository-wide bitstream (type:0) downloads, excluding robots (isBot:false), for a specific duration (September 1 2017 through September 1 2018). No need for faceting to get a total count:
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&rows=0&q=time:[2017-09-01T00:00:00Z+TO+2018-09-01T00:00:00Z]+AND+type:0+AND+isBot:false |
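The time range in the q parameter uses UTC timestamps in the format shown above. As a rough sketch, the query string for an arbitrary time span could be built like this; the function names are hypothetical, and the result still has to be URL-encoded before it is sent to Solr.

```python
from datetime import datetime, timezone

def solr_timestamp(dt):
    """Format a timezone-aware datetime as a UTC timestamp for Solr."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def downloads_query(start, end):
    """Build the q parameter for non-robot bitstream downloads in a time span."""
    return (f"time:[{solr_timestamp(start)} TO {solr_timestamp(end)}]"
            " AND type:0 AND isBot:false")

start = datetime(2017, 9, 1, tzinfo=timezone.utc)
end = datetime(2018, 9, 1, tzinfo=timezone.utc)
print(downloads_query(start, end))
```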
Querying Solr from XMLUI
Since Solr returns its responses in XML, it's possible and easy to call custom Solr queries from XMLUI, process the XML response with XSLT and display the results in human-readable form on the HTML page.
There are two ways to do that - synchronously in Cocoon or asynchronously using AJAX (JavaScript) after the page is loaded. Solr queries are usually very fast, so only synchronous calls will be shown here.
...
Code Block | ||
---|---|---|
| ||
http://localhost:8080/solr/search/select/?q=*:*&fq={!join from=owningItem to=search.resourceid fromIndex=statistics}title:"Testing title" |
"AND" search as default
Up to and including DSpace 5 (see DS-2809), Discovery uses the "OR" operator as default if you don't specify an operator between your query keywords. So searching for "John Doe" will also return entries like "Jane Doe" and "John Connor". If you want to change that, you have to edit the schema.xml file of the Solr search core:
...
If for whatever reason you need to delete the data in your index, here's how you can do it. Deleting is normally followed by running [dspace]/bin/dspace index-discovery (in DSpace versions older than 4.x, it was called [dspace]/bin/dspace update-discovery-index) to rebuild the index, but you can use the -b parameter instead to reindex everything:
...
It should also be possible to use it in other versions of DSpace (starting from 1.6), but these use different versions of Solr, so modify the procedure accordingly (and expect other caveats):
DSpace 6 | Solr 4.10.2 |
DSpace 5 | Solr 4.10.2 |
DSpace 4 | Solr 4.4.0 |
DSpace 3 | Solr 3.5.0 |
DSpace 1.8 | Solr 3.3.0 |
DSpace 1.7 | Solr 1.4.1 |
DSpace 1.6 | Solr 1.3.0 |
Note: In older versions, you may need to specify the queryResponseWriter class as org.apache.solr.request.VelocityResponseWriter (I haven't tested it, though).
...
- Discovery Official DSpace 3.x documentation
- DSpace Discovery Discovery proposal & purpose, intro video, Discovery 1.8 changes & configuration
- DSpace Discovery HowTo Discovery screenshots (before Discovery was included in DSpace), most content obsolete (pre-1.7.0)
See also:
- Solr Tutorial
- ajax-solr, a JavaScript library for creating user interfaces to Solr.
- /var/log/tomcat6/catalina.out