References
- Introduction (youtube)
Terminology
Term | Definition | Comments |
---|---|---|
Solr Instance | | |
Solr Core | | |
Document | | |
Field | | |
Facet | | |
Indexing Documents
- index via...
- Request Handlers & Update Handlers (via HTTP POST/PUT)
- default: XML, Binary, JSON, CSV, etc.
- can define own handlers in config
- Index Handlers
- import from databases
- Solr Cell framework (???)
- custom Java application to ingest data through Solr's Java Client and other apps
- Request Handlers & Update Handlers (via HTTP POST/PUT)
- update processors
- signature
- logging
- indexing
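As a rough illustration of indexing through an update handler over HTTP POST, the sketch below builds (but does not send) a JSON update request with the Python standard library. The host, the core name `development`, and the document fields are assumptions for illustration; `commit=true` asks Solr to commit right after the update.

```python
import json
import urllib.request

# Assumed base URL: a local Solr with a core named "development".
SOLR_CORE_URL = "http://localhost:8983/solr/development"


def build_index_request(docs, core_url=SOLR_CORE_URL):
    """Build an HTTP POST that sends a list of documents to the /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=core_url + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; the field names must match fields in your schema.
request = build_index_request([{"id": "SOLR1000", "name": "Example document"}])
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the index.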
Request Handlers
<!-- solr.SearchHandler -->
<requestHandler name="standard" class="solr.SearchHandler">
<!-- /select -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
<requestHandler name="permissions" class="solr.SearchHandler" >
<requestHandler name="document" class="solr.SearchHandler" >
<!-- solr.UpdateRequestHandler -->
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<!-- other handlers -->
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
To see what a requestHandler returns, change the value of qt from /select to the name of the handler in the Solr admin Query page (https://cals-che-repo-dev.library.cornell.edu/solr/#/development/query). NOTE: You will need to change the host to your Solr admin host and may need to change the name of the core from development to the name of your core.
Querying
- receive XML, JSON, CSV, or binary (via HTTP GET)
- request handlers (via HTTP GET)
- default: /admin, /select, /spell
- can define own handlers in config
- search components
- query
- spelling
- faceting
- highlighting
- statistics
- debug
- clustering
search process (see Common Query Parameters)
Parameter | Description | Default | Example |
---|---|---|---|
qt | selects Request Handler for a query using /select | DisMaxRequestHandler | |
defType | selects a Query Parser for the query | parser configured in Request Handler | |
q | field_name:field_value with * as wildcard to search for | *:* | q=title:*Archery* |
fq | filters query by applying an additional query to the initial query's results, caches the results (same syntax as q) | *:* | fq=popularity:[10 TO *]&fq=section:0 |
sort | sort field | score desc | |
start | an offset into the query results where the returned response should begin | 0 | start=0 |
rows | the number of rows to be displayed at one time | 10 | rows=20 |
fl | fields to return in result | all | fl=id,name |
df | default field name (I think) that indicates the field to search | all indexed fields | df=description |
wt | selects a Response Writer for formatting the query response | xml or json | wt=json |
qf | list of fields and the "boosts" to associate with each of them when building DisjunctionMaxQueries (see also SOLR df and qf explanation) | all indexed fields are required (???) | qf=title^20 description^10 |
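To show how these parameters combine into one request, here is a minimal sketch that assembles a /select URL with Python's standard library. The host, the core name `development`, and the field names are assumptions; each `fq` value is emitted as its own parameter.

```python
from urllib.parse import urlencode

SOLR_CORE_URL = "http://localhost:8983/solr/development"  # assumed host and core


def build_select_url(q="*:*", fq=(), fl=None, sort=None, start=0, rows=10,
                     wt="json", core_url=SOLR_CORE_URL):
    """Assemble a /select URL from common query parameters."""
    params = [("q", q)]
    params += [("fq", f) for f in fq]  # one fq parameter per filter query
    if fl:
        params.append(("fl", ",".join(fl)))
    if sort:
        params.append(("sort", sort))
    params += [("start", start), ("rows", rows), ("wt", wt)]
    # urlencode percent-encodes characters such as ':' '[' ']' and '*'
    return core_url + "/select?" + urlencode(params)


url = build_select_url(
    q="title:*Archery*",
    fq=["popularity:[10 TO *]", "section:0"],
    fl=["id", "name"],
    rows=20,
)
print(url)
```

Letting `urlencode` handle escaping avoids hand-encoding the spaces and brackets in range queries.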
Features
- High Level
- Advanced Full-Text Search
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces - XML, JSON, HTTP
- Comprehensive HTML Admin Interfaces
- Service statistics exposed over JMX for monitoring
- Near Real-time indexing
- Adaptable with XML configuration
- Linearly scalable, auto index replication, auto failover and recovery
- Extensible plugin architecture
- Specific Features
- faceting
- highlighting
- spell checking
- query re-ranking
- transforming
- suggesters
- more like this
- pagination
- grouping & clustering
- spatial search
- components
- real time (get & update)
- labs
Configuration
- schema.xml
- field types
- etc.
- solrconfig.xml
- register Request Handlers for querying the index
- register Update Handlers for indexing documents
- register Event Handlers for searcher events (e.g. queries to execute to warm new searchers)
- activate version-dependent features in Lucene
- Lib directives indicate where Solr can find JAR files for extensions
- Index management settings
- Enable JMX instrumentation of Solr MBeans
- Cache-management settings
- solr.xml
- core.properties
Fields
Defined in schema.xml
Hydra Types:
defined by <types><fieldType>...</></>
- t - text (tokenized)
- te - english text (tokenized)
- s - string
- i - integer; it - trie integer
- f - float; ft - trie float
- l - long; lt - trie long
- db - double; dbt - trie double
- b - boolean
- dt - date; dtt - trie date
- ll - location; _coordinate - trie double to index lat and long of a location with indexed=true/stored=false
NOTE: letter indicates the postfix indicator that sets the type for Hydra dynamic fields. Ex. name_tsi means that name has type="text"
Hydra Field Def Parameters:
defined by <fields><dynamicField>...</></>
- s - stored="true|false" - if true, value is returned in solr document
- i - indexed="true|false" - if true, value is searchable
- m - multiValued="true|false" - if true, can have multiple values
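- v - termVectors="true|false" - if true, term vectors are stored for the field (used by features such as MoreLikeThis)
- v - termPositions="true|false" - if true, term positions are stored in the term vectors
- v - termOffsets="true|false" - if true, term offsets are stored in the term vectors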
NOTE: letter indicates the postfix indicator that sets that parameter to true for Hydra dynamic fields. Ex. name_tsi means that name has stored="true" and indexed="true"
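The suffix convention can be made concrete with a small decoder. This is only a sketch: it covers a subset of the type codes (text, string, integer, float, and boolean variants) and assumes the flags always appear in the order s, i, m, v. The function name and the returned keys are hypothetical, not part of Hydra.

```python
import re

# Subset of the type codes listed above; long, double, date, and location are omitted.
TYPE_CODES = {
    "te": "english text", "t": "text", "s": "string",
    "it": "trie integer", "i": "integer",
    "ft": "trie float", "f": "float", "b": "boolean",
}

# <base>_<type code><flags>, with flags assumed to be in the order s, i, m, v.
SUFFIX_PATTERN = re.compile(
    r"(?P<base>.+)_(?P<type>te|t|s|it|i|ft|f|b)(?P<flags>s?i?m?v?)"
)


def decode_dynamic_field(field_name):
    """Split a Hydra-style dynamic field name into base name, type, and flags."""
    match = SUFFIX_PATTERN.fullmatch(field_name)
    if match is None:
        raise ValueError(f"not a recognized dynamic field name: {field_name!r}")
    flags = match.group("flags")
    return {
        "base": match.group("base"),
        "type": TYPE_CODES[match.group("type")],
        "stored": "s" in flags,
        "indexed": "i" in flags,
        "multiValued": "m" in flags,
        "termVectors": "v" in flags,
    }


print(decode_dynamic_field("name_tsi"))
print(decode_dynamic_field("subject_ssim"))
```

For `name_tsi` the decoder reports type text with stored and indexed set, matching the example in the note; `subject_ssim` is a hypothetical field name used to show the multiValued flag.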
Examples for values of stored and indexed:
stored="true" indexed="false"
- destination URL
- file system path
- time stamp
- icon image
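- sort string - the name is also kept as tokenized text with stored=false/indexed=true for searching, while this field holds the exact string used for sorting (note: for Solr itself to sort on a field, the field must be indexed or have docValues)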
stored="false" indexed="true"
- bag of words - want to be able to search for all terms in the bag, but don't want them in the solr document search results
- common misspellings - allow common misspellings to match in search, but don't include in solr document search results
indexed="false" stored="false"
Use this when you want to ignore fields. For example, the following ignores unknown fields that don't match any defined field, rather than throwing an error as Solr does by default.
<fieldType name="ignored" stored="false" indexed="false" class="solr.StrField" />
<dynamicField name="*" type="ignored" />
Solr Cloud Features
- horizontal scaling (for sharding and replication)
- elastic scaling
- high availability
- distributed indexing
- distributed searching
- central configuration for entire cluster
- automatic load balancing
- automatic failover for queries
- zookeeper integration for coordination & configurations
CRUD
Create
Read
Return all results with search term = "book"
http://localhost:8983/solr/development/select?q=book
Update
Delete
NOTE: Examples use stream.body to show how to do this through a URL. Usually done via HTTP POST.
Delete by id:
http://localhost:8983/solr/development/update?stream.body=<delete><id>SOLR1000</id></delete>
http://localhost:8983/solr/development/update?stream.body=<commit/>
Delete by query:
http://localhost:8983/solr/development/update?stream.body=<delete><query>cat:software</query></delete>
http://localhost:8983/solr/development/update?stream.body=<commit/>
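Since deletes are usually sent via HTTP POST rather than stream.body, here is a minimal sketch of the same delete-by-query as a POST request, built with the Python standard library and not sent. The host and the core name `development` are assumptions, and `commit=true` stands in for the separate `<commit/>` call.

```python
import urllib.request
from xml.sax.saxutils import escape

SOLR_CORE_URL = "http://localhost:8983/solr/development"  # assumed host and core


def build_delete_request(query, core_url=SOLR_CORE_URL):
    """Build an HTTP POST that deletes all documents matching a query, then commits."""
    # escape() protects XML special characters (&, <, >) inside the query text
    body = "<delete><query>{}</query></delete>".format(escape(query))
    return urllib.request.Request(
        url=core_url + "/update?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_delete_request("cat:software")
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```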
Steps to delete all via Solr Admin UI
- In Solr UI, select core to affect from selection box on left side menu
- select Documents on left side menu
- set Document Type = XML
- set Document(s) text area to `<delete><query>*:*</query></delete>`
- leave commit within and overwrite as defaults
- Submit
More Query Examples
Search for a specific field, category, containing a search term, book
http://localhost:8983/solr/development/select?q=category:book
Search for price between 0 and 400, inclusive
http://localhost:8983/solr/development/select?q=price:[0 TO 400]
Limit search results to return only fields id, name, and price.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price
Return facets for a specific field, category, with counts for each value of category based on the search results.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category
Partial Response as relates to returned facet information.
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">2</int>
      <int name="audio">2</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
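The facet block in this response can also be read programmatically. The sketch below parses the partial response with Python's standard XML library and returns the counts for one facet field; the function name is hypothetical and the sample is copied from the response above.

```python
import xml.etree.ElementTree as ET

# The partial response shown above, as a standalone XML fragment.
SAMPLE = """
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">2</int>
      <int name="audio">2</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
"""


def facet_counts(xml_text, field):
    """Return {facet value: count} for one facet field in a Solr XML response."""
    root = ET.fromstring(xml_text)
    path = ".//lst[@name='facet_fields']/lst[@name='{}']/int".format(field)
    return {node.get("name"): int(node.text) for node in root.iterfind(path)}


counts = facet_counts(SAMPLE, "category")
print(counts)
```

A field that was not faceted on simply yields an empty dictionary.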
Return facets for a specific field, category, filtered to a specific value of category, book, with counts for each value of category based on the search results.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category&fq=category:book
Partial Response as relates to returned facet information.
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">0</int>
      <int name="audio">0</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
NOTE: Can include multiple filter queries (fq).
NOTE: When filter query is applied, all categories are still listed, but now have 0 for count if they don't include the filtered value.