


References



Terminology


Term | Definition | Comments
Solr Instance
  • multiple instances can run at once ('multiple solr instances are running')
  • deploy the webapp on multiple servers; each deployment is an instance
Solr Core
  • each solr instance can have multiple cores
  • also referred to as Solr Index, or simply Core or Index
  • implemented as a Lucene index on disk (comparable to a database)
  • generally, each core runs in isolation, but some communication between cores can be configured via CoreContainer
Document
  • 0..m documents live in a core
  • basic unit of information
Field
  • 0..m fields live in a document
  • various types: text, numeric, date, etc.
  • type tells solr how to interpret the field and how it can be queried
  • type: String stores a word/sentence as an exact string without performing tokenization etc. Commonly useful for storing exact matches, e.g., for faceting.

  • type: Text typically performs tokenization and secondary processing (such as lower-casing). Useful for all scenarios where we want to match part of a sentence.

Facet
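
The String versus Text distinction described under Field can be sketched in a few lines of Python. This is only an illustration of the idea: the functions below are hypothetical stand-ins, not Solr's analyzers, and a real Text field type runs whatever tokenizer and filter chain is configured in schema.xml.

```python
import re


def index_as_string(value):
    """String-like field: the whole value is kept as one exact term."""
    return [value]


def index_as_text(value):
    """Text-like field: split into word tokens, then lower-case each one."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def matches(term, indexed_terms):
    """A term matches only if it equals one of the indexed terms."""
    return term in indexed_terms


title = "The Art of Archery"

string_terms = index_as_string(title)  # one term: the full title
text_terms = index_as_text(title)      # one term per lower-cased word
```

With these stand-ins, a search for "archery" finds the Text version but not the String version, while the String version still matches the full, exact title. That is why String suits facets and Text suits partial matching.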






...


Indexing Documents


  • index via...
    • Request Handlers & Update Handlers (via HTTP POST/PUT)
      • default:  XML, Binary, JSON, CSV, etc.
      • can define own handlers in config
    • Index Handlers
      • import from databases
    • Solr Cell framework (uses Apache Tika to extract text and metadata from rich documents such as PDF and Word files)
    • custom Java application that ingests data through Solr's Java client (SolrJ), or other client apps
  • update processors
    • signature
    • logging
    • indexing
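
As a rough sketch of indexing through the update handler over HTTP POST, the Python below builds a JSON update request. The host, port, and core name (localhost:8983, development) are taken from the URLs used elsewhere on this page, and the documents are hypothetical; their field names are assumed to exist in your schema. The request is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(docs, base_url="http://localhost:8983/solr", core="development"):
    """Build (but do not send) a POST that adds docs to a core and commits."""
    url = f"{base_url}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents; the field names must exist in your schema.
docs = [
    {"id": "1", "name": "Archery Basics", "category": "book", "price": 25},
    {"id": "2", "name": "Archery on Film", "category": "video", "price": 40},
]

request = build_update_request(docs)
# To actually send it against a running Solr: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything is written to the index.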




Request Handlers


Code Block
languagenone
<!--  solr.SearchHandler  -->
<requestHandler name="standard" class="solr.SearchHandler">               <!-- /select -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
<requestHandler name="permissions" class="solr.SearchHandler" >
<requestHandler name="document" class="solr.SearchHandler" >

<!--  solr.UpdateRequestHandler  -->
<requestHandler name="/update" class="solr.UpdateRequestHandler"  />

<!--  other handlers  -->
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">

...

To see what a requestHandler returns, open the solr admin Query page (https://cals-che-repo-dev.library.cornell.edu/solr/#/development/query) and change the value of qt from /select to the name of the handler.  NOTE: Replace the host with your own solr admin host, and if necessary change the core name from development to the name of your core.
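
The same check can be scripted. The sketch below only builds the URL, assuming a local Solr at localhost:8983 with a core named development, and uses the permissions handler from the listing above as the example. Whether Solr honors qt on /select depends on the Solr version and on solrconfig.xml, so treat this as a starting point rather than a guaranteed recipe.

```python
from urllib.parse import urlencode


def handler_query_url(handler, q="*:*", host="http://localhost:8983", core="development"):
    """Return a /select URL that asks Solr to route the query to `handler` via qt."""
    params = urlencode({"q": q, "qt": handler})
    return f"{host}/solr/{core}/select?{params}"


url = handler_query_url("permissions")
```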




...


Querying


  • receive XML, JSON, CSV, or binary (via HTTP GET)
  • request handlers (via HTTP GET)
    • default:  /admin, /select, /spell
    • can define own handlers in config
  • search components
    • query
    • spelling
    • faceting
    • highlighting
    • statistics
    • debug
    • clustering
  • search process  (see Common Query Parameters)


    | parameter | description | default | example |
    |---|---|---|---|
    | qt | selects the Request Handler for a query sent to /select | DisMaxRequestHandler | |
    | defType | selects the Query Parser for the query | parser configured in the Request Handler | |
    | q | the query to search for, as field_name:field_value with * as wildcard | *:* | q=title:*Archery* |
    | fq | filters the results of the initial query by applying an additional query, and caches the results (same syntax as q) | *:* | fq=popularity:[10 TO *]&fq=section:0 |
    | sort | sort field | score desc | |
    | start | an offset into the query results where the returned response should begin | 0 | start=0 |
    | rows | the number of rows to be displayed at one time | 10 | rows=20 |
    | fl | fields to return in the result | all | fl=id,name |
    | df | default field to search when the query does not name a field | all indexed fields | df=description |
    | wt | selects a Response Writer for formatting the query response | xml or json | wt=json |
    | qf | list of fields and the "boosts" to associate with each of them when building DisjunctionMaxQueries (see also SOLR df and qf explanation) | all indexed fields are required (???) | qf=title^20 description^10 |
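
To show how these parameters combine into one request, here is a small Python sketch that encodes them onto a /select URL. The host and core name (localhost:8983, development) are assumptions borrowed from the examples further down this page, and the parameter values are the ones from the table.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(params, host="http://localhost:8983", core="development"):
    """Encode query parameters onto a core's /select endpoint.

    Values given as lists (e.g. several fq filters) are repeated in the URL.
    """
    return f"{host}/solr/{core}/select?{urlencode(params, doseq=True)}"


url = build_select_url({
    "q": "title:*Archery*",
    "fq": ["popularity:[10 TO *]", "section:0"],
    "sort": "score desc",
    "start": 0,
    "rows": 20,
    "fl": "id,name",
    "wt": "json",
})

# Decode the query string again to check what Solr would receive.
decoded = parse_qs(urlsplit(url).query)
```

Letting urlencode handle the escaping avoids hand-typed mistakes such as a missing space inside a range like [10 TO *].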







...


Features


  • High Level
    • Advanced Full-Text Search
    • Optimized for High Volume Web Traffic
    • Standards Based Open Interfaces - XML, JSON, HTTP
    • Comprehensive HTML Admin Interfaces
    • Service statistics exposed over JMX for monitoring
    • Near Real-time indexing
    • Adaptable with XML configuration
    • Linearly scalable, auto index replication, auto failover and recovery
    • Extensible plugin architecture
  • Specific Features
    • faceting
    • highlighting
    • spell checking
    • query re-ranking
    • transforming
    • suggesters
    • more like this
    • pagination
    • grouping & clustering
    • spatial search
    • components
    • real time (get & update)
    • labs




...


Configuration


  • schema.xml
    • field types
    • etc.
  • solrconfig.xml
    • register Request Handlers for querying the index
    • register Update Handlers for indexing documents
    • register Event Handlers for searcher events (e.g. queries to execute to warm new searchers)
    • activate version-dependent features in Lucene
    • Lib directives indicate where Solr can find JAR files for extensions
    • Index management settings
    • Enable JMX instrumentation of Solr MBeans
    • Cache-management settings
  • solr.xml
  • core.properties




Fields


Defined in schema.xml


Hydra Types: 


defined by <types><fieldType>...</fieldType></types>

...

NOTE: The letter is the suffix character that sets the type for Hydra dynamic fields.  Ex. name_tsi means that name has type="text".


Hydra Field Def Parameters:


defined by <fields><dynamicField>...</dynamicField></fields>

...

NOTE: The letter is the suffix character that sets that parameter to true for Hydra dynamic fields.  Ex. name_tsi means that name has stored="true" and indexed="true".
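
The suffix convention can be illustrated with a short Python sketch. It maps only the three letters that appear in the name_tsi example (t, s, i); a real Hydra schema defines more. It also assumes that the first suffix letter names the type and the remaining letters are flags, which matches the example but is not confirmed by this page.

```python
# Only the letters from the name_tsi example are mapped here (assumption:
# first suffix letter = type, remaining letters = parameters set to true).
TYPE_LETTERS = {"t": "text"}
FLAG_LETTERS = {"s": "stored", "i": "indexed"}


def describe_dynamic_field(field_name):
    """Split a name like 'name_tsi' into base name, type, and true-valued flags."""
    base, _, suffix = field_name.rpartition("_")
    if not base or not suffix:
        raise ValueError(f"no suffix found in {field_name!r}")
    type_letter, flag_letters = suffix[0], suffix[1:]
    if type_letter not in TYPE_LETTERS:
        raise ValueError(f"unknown type letter {type_letter!r}")
    unknown = [letter for letter in flag_letters if letter not in FLAG_LETTERS]
    if unknown:
        raise ValueError(f"unknown flag letters {unknown!r}")
    return {
        "name": base,
        "type": TYPE_LETTERS[type_letter],
        "flags": {FLAG_LETTERS[letter]: True for letter in flag_letters},
    }


info = describe_dynamic_field("name_tsi")
```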


Examples for values of stored and indexed:


Panel

stored="true" indexed="false"

  • destination URL
  • file system path
  • time stamp
  • icon image
  • sort string: keep the name as tokenized text with stored="false" indexed="true", and use this field to hold the exact string for sorting

...

Panel

indexed="false" stored="false"

  • Use this when you want to ignore fields. For example, the following ignores unknown fields that do not match any defined field, instead of raising an error (the default behavior).

    Code Block
    languagenone
    <fieldType name="ignored" class="solr.StrField" stored="false" indexed="false" multiValued="true" />
    <dynamicField name="*" type="ignored" />







...


Solr Cloud Features


  • horizontal scaling (for sharding and replication)
  • elastic scaling
  • high availability
  • distributed indexing
  • distributed searching
  • central configuration for entire cluster
  • automatic load balancing
  • automatic failover for queries
  • zookeeper integration for coordination & configurations




CRUD


Create




Read


Return all results with search term = "book"

...

Code Block
languagetext
titleQuery for search term
http://localhost:8983/solr/development/select?q=book


Update




Delete


NOTE: Examples use stream.body to show how to do this through a URL.  Usually done via HTTP POST.

...

  • In the Solr UI, select the core to affect from the selection box on the left side menu
  • select Documents on the left side menu
  • set Document Type = XML
  • set the Document(s) text area to `<delete><query>*:*</query></delete>`
  • leave Commit Within and Overwrite at their defaults
  • Submit
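
The same delete can be issued without the UI by POSTing the XML to the update handler. The Python below is a minimal sketch that only builds the request; the host and core name (localhost:8983, development) are assumptions taken from the query URLs on this page. Sending it with the *:* query removes every document in the core, so the network call is left as a comment.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_delete_request(query, base_url="http://localhost:8983/solr", core="development"):
    """Build (but do not send) a delete-by-query POST that also commits."""
    url = f"{base_url}/{core}/update?commit=true"
    # escape() protects characters such as & and < inside the query text.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_delete_request("*:*")
# To actually delete everything in the core: urllib.request.urlopen(request)
```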






...


  • More Query Examples


Search a specific field (category) for a search term (book)


Code Block
languagetext
titleQuery for search term in a specific field
http://localhost:8983/solr/development/select?q=category:book




Search for price between 0 and 400, inclusive


Code Block
languagetext
titleSearch for range of values
http://localhost:8983/solr/development/select?q=price:[0%20TO%20400]
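
Browsers usually encode the spaces and brackets of a range query silently, but a program has to do it explicitly. A minimal Python sketch, using the same assumed host and core as the URL above:

```python
from urllib.parse import urlencode

# urlencode escapes ':' '[' ']' and turns each space into '+'.
query = urlencode({"q": "price:[0 TO 400]"})
url = f"http://localhost:8983/solr/development/select?{query}"
```

Here query comes out as q=price%3A%5B0+TO+400%5D, which Solr decodes back to price:[0 TO 400].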




Limit search results to return only fields id, name, and price.


Code Block
languagetext
titleQuery for search term & limit fields returned
http://localhost:8983/solr/development/select?q=book&fl=id,name,price




Return facets for a specific field (category), with a count for each value of category among the search results.


Code Block
languagetext
titleQuery for search term & limit fields returned & include facets
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category

...

Code Block
languagetext
titleResponse
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">2</int>
      <int name="audio">2</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
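
A client usually wants these counts as a simple mapping. The Python sketch below parses the facet_counts fragment shown above with the standard library; the helper name facet_counts is hypothetical, and in a real response the fragment sits inside a larger response element, which the search over all lst elements allows for.

```python
import xml.etree.ElementTree as ET

FACET_XML = """
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">2</int>
      <int name="audio">2</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
"""


def facet_counts(xml_text, field):
    """Return {value: count} for one facet field, or {} if it is absent."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root, so this works for the bare fragment
    # and for a full response that wraps it.
    for facet_fields in root.iter("lst"):
        if facet_fields.get("name") != "facet_fields":
            continue
        field_node = facet_fields.find(f"lst[@name='{field}']")
        if field_node is not None:
            return {e.get("name"): int(e.text) for e in field_node.findall("int")}
    return {}


counts = facet_counts(FACET_XML, "category")
```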




Return facets for a specific field (category), with the search results first filtered to a specific value of category (electronics in the URL below), and with a count for each value of category among the filtered results.


Code Block
languagetext
titleQuery for search term & limit fields returned & include facets
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category&fq=category:electronics

...