...
We need to find out what, if anything, must be done to prepare existing cores for use by a vastly newer server version.
- The search core might be dropped and the repository re-indexed.
...
- The statistics core could be rebuilt if a site has kept its DSpace logs (or the extracts prepared by bin/dspace stats-log-converter). We also have a dump/restore tool.
- The authority core contains information that is not easily reproduced, so it may be best to dump and reload it (which may require building or adapting a tool).
- How might the other cores (oai, ???) be migrated, if need be?
This change complicates development and maintenance in cases where one wishes to use the same index content across different versions of DSpace. How can we facilitate this?
...
- If multiple shards are already in use, how should they be migrated into the new version of Solr?
- As a Solr instance grows (specifically statistics), what scaling options exist? If Solr Cloud is the solution, how difficult will it be to make that migration later?
...
Recent Solr provides APIs for schema management. We may want to make use of them. It's been suggested that we could use this for probing the condition of the required cores, and even for future schema updates.
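As a rough illustration of such probing, the sketch below uses Solr's CoreAdmin STATUS request and the Schema API field listing to report which required cores or fields are absent. The core list, the base URL, and the function names are assumptions made for this example, not existing DSpace code, and the core check assumes a standalone (non-cloud) Solr, where core names match the names DSpace expects.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed core names; adjust to the cores your DSpace version actually uses.
REQUIRED_CORES = ("search", "statistics", "authority", "oai")


def core_status_url(solr_base):
    """Build the CoreAdmin STATUS URL for a base such as http://localhost:8983/solr."""
    query = urlencode({"action": "STATUS", "wt": "json"})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


def schema_fields_url(solr_base, core):
    """Build the Schema API URL that lists the fields defined in one core."""
    return f"{solr_base.rstrip('/')}/{core}/schema/fields?wt=json"


def missing_cores(status_response, required=REQUIRED_CORES):
    """Return the required cores that a parsed STATUS response does not report."""
    present = status_response.get("status", {})
    # An unknown core can appear as an empty object, so test for content too.
    return [name for name in required if not present.get(name)]


def missing_fields(fields_response, required):
    """Return the required field names absent from a parsed schema/fields response."""
    defined = {field["name"] for field in fields_response.get("fields", [])}
    return sorted(set(required) - defined)


def fetch_json(url):
    """Fetch and parse a JSON response from a running Solr."""
    with urlopen(url) as response:
        return json.load(response)
```

Against a live server this might be used as `missing_cores(fetch_json(core_status_url("http://localhost:8983/solr")))`. Keeping the parsing separate from the HTTP call lets the checks be exercised with canned responses.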
Other issues
We may want to begin work with the "search" core, which should be simplest to work with.
This writer thinks that we should not try to give comprehensive instruction in setting up Solr.
We have existing code for discovering the version of a Solr instance and running upgrades provided by newer Solr versions, which could be adapted. https://github.com/DSpace/DSpace/blob/master/dspace/src/main/config/build.xml#L951
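The Ant code linked above is not reproduced here. As a separate sketch of version discovery, a running Solr reports its version through the system info handler, under the `lucene` section of the response. The helper names and the minimum version below are assumptions chosen for illustration.

```python
import re


def system_info_url(solr_base):
    """Build the system info URL for a base such as http://localhost:8983/solr."""
    return f"{solr_base.rstrip('/')}/admin/info/system?wt=json"


def solr_version(info_response):
    """Extract (major, minor, patch) from a parsed system info response."""
    spec = info_response["lucene"]["solr-spec-version"]
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", spec)
    if match is None:
        raise ValueError(f"unrecognised Solr version string: {spec!r}")
    major, minor, patch = match.groups()
    # Treat a missing patch component as zero.
    return int(major), int(minor), int(patch or 0)


def older_than(info_response, minimum=(7, 0, 0)):
    """Report whether the server is older than an assumed minimum version."""
    return solr_version(info_response) < minimum
```

A check like this only tells us what the server is; it says nothing about the format of index files written by an older Solr, which still has to be handled by an upgrade or re-index step.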
Sharding
DSpace optionally uses sharding to limit the size of the statistics core(s). From Slack discussion, 28-Nov-2018:
Terry Brady 10:19
I see the following use cases
- One DSpace 6 stats shard
- Multiple DSpace 6 stats shards (uuid migration complete)
- One DSpace 5 stats shard (unmigrated)
- Multiple DSpace 5 stats shards (unmigrated)
- No existing cores (new install)
How do we deal with this?
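One possible first step is for a migration tool to decide which of these cases it is facing. The sketch below classifies an installation from its list of core names. It assumes yearly shards are named statistics-YYYY, and it takes the UUID migration state as an input because that cannot be inferred from core names alone. The function and its labels are hypothetical.

```python
import re

# Assumed naming of yearly statistics shards, e.g. "statistics-2017".
SHARD_PATTERN = re.compile(r"^statistics-\d{4}$")


def classify_statistics(core_names, uuid_migrated):
    """Map an installation onto one of the use cases listed above."""
    names = set(core_names)
    shards = sorted(name for name in names if SHARD_PATTERN.match(name))
    if "statistics" not in names and not shards:
        return "new install"
    layout = "multiple shards" if shards else "one shard"
    state = "migrated" if uuid_migrated else "unmigrated"
    return f"{layout}, {state}"
```

For example, `classify_statistics(["search", "statistics"], True)` corresponds to the "One DSpace 6 stats shard" case, while a list that also contains statistics-2016 and is not yet migrated corresponds to "Multiple DSpace 5 stats shards (unmigrated)".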
CLI tools
We need to consider our existing tools related to Solr, including:
...
Before plunging into work to make DSpace use the Solr APIs to manage its cores: What's the Simplest Thing That Could Work? We could simply document where to find the current core configurations in DSpace, and instruct the installer to copy them to a place where Solr will find and use them. We could provide some general hints about how to find the destination of these files. Besides being simple, this handles the case in which the people who run DSpace and the people who run Solr are not the same people and issues of access rights arise.
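The copy step could be as small as the following sketch. The source and destination directories are supplied by whoever runs it, and the core list, function name, and skip-if-present policy are assumptions made for this example rather than an agreed procedure.

```python
import shutil
from pathlib import Path

# Assumed core names; adjust to the cores your DSpace version ships.
CORES = ("search", "statistics", "authority", "oai")


def install_core_configs(dspace_solr_dir, solr_home, cores=CORES):
    """Copy each core's configuration directory into the place Solr reads cores from.

    Returns the names of the cores that were copied.
    """
    installed = []
    for core in cores:
        source = Path(dspace_solr_dir) / core
        target = Path(solr_home) / core
        if not source.is_dir():
            continue  # this DSpace build does not provide the core
        if target.exists():
            continue  # never overwrite a core that is already in place
        shutil.copytree(source, target)
        installed.append(core)
    return installed
```

Refusing to overwrite an existing directory keeps the sketch safe to run against a Solr that already holds index data. Where the Solr administrator is a different person, the same list of directories can simply be handed over as a set of files to install.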
TODO (not final)
- Complete upgrade of client code to SolrJ 7_x.
- Remove the dspace-solr artifact.
- Work out manual steps for installing empty cores in a free-standing Solr (for a new installation).
- See what manual steps can be moved into Ant's fresh_install scripts.
- Determine whether schema updates are required.
- Create dump/restore or migration tools for indexes which cannot be recreated (statistics, authority).
- Work out manual steps for copying/migrating/recreating cores with index records into a free-standing Solr.
- See what manual steps can be moved into Ant's update scripts. This is only for transition from our outdated dspace-solr artifact to current stock Solr.
- Document the changes to DSpace fresh installation: set up Solr separately if you don't already have it, install cores.
- Document the process for moving existing indexes to free-standing Solr during a DSpace upgrade from 6_x.
Solr Deployment Options
Option | DSpace version | Repo content | Features | Installation Process | Migration Process | Schema Update Process | Management | Notes |
---|---|---|---|---|---|---|---|---|
Deploy Solr as Docker Image | 7.preview | New cores only | single server | Core created on container startup; core persisted in Docker volumes | N/A | None. A fresh install is required. | N/A | |
Standalone Solr | 7.preview | New cores only | single server | Ant fresh install script needed | N/A | None. Schema update will not be supported until 7.0. | DSpace sysadmin | |
Standalone Solr | 7.0 | New cores; migrated cores; no shards | single server | Ant fresh install script needed; auto detection of existing core needed | Migration script needed for statistics and authority. Does this run as part of the install process or is this a maintenance script? Is this a migration process or an import process? | Manually deploy schema updates to Solr. | DSpace sysadmin | |
Standalone Solr | 8.0+ | New cores; migrated cores; no shards | single server | TBD. Note future configuration options. | | | | |
Solr Cloud | 7.0 | New cores; migrated cores; "Time Routed Aliases" instead of shards | single or multi server | DBA creates cores and installs schemas | Migration script needed for statistics and authority. Does this run as part of the install process or is this a maintenance script? | DBA manually deploys schema updates to Solr. | DBA | |
Solr Cloud | 8.0+ | New cores; migrated cores; "Time Routed Aliases" instead of shards | single or multi server | TBD. Note future configuration options. | | | | |
Note that there may be reason to run a "degenerate" SolrCloud on a single server. Some APIs are supported only in cloud mode.
Related Tickets and Pull Requests
...