...

The script can be executed through the DSpace command-line interface. It is also available from the UI, but running it there requires administrative permissions:

Parameters

./dspace solr-core-management
         -m <mode:{export|import}>
         -c <core:{audit|statistics|...}>
         -d <directory>
         [-f <format:{csv|json}>]
         [-t <threads:integer>=1]
         [-s <start-date:yyyy-MM-dd>]
         [-e <end-date:yyyy-MM-dd>]
         [-i <increment:{WEEK|MONTH|YEAR}>]
         [-h]


Parameter Description

 

Parameter         Required  Description
-m, --mode        yes       Operation mode: either export or import.
-c, --core        yes       Name of the Solr core to manage (e.g., statistics, authority, audit).
-d, --directory   yes       Directory where exported data will be stored or imported from.
-f, --format      no        File format for export/import. Supported formats: csv (default) or json.
-t, --threads     no        Number of threads used for parallel processing (default: 1).
-s, --start-date  no        Start date (in yyyy-MM-dd format) for time-based filtering during export.
-e, --end-date    no        End date (in yyyy-MM-dd format) for time-based filtering during export.
-i, --increment   no        Split the export into time-based chunks: WEEK, MONTH, or YEAR (default: MONTH). Useful for very large datasets.
-h, --help        no        Displays help and usage information.


...

Examples

Export Example

./dspace solr-core-management --mode export --core audit --directory /tmp/export --format csv --threads 4 --increment WEEK

This command exports the content of the audit core into the directory /tmp/export, splitting data by weekly increments.
The export is performed in CSV format, using 4 parallel threads for faster processing.

Incremental export is useful when the Solr core contains a large volume of records: exporting weekly chunks, for example, avoids single massive files and allows operations to be resumed after a partial failure.
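When finer control is needed (for example, re-running only a failed window), the same chunking can be scripted explicitly with the -s/-e parameters from the synopsis above. A minimal sketch, assuming GNU date; the dates and paths are illustrative, and the dspace commands are echoed rather than executed so the plan can be reviewed first:

```shell
#!/bin/sh
# Sketch: build one export command per 7-day window between start and end.
# GNU date is assumed; commands are printed, not executed.
start=2024-01-01
end=2024-01-22
cur="$start"
while [ "$(date -d "$cur" +%s)" -lt "$(date -d "$end" +%s)" ]; do
  next=$(date -d "$cur + 7 days" +%F)
  echo "./dspace solr-core-management -m export -c audit -d /tmp/export/$cur -s $cur -e $next"
  cur="$next"
done
```

Each window writes to its own directory, so a failed week can be re-exported without touching the others.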

...

Import Example

./dspace solr-core-management --mode import --core audit --directory /tmp/export --format csv --threads 2

This command imports previously exported data (from /tmp/export) back into the audit Solr core.
It uses 2 threads to parallelize document ingestion and supports the same format used during export (csv or json).

This operation is typically used when:

  • Rebuilding a Solr core after corruption, or for reindexing purposes.

  • Migrating Solr data between environments (e.g., production → test).
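A migration between environments, for instance, can be sketched as an export on the source, a transfer, and an import on the target. The hostname and paths below are placeholders, and the commands are printed for review rather than executed:

```shell
#!/bin/sh
# Illustrative migration sequence (production -> test).
# Hostname and paths are placeholders; nothing is executed here.
SRC_DIR=/tmp/export
TARGET=test-dspace.example.org
plan="./dspace solr-core-management -m export -c audit -d $SRC_DIR -f csv
rsync -a $SRC_DIR/ $TARGET:$SRC_DIR/
ssh $TARGET ./dspace solr-core-management -m import -c audit -d $SRC_DIR -f csv"
echo "$plan"
```

Using the same --format on both sides is required, since the import reads the files the export produced.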

...

Best Practices

  • Stop DSpace activities before performing imports to avoid inconsistencies.
    (Whether this is necessary also depends on the data being exported.)

  • Run exports with multiple threads when working with large datasets to reduce execution time.
    Be aware that multi-threaded execution increases the workload on the Solr installation.

...

Note

The export process is designed to operate over date ranges rather than a single continuous dataset.
This approach serves two purposes:

  1. It makes data more manageable and modular, allowing administrators to back up or transfer only specific time periods (e.g., weekly or monthly exports).

  2. It avoids the need for deep pagination over very large result sets, which would require Solr to maintain an explicit sort order (sort=<sort-field>), significantly increasing memory usage and query time.

By splitting exports into smaller date-based ranges, the process minimizes Solr load, reduces the risk of timeouts, and ensures that each export segment can be completed efficiently even on heavily populated cores.
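Concretely, each date window translates into a bounded Solr filter query rather than a deep-paginated, sorted scan. The request shape is roughly the following (illustrative only; the field name time is an assumption based on DSpace's statistics schema, and the exact parameters the script sends may differ):

```
q=*:*
fq=time:[2024-01-01T00:00:00Z TO 2024-02-01T00:00:00Z]
rows=1000
```

Because each window is bounded by the fq range filter, Solr never has to hold a global sort over the full result set to serve later pages.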