The replication service polls the ingest-server and, when data is ready, pulls it to the local preservation area. It then registers the collection and its token store with ACE, completing ingestion of the content into Chronopolis.
Installation
Prereqs
- ACE - The replication shell will need to talk to ACE in order to register collections
- SSH + Rsync - Replication is done with rsync over ssh, so keys must be exchanged with any nodes planning to distribute content so that file transfers can occur.
- Preservation Area - The replication shell pulls from the ingest server directly to your preservation area on a POSIX file system. We may expand this in the future; for now it is local disk only.
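Before installing, it can help to confirm that key-based ssh and rsync work from the replicating node. The following is a minimal sketch and not part of the replication shell: the function name `check_rsync_access` and the remote account and path passed to it are hypothetical placeholders.

```bash
# Hypothetical helper: verify key-based ssh login and rsync access to a
# remote node without transferring any data.
check_rsync_access() {
    remote="$1"       # e.g. user@host (placeholder)
    remote_path="$2"  # directory on the remote node (placeholder)

    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so success here means key authentication is working.
    ssh -o BatchMode=yes "$remote" true || {
        echo "ssh key login to $remote failed" >&2
        return 1
    }

    # --dry-run lists what would be copied but writes nothing locally.
    rsync -a --dry-run -e ssh "$remote:$remote_path" /tmp/ || {
        echo "rsync from $remote:$remote_path failed" >&2
        return 1
    }
    echo "ssh and rsync access to $remote OK"
}
```

Run it as the user that will own the replication process, for example `check_rsync_access user@ingest-host /some/staging/path/`, substituting real values for your site.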
RPM
The replication shell is packaged in an rpm for ease of install
- Download the rpm from our build server
- Use yum to install the rpm: `yum install replication-shell-$version.rpm`
The rpm installs the following files.
EL6 Replication Files

    [noarch] $ rpm -qlp replicationd-2.0.0-20171027.el6.noarch.rpm
    /etc/init.d/replicationd
    /usr/local/chronopolis/replication
    /usr/local/chronopolis/replication/application.yml
    /usr/local/chronopolis/replication/replicationd.jar
    /var/log/chronopolis

EL7 Replication Files

    [noarch] $ rpm -qlp replicationd-2.0.0-20171027.el7.noarch.rpm
    /usr/lib/systemd/system/replicationd.service
    /usr/local/chronopolis/replication
    /usr/local/chronopolis/replication/application.yml
    /usr/local/chronopolis/replication/replicationd.jar
    /var/log/chronopolis

User Creation
A service user that can write to /var/log/chronopolis and to the preservation storage defined in the configuration is also required. The rpm installation no longer creates this user, so it must be created manually. By default the init scripts look for a 'chronopolis' user and fail if it is not found. The user can be changed in the following places:
/etc/init.d/replicationd

    # User to execute as
    CHRON_USER="chronopolis"

/usr/lib/systemd/system/replicationd.service

    User=chronopolis
    Group=chronopolis

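Since the rpm no longer creates the service user, the manual step might look like the sketch below. It is an illustration under assumptions, not the project's own procedure: the function name `setup_service_user` is hypothetical, it must be run as root, and the directories passed to it are whatever your configuration uses.

```bash
# Hypothetical helper: create the service account if needed and give it
# ownership of the directories it must write to. Run as root.
setup_service_user() {
    if [ "$#" -lt 1 ]; then
        echo "usage: setup_service_user USER [DIR...]" >&2
        return 2
    fi
    svc_user="$1"
    shift

    # Create a system account with a matching group only if it is missing.
    if ! id "$svc_user" >/dev/null 2>&1; then
        useradd -r -U "$svc_user" || return 1
    fi

    # Hand each listed directory to the service account (not recursive).
    for dir in "$@"; do
        chown "$svc_user:$svc_user" "$dir" || return 1
    done
}
```

With the defaults on this page, the call would be `setup_service_user chronopolis /var/log/chronopolis /export/bags/`. The `chown` is deliberately not recursive; existing content in a preservation area may need its ownership reviewed separately.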
Configuration and Running
The replication shell reads its configuration from /usr/local/chronopolis/replication/application.yml
application.yml

    # Replication Configuration Properties

    # Replication Service Configuration
    # node: the name to use when sending notification messages
    # send-on-success: flag to enable sending notification on successful replications
    chron:
      node: chron
      smtp.send-on-success: true

    # ACE-AM Configuration
    # am: the endpoint of the Audit Manager application
    # username: the username to connect to the Audit Manager with
    # password: the password to connect to the Audit Manager with
    ace:
      am: http://localhost:8080/ace-am/
      username: user
      password: change-me

    # Ingest API Configuration
    # endpoint: the endpoint of the Ingest Server
    # username: the username to connect to the Ingest Server with
    # password: the password to connect to the Ingest Server with
    ingest.api:
      endpoint: https://localhost:8080/ingest/
      username: ingest-user
      password: change-me

    # Preservation Storage Configuration: Only posix supported at this time
    # posix: a list of Storage Filesystems available
    # id: the id of the Storage Filesystem (optional for replication - Storage doesn't need to be registered with the Ingest Server)
    # path: the path on disk to the Storage FS
    storage.preservation:
      posix:
        - id: 1
          path: /export/bags/
        - id: 2
          path: /export/more-bags/

    # Replication Cron Job Configuration
    # The rate at which to poll the ingest server for replications
    replication.cron: 0 0 * * * *

    # Various Configuration Properties
    # timeout: the timeout in Minutes for HTTP communication with the Audit Manager
    ace.timeout: 5

    # SMTP Configuration
    smtp:
      send: true
      to: chron-support-l@mailman.ucsd.edu
      from: localhost
      host: localhost.localdomain

    # Specify the active profile for loading various services, normally production
    spring.profiles.active: production
    spring.pid.file: /var/run/replicationd.pid

    # Logging properties
    logging.file: replication.log
    logging.level:
      org.springframework: ERROR
      org.hibernate: ERROR
      org.chronopolis: DEBUG

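The `replication.cron` value in the configuration above has six fields rather than the five used by a system crontab. The sketch below shows one way to read it, assuming the service uses Spring's scheduling cron format (second, minute, hour, day of month, month, day of week). The `spring.*` properties suggest this, but this page does not confirm it, and the function name `describe_cron` is hypothetical.

```bash
# Hypothetical helper: label the six fields of a Spring-style cron
# expression (assumed format: second minute hour day-of-month month
# day-of-week).
describe_cron() {
    # Split on whitespace with globbing disabled so '*' stays literal.
    set -f
    set -- $1
    set +f
    if [ "$#" -ne 6 ]; then
        echo "expected 6 fields, got $#" >&2
        return 1
    fi
    printf 'second=%s\nminute=%s\nhour=%s\nday-of-month=%s\nmonth=%s\nday-of-week=%s\n' "$@"
}

# The value from application.yml
describe_cron "0 0 * * * *"
```

Under that assumption, `0 0 * * * *` fires at second 0 of minute 0 of every hour, so the ingest server would be polled hourly.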
Notes on Configuration
- The replication.cron timer sets how often the replication-shell queries the ingest-server for active replications. It uses cron-style formatting, as in the six-field example value 0 0 * * * * above.
- The development profile can be used for testing configuration options. It remains in the foreground and has a limited set of commands.
- Multiple ingest servers are not currently queried; only the first one in the list is used.
- The mail configuration (smtp) is set to send by default.
- If you don't want to send mail, or your server does not have smtp capabilities, you can turn it off by setting smtp.send: false
- If the storage is not set up correctly, the replicationd process will not run. This includes the configured paths existing and having read/write/execute permissions.
- The pid file likely should not be changed. If you do update it, make sure to update the init scripts that rely on it as well.
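Because replicationd will not run when the storage is misconfigured, a quick pre-flight check of each configured path can save a failed start. The following is a minimal sketch, assuming the relevant permissions are those of the user running the script. The function name `check_storage` is hypothetical, and this is not the check the service itself performs.

```bash
# Hypothetical pre-flight check for one preservation storage path.
check_storage() {
    storage_dir="$1"
    if [ ! -d "$storage_dir" ]; then
        echo "$storage_dir: missing or not a directory"
        return 1
    fi
    storage_ok=0
    for perm in r w x; do
        # test -r / -w / -x check access for the user running the script.
        if ! test "-$perm" "$storage_dir"; then
            echo "$storage_dir: no '$perm' permission"
            storage_ok=1
        fi
    done
    [ "$storage_ok" -eq 0 ] && echo "$storage_dir: OK"
    return "$storage_ok"
}

# Check every path given on the command line (none by default).
for storage_path in "$@"; do
    check_storage "$storage_path"
done
```

Saved as a script (the file name is up to you), it would be run as the service user against the paths from application.yml, for example `sudo -u chronopolis sh check_storage.sh /export/bags/ /export/more-bags/`.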
Running
EL 6
Using the init.d script: service replicationd start
EL 7
Using systemd: systemctl start replicationd
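If the same instructions need to cover both EL6 and EL7 hosts, the two start commands can be wrapped in one function. This is only a convenience sketch: the name `start_replicationd` is hypothetical, and it assumes that the presence of `systemctl` means the host is managed by systemd.

```bash
# Hypothetical wrapper: start replicationd with whichever service manager
# the host provides.
start_replicationd() {
    if command -v systemctl >/dev/null 2>&1; then
        # EL7: systemd unit installed by the rpm
        systemctl start replicationd
    else
        # EL6: SysV init script installed by the rpm
        service replicationd start
    fi
}
```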
Errors
This will be filled out as we experience problems. Check /var/log/chronopolis/replication.log to see if there are any stack traces.
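A quick way to look for stack traces in the log is to search for exception names and indented `at ...` frames. This is a minimal sketch, assuming the usual Java stack trace layout (the service ships as a jar). The function name `find_stack_traces` is hypothetical, and the pattern may need tuning for your log format.

```bash
# Hypothetical helper: print numbered log lines that look like part of a
# Java stack trace (exception names or indented "at ..." frames).
find_stack_traces() {
    log_file="${1:-/var/log/chronopolis/replication.log}"
    grep -n -E 'Exception|^[[:space:]]+at ' "$log_file"
}
```

Called with no argument it reads /var/log/chronopolis/replication.log. It returns a non-zero status when nothing matches, which is grep's normal behaviour.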
Upgrading
Upgrading can be done by downloading a newer version of the package and issuing a yum upgrade
Current Implementation
- Query Ingest RESTful API for available transfers
- Transfer data (bag + tokens) to local preservation area
- Register to ACE
- Load tokens for collection in ACE
- Issue audit for collection in ACE
- Update Ingest API with the status of the transfer