...
- PostgreSQL - The ingest service connects to a PostgreSQL database to store information about transfers, bags, and tokens.
- Staging areas - The ingest service needs to see which collections have been staged, and also needs an area to store tokens during transfer.
- SSH public keys - Needed from each node that will be replicating content
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
[~] $ rpm -qlp ingest-server-2.0.0-20171027.el6.noarch.rpm
/etc/init.d/ingest-server
/usr/local/chronopolis/ingest
/usr/local/chronopolis/ingest/application.yml
/usr/local/chronopolis/ingest/ingest-server.jar
/var/log/chronopolis
|
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
[~] $ rpm -qlp ingest-server-2.0.0-20171027.el7.noarch.rpm
/etc/init.d/ingest-server
/usr/local/chronopolis/ingest
/usr/local/chronopolis/ingest/application.yml
/usr/local/chronopolis/ingest/ingest-server.jar
/var/log/chronopolis
|
...
A 'chronopolis' user is needed which can write to /var/log/chronopolis and perform various read and write tasks as needed from the Token Staging Area. This user is no longer installed as part of the rpm, but should be managed separately and configured in the ingest-server startup script.
Database Setup
Download the schema from the CI server
...
- Download and untar/unzip the Flyway Command Line Tool
- The Ingest Server currently uses Flyway 4.2.0; if possible the binary for that version should be used
- Edit the conf/flyway.conf
- some properties follow the same pattern as our application properties (connecting to the database)
- specify the version for which you are creating the baseline (using the MAJOR.MINOR number of the ingest server version)
Code Block language bash title Flyway Configuration Example
#
# Copyright 2010-2015 Axel Fontaine
# ...
# Jdbc url to use to connect to the database
flyway.url=jdbc:postgresql://localhost/ingest
# Fully qualified classname of the jdbc driver (autodetected by default based on flyway.url)
# flyway.driver=
# User to use to connect to the database (default: <<null>>)
flyway.user=chron
# Password to use to connect to the database (default: <<null>>)
flyway.password=my-postgresql-password
...
flyway.baselineVersion=1.5
Use the flyway bash script to update the database
Code Block language bash title Flyway Baseline Migration
$ ./flyway baseline
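The baseline version above is the MAJOR.MINOR part of the ingest server version. As a minimal sketch of that derivation, the helper below trims a full version string down to MAJOR.MINOR. The name `baseline_version` is hypothetical and is not part of Flyway or the ingest server.

```shell
# Hypothetical helper: print the MAJOR.MINOR part of a version string.
baseline_version() {
    full="$1"
    major="${full%%.*}"   # text before the first dot
    rest="${full#*.}"     # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    printf '%s.%s\n' "$major" "$minor"
}

# Example: an ingest server at 2.0.0 would be baselined as 2.0
baseline_version "2.0.0"
```

The printed value is what would go into `flyway.baselineVersion` in conf/flyway.conf.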
...
The ingest server reads from the /etc/chronopolis/application.properties configuration file:
Code Block | ||||
---|---|---|---|---|
| ||||
# Sample application.properties
## Staging areas
chron.stage.bags=/export/outgoing/bags
chron.stage.tokens=/export/outgoing/tokens
ingest.replication.server=stage.chronopolis.org
ingest.replication.user=chronopolis
## Database Connection
spring.datasource.url=jdbc:postgresql://localhost/ingest
spring.datasource.username=chron
spring.datasource.password=secret-password
### Needed so that we don't try to load the schema/data
spring.datasource.initialize=false
## Specify that we are running production services
spring.profiles.active=production
## SSL Configuration
# server.port = 8443
# server.ssl.key-store = file:/path/to/keystore.jks
# server.ssl.key-store-password = secret
# server.ssl.key-password = another-secret
# Logging
logging.path=/var/log/chronopolis/
logging.file=/var/log/chronopolis/ingest.log
logging.level.org.springframework=ERROR
logging.level.org.hibernate=ERROR
logging.level.org.chronopolis=DEBUG
# SMTP Configuration
# smtp.host=localhost.localdomain
# smtp.to=chron-support@sdsc.edu
# smtp.from=localhost
# smtp.send=false
|
As of version 2.0, configuration is YAML based. It looks similar to the above, with a few changes being propagated through the various services so that the properties are the same everywhere. The ingest server reads from the /usr/local/chronopolis/ingest/application.yml configuration file:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
# Ingest Configuration Properties
# Ingest Cron job properties
# tokens: the rate at which to check for bags which have all tokens and can have a Token Store written
# request: the rate at which to check for bags which need their initial replications created
ingest.cron:
  tokens: 0 0/10 * * * *
  request: 0 0/10 * * * *
# Ingest AJP Settings
# enabled: flag to enable an AJP connector
# port: the port for the connector to listen on
ingest.ajp:
  enabled: false
  port: 8009
# The staging area for writing Token Stores. Nonposix support not yet implemented.
## id: The id of the StagingStorage in the Ingest server
## path: The path to the filesystem on disk
chron.stage.tokens.posix.id: -1
chron.stage.tokens.posix.path: /dev/null
# Database connection
# Initialize should be kept false so that the server does not attempt to run a drop/create on the tables
spring.datasource:
  url: jdbc:postgresql://localhost/ingest
  username: postgres
  password: dbpass
  initialize: false
# Specify the active profile for loading various services, normally production
spring.profiles.active: production
spring.pid.file: /var/run/ingest-server.pid
# debug: true
server.port: 8080
# Logging properties
logging.file: ingest.log
logging.level:
  org.springframework: ERROR
  org.hibernate: ERROR
  org.chronopolis: DEBUG
|
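The `ingest.cron` values look like Spring-style cron expressions, which use six space-separated fields (second, minute, hour, day of month, month, day of week). That the ingest server hands these values to Spring's scheduler is an assumption based on the `spring.*` properties in the same file. A minimal sketch for checking the field count before deploying a configuration change; `cron_field_count` is a hypothetical helper:

```shell
# Hypothetical helper: count the whitespace-separated fields in a cron expression.
# The argument must be quoted by the caller so the shell does not expand '*'.
cron_field_count() {
    printf '%s\n' "$1" | awk '{ print NF }'
}

# A six-field expression: every 10 minutes, at second 0
cron_field_count "0 0/10 * * * *"
```

An expression that reports fewer than six fields is worth double-checking before the server is restarted.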
Notes on
...
Configuration
- The ingest.replication properties are used to build the uris for replication. An example, replicating a collection "Scientific_Data" from depositor "ucsd-researchers":
- rsync-bag: chronopolis@stage.chronopolis.org:/export/outgoing/bags/ucsd-researchers/Scientific_Data
- rsync-tokens: chronopolis@stage.chronopolis.org:/export/outgoing/tokens/ucsd-researchers/Scientific_Data-tokens-2015-02-11
- An AJP connector can now be configured with the server, meaning SSL can be served through Apache httpd instead of a Java keystore
- The pid file location should not be changed unless the init files are updated to match
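The rsync-bag URI in the notes above can be read as user, server, bag staging path, depositor, and collection joined together. The sketch below reproduces that example from the sample property values; it only illustrates the pattern and is not the ingest server's own code. `bag_uri` and the variable names are hypothetical.

```shell
# Values taken from the sample application.properties
replication_user="chronopolis"          # ingest.replication.user
replication_server="stage.chronopolis.org"  # ingest.replication.server
bag_stage="/export/outgoing/bags"       # chron.stage.bags

# Hypothetical helper: assemble user@server:/stage/depositor/collection
bag_uri() {
    depositor="$1"
    collection="$2"
    printf '%s@%s:%s/%s/%s\n' \
        "$replication_user" "$replication_server" "$bag_stage" \
        "$depositor" "$collection"
}

bag_uri "ucsd-researchers" "Scientific_Data"
```

The rsync-tokens URI appears to follow the same pattern, using the token staging path and a dated token store name in place of the collection name.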
Running
The ingest server runs as an executable jar. Using the init script allows for starting and stopping of the server as root: `service ingest-server start`
...
As of version 1.4.0, passwords for users are encoded using bcrypt. In the event a user forgets their password, we will need to reset it for them. As we do not have email notifications or anything of the like set up, for the moment everything will need to be done manually. We will first need to run the new password through a bcrypt encoder, which can be found online. If you aren't sure how many rounds to use, check the database, as the information is kept as part of the encoding, e.g. $2a$08 uses 8 rounds; $2a$10 uses 10 rounds.
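To read the rounds value out of an existing hash without counting characters by eye, the prefix can be split on the `$` separator. A minimal sketch; `bcrypt_rounds` is a hypothetical helper, and the value is printed exactly as stored (so `08` rather than `8`):

```shell
# Hypothetical helper: print the rounds field of a bcrypt hash.
# Split on '$', the fields are: empty, version (e.g. 2a), rounds, salt and hash.
bcrypt_rounds() {
    printf '%s\n' "$1" | cut -d'$' -f3
}

# Single quotes keep the shell from treating '$2a' as a variable
bcrypt_rounds '$2a$10$hEYYHV/Fri00RRHjWPIAWuH3NxYpPPjbMU5OsJfH1SAenajQqKjhK'
```

Use the same rounds value when encoding the replacement password so the new hash matches the existing ones.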
Then we connect to the database and issue a simple update:
Code Block | ||||
---|---|---|---|---|
| ||||
UPDATE users SET password = '$2a$10$hEYYHV/Fri00RRHjWPIAWuH3NxYpPPjbMU5OsJfH1SAenajQqKjhK' WHERE username = 'user_resetting_password'; |
Node Specific Admin
StorageRegions
With the release of version 2.0.0, StorageRegions have been introduced in order to facilitate distribution of content from many nodes in Chronopolis. Configuration for them is as follows:
- ReplicationConfiguration
- Notes
- ...
...
Open Questions
- How do we handle error’d bags? (hold, reject, ??)
- We bag the packages ourselves, so we should get no bags with errors
- The ingest-server stores a token for each valid digest, and ignores all others. Either manifest is not digested until 100% of the files are valid. Requests are not made until tokens have been created for each file in the bag.
- Do we have a record of all the collections and their states as they move through to replication? We need to be able to retrieve this data, including any failures.