...

  • PostgreSQL - The ingest service connects to a PostgreSQL database to store information about transfers, bags, and tokens.
  • Staging areas - The ingest service needs to see which collections have been staged, and also needs an area to store tokens during transfer.
  • SSH public keys - Needed from each node that will be replicating content

...

Code Block
languagebash
titleEL6 Ingest Files
collapsetrue
[~] $ rpm -qlp ingest-server-2.0.0-20171027.el6.noarch.rpm
/etc/init.d/ingest-server
/usr/local/chronopolis/ingest
/usr/local/chronopolis/ingest/application.yml
/usr/local/chronopolis/ingest/ingest-server.jar
/var/log/chronopolis


Code Block
languagebash
titleEL7 Ingest Files
collapsetrue
[~] $ rpm -qlp ingest-server-2.0.0-20171027.el7.noarch.rpm
/etc/init.d/ingest-server
/usr/local/chronopolis/ingest
/usr/local/chronopolis/ingest/application.yml
/usr/local/chronopolis/ingest/ingest-server.jar
/var/log/chronopolis

...

A 'chronopolis' user is needed that can write to /var/log/chronopolis and perform read and write tasks as needed in the Token Staging Area. This user is no longer created as part of the rpm install; it should be managed separately and configured in the ingest-server startup script.
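
Since the user is now managed outside the rpm, it can help to confirm its permissions before starting the server. The following is a minimal sketch, not part of the ingest-server distribution; `check_writable` is a hypothetical helper name, and it simply tests each directory it is given against the user running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can write to each
# directory passed as an argument.
# Returns non-zero if any directory is missing or not writable.
check_writable() {
    local dir status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return "$status"
}
```

Run it as the chronopolis user, for example `check_writable /var/log/chronopolis` followed by the path of your Token Staging Area.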

Database Setup

Download the schema from the CI server

...

  1. Download and untar/unzip the Flyway Command Line Tool
    1. The Ingest Server currently uses Flyway 4.2.0; if possible, use the binary for that version
  2. Edit conf/flyway.conf
    1. Some properties follow the same pattern as our application properties (for example, the database connection settings)
    2. Specify the version for which you are creating the baseline, using the MAJOR.MINOR number of the ingest server version

      Code Block
      languagebash
      titleFlyway Configuration Example
      #
      # Copyright 2010-2015 Axel Fontaine
      #
      ...
      
      # Jdbc url to use to connect to the database
      flyway.url=jdbc:postgresql://localhost/ingest
      
      # Fully qualified classname of the jdbc driver (autodetected by default based on flyway.url)
      # flyway.driver=
      
      # User to use to connect to the database (default: <<null>>)
      flyway.user=chron
      
      # Password to use to connect to the database (default: <<null>>)
      flyway.password=my-postgresql-password
      
      ...
      flyway.baselineVersion=1.5


  3. Use the flyway bash script to update the database

    Code Block
    languagebash
    titleFlyway Baseline Migration
    $ ./flyway baseline
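
The MAJOR.MINOR value for flyway.baselineVersion can be derived from the version string of the installed package. The sketch below assumes a version of the form MAJOR.MINOR.PATCH with an optional "-build" suffix, as in the rpm names above; `baseline_version` is a hypothetical helper and not part of Flyway or the ingest server.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the MAJOR.MINOR part of a version string
# such as "2.0.0-20171027" or "1.5.3".
baseline_version() {
    local version="$1" semver major rest minor
    semver="${version%%-*}"   # drop any "-build" suffix: 2.0.0-20171027 -> 2.0.0
    major="${semver%%.*}"     # text before the first dot
    rest="${semver#*.}"       # text after the first dot
    minor="${rest%%.*}"       # text before the next dot
    echo "${major}.${minor}"
}
```

For example, `baseline_version 2.0.0-20171027` prints `2.0`, which is the value to place in flyway.baselineVersion for that release.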


...

Prior to version 2.0, the ingest server read from the /etc/chronopolis/application.properties configuration file:

Code Block
titleapplication.properties
collapsetrue
# Sample application.properties

## Staging areas
chron.stage.bags=/export/outgoing/bags
chron.stage.tokens=/export/outgoing/tokens
ingest.replication.server=stage.chronopolis.org
ingest.replication.user=chronopolis

## Database Connection
spring.datasource.url=jdbc:postgresql://localhost/ingest
spring.datasource.username=chron
spring.datasource.password=secret-password

### Needed so that we don't try to load the schema/data
spring.datasource.initialize=false

## Specify that we are running production services
spring.profiles.active=production

## SSL Configuration
# server.port = 8443
# server.ssl.key-store = file:/path/to/keystore.jks
# server.ssl.key-store-password = secret
# server.ssl.key-password = another-secret

# Logging
logging.path=/var/log/chronopolis/
logging.file=/var/log/chronopolis/ingest.log
logging.level.org.springframework=ERROR
logging.level.org.hibernate=ERROR
logging.level.org.chronopolis=DEBUG

# SMTP Configuration

# smtp.host=localhost.localdomain
# smtp.to=chron-support@sdsc.edu
# smtp.from=localhost
# smtp.send=false

As of version 2.0, configuration is YAML based. It looks similar to the above, with a few changes being propagated through the various services so that the properties are the same everywhere. The ingest server reads from the /usr/local/chronopolis/ingest/application.yml configuration file:

Code Block
languagetext
titleapplication.yml
collapsetrue
# Ingest Configuration Properties

# Ingest Cron job properties
# tokens: the rate at which to check for bags which have all tokens and can have a Token Store written
# request: the rate at which to check for bags which need their initial replications created
ingest.cron: 
  tokens: 0 0/10 * * * *
  request: 0 0/10 * * * *

# Ingest AJP Settings
# enabled: flag to enable an AJP connector
# port: the port for the connector to listen on
ingest.ajp:
  enabled: false
  port: 8009

# The staging area for writing Token Stores. Nonposix support not yet implemented.
## id: The id of the StagingStorage in the Ingest server
## path: The path to the filesystem on disk
chron.stage.tokens.posix.id: -1
chron.stage.tokens.posix.path: /dev/null

# Database connection
# Initialize should be kept false so that the server does not attempt to run a drop/create on the tables
spring.datasource:
  url: jdbc:postgresql://localhost/ingest
  username: postgres
  password: dbpass
  initialize: false

# Specify the active profile for loading various services, normally production
spring.profiles.active: production
spring.pid.file: /var/run/ingest-server.pid

# debug: true
server.port: 8080

# Logging properties
logging.file: ingest.log
logging.level:
  org.springframework: ERROR
  org.hibernate: ERROR
  org.chronopolis: DEBUG
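
The ingest.cron entries appear to be Spring-style cron expressions, which use six fields (second, minute, hour, day of month, month, day of week); that the ingest server parses them this way is an assumption based on the tokens entry. A quick sanity check before restarting the server can be sketched as follows; `cron_field_count` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of whitespace-separated fields in a
# cron expression. The body runs in a subshell so that disabling globbing
# (needed so '*' is not expanded to file names) does not leak to the caller.
cron_field_count() (
    set -f
    # Intentionally unquoted: rely on word splitting to separate the fields
    set -- $1
    echo "$#"
)
```

An expression such as `'0 0/10 * * * *'` yields 6; any other count suggests a field was dropped or added by mistake.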


Notes on

...

Configuration

  • The ingest.replication properties are used to build the URIs for replication. For example, replicating a collection "Scientific_Data" from depositor "ucsd-researchers":
  • An AJP connector can now be configured with the server, meaning SSL can be served through Apache httpd instead of a Java keystore
  • The pid file location should not be changed unless you also update the init files to match

Running

The ingest server runs as an executable jar. Using the init script allows for starting and stopping of the server as root: `service ingest-server start`
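
To confirm that the server actually came up, the pid file can be checked against the process table. This is a minimal sketch that assumes the pid file location from the sample application.yml (/var/run/ingest-server.pid); `is_running` is a hypothetical helper and not something the init script provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the pid recorded in the given file belongs
# to a live process.
is_running() {
    local pidfile="$1" pid
    [ -r "$pidfile" ] || return 1
    pid="$(cat "$pidfile")"
    # kill -0 sends no signal; it only tests that the process exists and that
    # the caller is allowed to signal it, so run this as root or as the
    # user that owns the server process.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

Usage: `is_running /var/run/ingest-server.pid && echo "ingest-server is up"`.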

...

As of version 1.4.0, user passwords are encoded using bcrypt. If a user forgets their password, we need to reset it for them. Because we do not have email notifications or anything similar set up, everything must be done manually for the moment. First run the new password through a bcrypt encoder, which can be found online. If you aren't sure how many rounds to use, check the database: the round count is stored as part of the encoding, e.g. $2a$08 uses 8 rounds and $2a$10 uses 10 rounds.
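
Reading the round count out of an existing hash can be scripted. The sketch below relies only on the layout described above, where the rounds are the second `$`-separated field; `bcrypt_rounds` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the round count embedded in a bcrypt hash,
# e.g. '$2a$08$...' -> 8.
bcrypt_rounds() {
    local hash="$1" rest cost
    rest="${hash#\$*\$}"   # strip the leading "$2a$" version marker
    cost="${rest%%\$*}"    # keep the text up to the next '$'
    echo "${cost#0}"       # drop a leading zero: "08" -> "8"
}
```

Pass the stored value in single quotes so the shell does not expand the `$` characters, for example `bcrypt_rounds '$2a$10$...'` with the full hash copied from the users table.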

Then we connect to the database and issue a simple update:

Code Block
languagesql
titleUser update
 UPDATE users SET password = '$2a$10$hEYYHV/Fri00RRHjWPIAWuH3NxYpPPjbMU5OsJfH1SAenajQqKjhK' WHERE username = 'user_resetting_password';

Node Specific Admin

StorageRegions

With the release of version 2.0.0, StorageRegions have been introduced in order to facilitate distribution of content from many nodes in Chronopolis. Configuration for them is as follows:

  • ReplicationConfiguration
  • Notes
  • ...

...

Open Questions

  • How do we handle error’d bags? (hold, reject, ??)
    • We bag the packages ourselves, so we should get no bags with errors
  • What about malformed digests?
    • The ingest-server stores a token for each valid digest, and ignores all others. Either manifest is not digested until 100% of the files are valid. Requests are not made until tokens have been created for each file in the bag.
  • Do we have a record of all the collections and their states as they move through to replication? We need to be able to retrieve this data, including any failures.