...

Code Block (yml): application.yml
# Ingest Configuration Properties

# Ingest Cron job properties
# tokens: the rate at which to check for bags which have all tokens and can have a Token Store written
# request: the rate at which to check for bags which need their initial replications created
# tokenize: the rate at which to check for local bags which need tokens created - DEPRECATED in 3.0
ingest.cron:
  tokens: 0 0/10 * * * *
  request: 0 0/10 * * * *
  tokenize: 0 0 * * * *

# Ingest AJP Settings
# enabled: flag to enable an AJP connector
# port: the port for the connector to listen on
ingest.ajp:
  enabled: false
  port: 8009

# The staging area for writing Token Stores. Nonposix support not yet implemented.
## id: The id of the StagingStorage in the Ingest server
## path: The path to the filesystem on disk
chron.stage.tokens.posix.id: -1
chron.stage.tokens.posix.path: /dev/null

# If Local Tokenization is desired include properties for the Ingest API user, staging information for Bags, and ACE IMS connection information
# username: The name of the user who created the bags ingest will be tokenizing
ingest.api.username: bag-creator

## id: The id of the StagingStorage which the Ingest Server will read from  - DEPRECATED in 3.0
## path: The path to the filesystem on disk  - DEPRECATED in 3.0
chron.stage.bags.posix.id: -1 
chron.stage.bags.posix.path: /dev/null

## port: the port to connect to the ims with
## waitTime: the time to wait between token requests
## endpoint: the fqdn of the ims
## queueLength: the maximum number of requests to send at once
ace.ims:
  port: 80
  waitTime: 5000
  endpoint: ims.umiacs.umd.edu
  queueLength: 1000

# Database connection
# Initialize should be kept false so that the server does not attempt to run a drop/create on the tables
spring.datasource:
  url: jdbc:postgresql://localhost/ingest
  username: postgres
  password: dbpass
  initialize: false

# Specify the active profile for configuring services as a comma separated list
# production - remove stdout/stderr from printing and run without accepting input
# disable-tokenizer - disable local tokenization services from running
spring.profiles.active: production
spring.pid.file: /var/run/ingest-server.pid

# debug: true
server.port: 8080

# Logging properties
logging.file: ingest.log
logging.level:
  org.springframework: ERROR
  org.hibernate: ERROR
  org.chronopolis: DEBUG

...

Code Block (bash): Example Bag Ingest
#!/bin/sh
#
# This is pseudocode which provides an example for how ingesting a bag might look
# when scripted. It will likely be revised before the production release of 3.0.0
# to actually work.
#################################################################################

BAGS="test-bag-0"
MANIFEST="manifest-sha256.txt"
TAGMANIFEST="tagmanifest-sha256.txt"

BAG_JSON='{"name":"scripted-bag-0", "depositor": "script-depositor", "size": 1024, "totalFiles": 10}'
STAGING_JSON='{"location": "script-depositor/scripted-bag-0", "validationFile": "/tagmanifest-sha256.txt", "storageRegion": 1, "totalFiles": 10, "size": 1024, "storageUnit": "B"}'
INGEST_USER="ingest"
INGEST_PASSWORD="secret"
INGEST_BAG_CREATE="http://localhost:8080/api/bags"
INGEST_FILE_CREATE="http://localhost:8080/api/bags/{id}/files"
INGEST_STAGING_CREATE="http://localhost:8080/api/bags/{id}/staging/BAG"

generate_csv() {
    for bag in $BAGS; do
        current_manifest="$bag/$MANIFEST"
        current_tagmanifest="$bag/$TAGMANIFEST"

        # one csv row per manifest entry: "path",size,checksum,SHA-256
        awk -v bag="$bag" 'BEGIN { printf "FILENAME,SIZE,FIXITY_VALUE,FIXITY_ALGORITHM\n" }
                  { printf "\"%s\",", $2
                    system("find \"" bag "/" $2 "\" -printf %s")
                    printf ",%s,SHA-256\n", $1 }' "$current_tagmanifest" "$current_manifest" > "$bag".csv
        # the tagmanifest is not listed in itself, so append its row by hand
        tagsum=$(sha256sum "$current_tagmanifest" | cut -c -64)
        tagsize=$(find "$current_tagmanifest" -printf '%s')
        echo "\"$TAGMANIFEST\",$tagsize,$tagsum,SHA-256" >> "$bag".csv
        gzip -c "$bag".csv > "$bag".csv.gz

        printf '%s csv: ' "$bag"
        find "${bag}.csv" -printf '%s\n'
    done
}

# generate a csv file
generate_csv

# register the bag, files, and staging
# JSON payloads are double-quoted so that the shell expands the variables
curl --user "${INGEST_USER}:${INGEST_PASSWORD}" --header "Content-Type: application/json" --data "${BAG_JSON}" "${INGEST_BAG_CREATE}"
# the bag id needs to be retrieved and injected into the next 2 calls
curl --user "${INGEST_USER}:${INGEST_PASSWORD}" -F "file=@${bag}.csv;type=text/csv" "${INGEST_FILE_CREATE}"
curl --user "${INGEST_USER}:${INGEST_PASSWORD}" --header "Content-Type: application/json" --data "${STAGING_JSON}" "${INGEST_STAGING_CREATE}"
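
The script leaves retrieving the bag id and injecting it into the {id} placeholder as an open step. The following is a minimal sketch of how that step might look, not the project's own method. It assumes the bag-creation response is JSON containing a numeric "id" field, which this page does not confirm. The helper names extract_id and inject_id are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Ingest Server or its tooling.

# Print the first numeric "id" field found in the JSON read from stdin.
# ASSUMPTION: the bag-creation response is JSON with a numeric "id";
# check a real response before relying on this.
extract_id() {
    grep -o '"id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Replace the literal {id} placeholder in a URL template ($1) with a bag id ($2).
inject_id() {
    printf '%s\n' "$1" | sed "s/{id}/$2/"
}

# Example wiring (left commented out because it needs a running server):
# response=$(curl --silent --user "${INGEST_USER}:${INGEST_PASSWORD}" \
#     --header "Content-Type: application/json" \
#     --data "${BAG_JSON}" "${INGEST_BAG_CREATE}")
# BAG_ID=$(printf '%s' "$response" | extract_id)
# curl --user "${INGEST_USER}:${INGEST_PASSWORD}" \
#     -F "file=@${bag}.csv;type=text/csv" \
#     "$(inject_id "${INGEST_FILE_CREATE}" "$BAG_ID")"
```

If jq is available, it would be a more robust way to read the id than pattern matching, since the grep approach can pick up a nested "id" when the response is printed on a single line.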

Local Tokenization

When bags are ingested locally through the Ingest Server, the server can also create ACE Tokens for the files registered to a collection. This is enabled through the application.yml configuration file:

Code Block (yml): Tokenization Configuration
# Ingest Tokenizer Settings
# cron: the cron timer for running local-tokenization
# enabled: flag to enable Local tokenization of bags
# username: the 'creator' to check for when depositing bags (defaults to 'admin')
# staging.id: the ID of the StorageRegion to write tokens into
# staging.path: the path to the filesystem on disk
ingest.tokenizer:
  cron: 0 0 * * * *
  enabled: true
  username: admin
  staging.id: -1
  staging.path: /dev/null

Note: To disable local tokenization, you must set ingest.tokenizer.enabled to false; otherwise the Ingest Server will attempt to create Beans for tokenization and, depending on the configuration, may fail to start.

Resetting Passwords

As of version 1.4.0, user passwords are encoded using bcrypt. If a user forgets their password, we will need to reset it for them. Because email notifications (or anything similar) are not set up, everything currently needs to be done manually. First, run the new password through a bcrypt encoder, which can be found online. If you aren't sure how many rounds to use, check the database: the round count is stored as part of the encoded value, e.g. $2a$08 uses 8 rounds and $2a$10 uses 10 rounds.
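
As an alternative to an online encoder, the encoding step can be sketched locally. This is a minimal sketch under stated assumptions: the htpasswd utility from the Apache httpd tools is installed, and the server expects the $2a$ prefix shown in the examples above (htpasswd emits $2y$, which is rewritten here). The helper names bcrypt_rounds and encode_password are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a manual password reset; not part of the Ingest Server.

# Print the bcrypt round count encoded in an existing hash ($1),
# e.g. a value copied from the database.
bcrypt_rounds() {
    printf '%s\n' "$1" | cut -d '$' -f 3 | sed 's/^0//'
}

# Print a bcrypt hash of a password ($1) using the given rounds ($2).
# ASSUMPTIONS: htpasswd is installed, and the server expects the $2a$
# prefix, so the $2y$ prefix written by htpasswd is rewritten.
# Note: the password is visible in the process list while this runs.
encode_password() {
    htpasswd -n -b -B -C "$2" reset-user "$1" | head -n 1 | cut -d ':' -f 2 | sed 's/^\$2y\$/$2a$/'
}
```

For example, `encode_password 'new-password' "$(bcrypt_rounds "$existing_hash")"` would reuse the round count of the hash already stored for the user, where existing_hash is a placeholder for that stored value.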

Then we connect to the database and issue a simple update:

...