This document outlines the server specifications used by Whitman College. This is intended to serve as an example; institutions with different use cases or server environments may need to use different specifications.

Staging vs. Production

Whitman College uses both a staging server and a production server. The staging server was used exclusively for testing during the pilot project; it was frequently wiped and reloaded throughout the testing period. At the end of the pilot, following the production migration, the staging server was synchronized with the production server so the environments would be identical for future testing.

Server Specifications

The servers are hosted in the cloud on Amazon Web Services (AWS).

  • Two EC2 servers 
    • ISLE host VMs - Production & Staging
    • Operating system: Ubuntu 20.04 LTS
    • 2 x Elastic IPs
      • Domains can be assigned to these addresses in DNS
  • Server 1 - Staging 
    • m5.xlarge
    • 4 x vCPUs
    • 16 GB RAM (minimum)
  • Server 2 - Production
    • m5.2xlarge
    • 8 x vCPUs
    • 32 GB RAM

Server EBS Volumes

These specifications apply to both the Production and Staging servers. Each server has three disks (EBS volumes) for storage, for a total of six AWS EBS volumes across the Production and Staging environments.

  • 1 x 50 GB EBS volume for the operating system
    • Filesystem will be ext4
    • Disk type is SSD / gp2
    • 10 GB swap file


  • 1 x 100 GB EBS volume formatted as ext4 for the following:
    • Starts as an SSD (gp2) volume and can be moved to st1 at the 500 GB threshold
    • Staff only have to grow one disk
    • A single snapshot is easier to recover from than multiple snapshots that must be coordinated.
    • To be mounted at /mnt/data, containing the following:
      • Blazegraph data
        • bigdata directory
      • Cantaloupe
        • cache & configuration files
      • Drupal data 
        • public & private files directories
      • Fedora data
        • objectStore directory
        • datastreamStore directory
      • MariaDB data
        • MySQL databases & configuration files
      • Matomo data
        • matomo-data
      • Solr data 
        • collection1 directory
    • All data directories are bind-mounted into their respective Docker containers, as defined in the docker-compose.yml file in the project git repository.


  • Additional disks (to be determined) for the Fedora datastreamStore directories, or S3 buckets in their place
    • Staging build - 1 x 4 TB EBS volume (st1) formatted as ext4
    • Production build - 1 x 4 TB EBS volume (st1) formatted as ext4
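Because every data directory under /mnt/data is bind-mounted into a container, each host directory must exist before the containers start. The following is a minimal Python sketch of such a preflight step. The service and subdirectory names in `DATA_LAYOUT` are assumptions derived from the list above; the real host paths are whatever the project's docker-compose.yml defines.

```python
from pathlib import Path

# Hypothetical layout: service and subdirectory names are assumptions based
# on the /mnt/data list above, not the paths defined in docker-compose.yml.
DATA_LAYOUT: dict[str, list[str]] = {
    "blazegraph": ["bigdata"],
    "cantaloupe": ["cache", "config"],
    "drupal": ["public", "private"],
    "fedora": ["objectStore", "datastreamStore"],
    "mariadb": ["mysql", "config"],
    "matomo": ["matomo-data"],
    "solr": ["collection1"],
}


def ensure_data_dirs(root: Path = Path("/mnt/data")) -> list[Path]:
    """Create any missing data directories under root and return those created."""
    created: list[Path] = []
    for service, subdirs in DATA_LAYOUT.items():
        for subdir in subdirs:
            path = root / service / subdir
            if not path.is_dir():
                # parents=True also creates the per-service directory.
                path.mkdir(parents=True, exist_ok=True)
                created.append(path)
    return created
```

The function is idempotent: a second run creates nothing, so it can safely be called before every `docker-compose up`.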

S3 Buckets

  • arminda-i8-db-backup
    • S3 bucket for Drupal & MySQL backups (daily / weekly)
    • This bucket uses S3 Lifecycle management to expire objects older than 32 days, keeping backups current and lowering cost. Whitman staff can shorten or extend this expiration period as needed.
  • arminda-i8-drupal-public-files
    • Drupal public files directory
  • arminda-i8-drupal-private-files
    • Drupal private files directory
  • arminda-i8-fedora-data
    • Private S3 bucket for storing Islandora 8 / Fedora 6 datastreamStore data
    • This will be used instead of an EBS volume
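The 32-day expiration on arminda-i8-db-backup can be expressed as an S3 lifecycle configuration. This Python sketch builds that configuration and writes it to a JSON file. The rule ID and the empty prefix (apply the rule to the whole bucket) are assumptions; a versioned bucket would also need a noncurrent-version expiration rule.

```python
import json


def backup_lifecycle_config(expire_days: int = 32, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that expires objects after expire_days."""
    if expire_days < 1:
        raise ValueError("expire_days must be at least 1")
    return {
        "Rules": [
            {
                # Rule ID is a hypothetical label; any unique string works.
                "ID": f"expire-backups-after-{expire_days}-days",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def write_lifecycle_file(path: str, expire_days: int = 32) -> None:
    """Write the lifecycle configuration as JSON for use with the AWS CLI."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(backup_lifecycle_config(expire_days), fh, indent=2)
```

The resulting file could then be applied with `aws s3api put-bucket-lifecycle-configuration --bucket arminda-i8-db-backup --lifecycle-configuration file://lifecycle.json`.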

Domains and SSL

Software

Each VM will have the following software installed:

  • git
  • ntp
  • htop
  • Docker
  • docker-compose
  • AWS command line tools (AWS CLI)
  • GitLab Runner - an application that works with GitLab CI/CD to run jobs in a pipeline
  • [Optional]
    • Firewall or fail2ban, as needed (depending on the institution's security requirements)
    • Monitoring or agent software, as needed (depending on the institution's security requirements)
    • Born-Digital can set up a TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) alerting and monitoring system, as needed
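A quick way to confirm a VM has the required software is to check that each tool's executable is on the PATH. This is a minimal Python sketch; the executable names are assumptions, since the list above names packages rather than binaries (for example, NTP is assumed to run as `ntpd`, although Ubuntu may use systemd-timesyncd instead).

```python
import shutil

# Executable names are assumptions: the list above names packages, not binaries.
# "AWS command line tools" is assumed to provide `aws`, and NTP is assumed to
# run as `ntpd` (Ubuntu may use systemd-timesyncd instead).
REQUIRED_BINARIES = ("git", "ntpd", "htop", "docker", "docker-compose", "aws", "gitlab-runner")


def missing_binaries(binaries=REQUIRED_BINARIES, path=None) -> list[str]:
    """Return the executables that cannot be found on PATH (or on `path` if given)."""
    return [name for name in binaries if shutil.which(name, path=path) is None]
```

An empty result from `missing_binaries()` means every listed executable was found; the optional tools are left out because they vary by institution.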


Software Version Control

  • Born-Digital (BD) will create and use at least two git repositories to store the ISLE 8 configuration and the Drupal site (Drupal modules, Islandora, and theme)
    • These repositories store all source code at https://gitlab.com/born-digital-us
    • BD will use these git repositories to deploy the Whitman Drupal site and ISLE 8 configuration to the ISLE host servers.
    • Additional git repositories can be created if needed, e.g., for Terraform or AWS CloudFormation scripts
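Deploying from git typically means updating the checkout on the host and then restarting the containers. The Python sketch below is hypothetical and not BD's actual deploy process: the checkout path (/opt/isle in the usage example) and the branch name `main` are assumptions. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def deploy_commands(repo_dir: Path, branch: str = "main") -> list[list[str]]:
    """Commands to update a checked-out repo and restart its containers."""
    compose_file = str(repo_dir / "docker-compose.yml")
    return [
        # Fast-forward only, so a diverged server checkout fails loudly
        # instead of creating a merge commit on the host.
        ["git", "-C", str(repo_dir), "pull", "--ff-only", "origin", branch],
        ["docker-compose", "-f", compose_file, "pull"],
        ["docker-compose", "-f", compose_file, "up", "-d"],
    ]


def deploy(repo_dir: Path, branch: str = "main", dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute the deploy commands in order, stopping on failure."""
    commands = deploy_commands(repo_dir, branch)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

For example, `deploy(Path("/opt/isle"))` prints the three commands without running them. In practice, a GitLab Runner job on each host could run the same steps.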


Server Ports and Service Access

Access is managed within the AWS VPC as follows:

  • A user named islandora will be created on both machines for administrators to connect as
  • On the Production server, only ports 80 & 443 should be open to the public internet.
  • The Staging server should not be open to the public internet at all; access is limited to the main Whitman campus, its staff, and the BD office(s).
  • Ports 80 and 443 on the Staging server should still be open to the campuses and the BD office(s).
  • The following ports should be open only to selected campus administrators and the BD office(s), never to the public internet:
    • 3306 - MariaDB / MySQL port
    • 8080 - Admin panel for Traefik proxy service
    • 8081 - Admin panel for the Fedora / Tomcat services
    • 8082 - Admin panel for the Blazegraph / Tomcat services
    • 8161 - Admin panel for ActiveMQ service
    • 8983 - Admin panel for Solr service
  • The new staging and production servers should be able to SSH/SCP from the existing prod server (port 22)
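The port policy above can be encoded as inbound rules in the EC2 `IpPermissions` format. This is a minimal Python sketch: all CIDR blocks are placeholders from the RFC 5737 documentation ranges, and it assumes SSH/SCP connections are initiated from the existing production server toward the new servers.

```python
# Placeholder CIDR blocks from the RFC 5737 documentation ranges; replace them
# with the real campus, BD office, and existing production server addresses.
PUBLIC = "0.0.0.0/0"
TRUSTED_CIDRS = ["198.51.100.0/24", "203.0.113.0/24"]  # campus, BD office
OLD_PROD_CIDR = "192.0.2.10/32"  # existing production server

WEB_PORTS = (80, 443)
ADMIN_PORTS = (3306, 8080, 8081, 8082, 8161, 8983)


def tcp_rule(port: int, cidrs: list[str], description: str) -> dict:
    """One inbound TCP rule in the EC2 IpPermissions format."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description} for cidr in cidrs],
    }


def ingress_rules(environment: str) -> list[dict]:
    """Inbound rules for 'production' or 'staging' following the policy above."""
    if environment not in ("production", "staging"):
        raise ValueError(f"unknown environment: {environment}")
    # Only Production exposes the web ports to the public internet.
    web_cidrs = [PUBLIC] if environment == "production" else TRUSTED_CIDRS
    rules = [tcp_rule(port, web_cidrs, "web") for port in WEB_PORTS]
    rules += [tcp_rule(port, TRUSTED_CIDRS, "admin") for port in ADMIN_PORTS]
    # Assumes SSH/SCP is initiated from the existing production server.
    rules.append(tcp_rule(22, [OLD_PROD_CIDR], "ssh from existing prod"))
    return rules
```

With boto3, the returned list could be passed as `IpPermissions` to `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=...)`. Keeping the policy in code makes it easy to confirm that no admin port is ever open to `0.0.0.0/0`.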

Backups

  • Use AWS Backup and Amazon Data Lifecycle Manager to establish a backup scheme of scheduled snapshots covering the following:
    • /mnt/data
  • Additionally, define a restoration and sync process with Whitman staff and IT using snapshots and rsync
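A scheduled snapshot of the /mnt/data volume can be described as Data Lifecycle Manager policy details. This Python sketch makes several assumptions: the volume is tagged `Backup=<tag_value>` (a hypothetical tag), snapshots are taken daily at 03:00 UTC, and the seven most recent snapshots are retained. All of these should be agreed on with Whitman staff and IT.

```python
def dlm_policy_details(tag_value: str, retain_count: int = 7, time_utc: str = "03:00") -> dict:
    """Build Data Lifecycle Manager policy details for daily EBS snapshots."""
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    return {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        # Hypothetical tag: the /mnt/data volume must carry Backup=<tag_value>.
        "TargetTags": [{"Key": "Backup", "Value": tag_value}],
        "Schedules": [
            {
                "Name": f"daily-{tag_value}",
                # One snapshot every 24 hours, starting at time_utc (hh:mm, UTC).
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": [time_utc]},
                # Keep the most recent retain_count snapshots; older ones are deleted.
                "RetainRule": {"Count": retain_count},
                "CopyTags": True,
            }
        ],
    }
```

With boto3, this dictionary could be passed as `PolicyDetails` to `dlm.create_lifecycle_policy(ExecutionRoleArn=..., Description=..., State="ENABLED", PolicyDetails=...)`.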