The Ingest Server holds information about all packages in the Chronopolis Network. It will therefore need updates to accommodate a new file layout, along with optional quality-of-life improvements for requesting information about packages.


todo

  • packagecollection
  • formatting
  • bag → collection api updates
  • rest model review
  • s/package_layout/file_layout

Database Migrations

Bag Table

The Ingest Server currently assumes that every incoming package uses the BagIt format. Regardless of which packaging format we choose, we should limit the assumptions made by the Ingest Server and use a more generic name for the data being preserved. We should align the nomenclature of the Ingest Server with ACE-AM and refer to our grouped data as Collections. Additional information describing the packaging format will sit alongside each collection, which would allow for migration in the future if a better standard comes along. This information could also be used by our Intake and Restore services to return data to a depositor.

Overview of Changes

  • Create a package_layout table pre-populated with known layouts
    • BagIt
    • OCFL
  • Add a package_layout_id column to the bag table
    • Migrate current collections by setting package_layout_id to the id for BagIt
  • Use unified naming for collections
    • Rename the bag table to collection
    • Rename the sequence bag_id_seq to collection_id_seq
    • Rename the table bag_distribution to collection_distribution
    • Rename the sequence bag_distribution_id_seq to collection_distribution_id_seq
  • JPA Entity Updates (rest-entities)
    • Bag
      • Add package_layout
      • Rename to Collection
    • BagDistribution
      • Rename to CollectionDistribution
    • BagDistributionStatus
      • Rename to DistributionStatus
  • JPA Entity Serializer Updates (rest-entities)
    • BagSerializer rename to CollectionSerializer
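
The database steps above can be sketched as a single migration. This is a minimal illustration only, using Python's sqlite3 module against an in-memory database; the stand-in bag and bag_distribution schemas and the function name migrate_bag_to_collection are assumptions, not the real Ingest Server schema. SQLite has no sequences, so the two sequence renames are omitted here; a database with named sequences would rename those as well.

```python
import sqlite3


def migrate_bag_to_collection(conn: sqlite3.Connection) -> None:
    """Apply the layout and renaming steps from the overview (SQLite dialect)."""
    with conn:  # commit on success, roll back on error
        # Known layouts, pre-populated.
        conn.execute(
            "CREATE TABLE package_layout ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
        )
        conn.executemany(
            "INSERT INTO package_layout (name) VALUES (?)",
            [("BagIt",), ("OCFL",)],
        )
        # Every existing row predates the new layouts, so it is BagIt.
        conn.execute(
            "ALTER TABLE bag ADD COLUMN package_layout_id INTEGER "
            "REFERENCES package_layout(id)"
        )
        conn.execute(
            "UPDATE bag SET package_layout_id = "
            "(SELECT id FROM package_layout WHERE name = 'BagIt')"
        )
        # Unified naming.
        conn.execute("ALTER TABLE bag RENAME TO collection")
        conn.execute(
            "ALTER TABLE bag_distribution RENAME TO collection_distribution"
        )


# Hypothetical, heavily simplified starting schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE bag_distribution (id INTEGER PRIMARY KEY, bag_id INTEGER NOT NULL);
    INSERT INTO bag (name) VALUES ('example-bag');
    """
)
migrate_bag_to_collection(conn)
layouts = conn.execute(
    "SELECT c.name, l.name FROM collection c "
    "JOIN package_layout l ON l.id = c.package_layout_id"
).fetchall()
```

After the migration, every pre-existing row in collection points at the BagIt layout, and new OCFL collections can reference the second row of package_layout.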

Versioning Support

Some level of versioning will be needed in the database in order to support a package layout which is itself versioned. A collection will need to record which version it is on, as well as the files required for the collection.

Package Level

The primary table which is used to track collections will need to be updated to contain a reference to the latest version. The collection_version table does not need to be large and should reflect the same information as in the OCFL inventory file: id, created, and number. There are additional fields in the OCFL inventory for storing a message and the user who created the version, which could also be added to the table. In general these tables can be thought of as a cache of what is in Chronopolis, and should be able to be rebuilt by iterating through what is on disk if necessary.

New Table: collection_version


|   | Column        | Type     | Comment |
|---|---------------|----------|---------|
| + | id            | bigint   | Primary key for the table |
| + | collection_id | bigint   | Id for joining version information to a collection |
| + | number        | bigint   | Natural number which can be incremented for determining the numbered version of a collection |
| + | created_at    | datetime | Date at which the collection version was created |
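
Since these tables are described as a cache that can be rebuilt from disk, the following sketch shows one way rows for collection_version might be derived from an OCFL inventory. It assumes the standard OCFL inventory.json shape (a versions object keyed by v1, v2, and so on, each with a created timestamp); the function name versions_from_inventory and the sample inventory are hypothetical.

```python
import json
from datetime import datetime


def versions_from_inventory(inventory_json: str, collection_id: int) -> list[dict]:
    """Build collection_version rows (without ids) from an OCFL inventory."""
    inventory = json.loads(inventory_json)
    rows = []
    for key, version in inventory["versions"].items():
        # Version keys look like "v1" or zero-padded "v0001".
        number = int(key.lstrip("v"))
        # Normalise a trailing "Z" so fromisoformat accepts it on older Pythons.
        created = datetime.fromisoformat(version["created"].replace("Z", "+00:00"))
        rows.append(
            {"collection_id": collection_id, "number": number, "created_at": created}
        )
    return sorted(rows, key=lambda row: row["number"])


# Hypothetical inventory containing only the fields used above.
sample_inventory = json.dumps(
    {
        "id": "example-collection",
        "head": "v2",
        "versions": {
            "v2": {"created": "2020-06-01T12:30:00Z", "state": {}},
            "v1": {"created": "2020-01-01T00:00:00Z", "state": {}},
        },
    }
)
version_rows = versions_from_inventory(sample_inventory, collection_id=7)
```

The optional message and user fields from the inventory could be carried across in the same loop if those columns are added.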

File Level

The Ingest Server currently tracks Files and associated metadata in the form of Fixity and ACE Tokens. If files are to be updated, then they also need some type of version information. It is likely that both the Fixity and ACE Token tables will need to record which version of a File they point to, and likewise a File will need to know what its latest version is. As we do not update Files separately from the collections they belong to, piggybacking on the collection_version table will likely be the best choice here. To do this, we can add a collection_version_id column to both the fixity and ace_token tables.

Updated Table: fixity (might remove unique index)


|   | Column                           | Type         | Comment |
|---|----------------------------------|--------------|---------|
| + | collection_version_id            | bigint       | Indicate the collection version which this fixity (and by proxy file) belongs to |
| + | (file_id, collection_version_id) | Unique index | Index and enforce uniqueness of a file within a collection version |


Updated Table: ace_token (might remove unique index)


|   | Column                           | Type         | Comment |
|---|----------------------------------|--------------|---------|
| + | collection_version_id            | bigint       | Indicate the collection_version which this ace_token belongs to |
| + | (file_id, collection_version_id) | Unique index | Index and enforce uniqueness of a file within a collection version |
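
To illustrate what the proposed unique index would and would not allow, here is a small sketch using sqlite3 with a simplified fixity table. The column set, the index name fixity_file_version_idx, and the helper add_fixity are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fixity (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL,
        collection_version_id INTEGER NOT NULL,
        value TEXT NOT NULL
    );
    CREATE UNIQUE INDEX fixity_file_version_idx
        ON fixity (file_id, collection_version_id);
    """
)


def add_fixity(file_id: int, collection_version_id: int, value: str) -> bool:
    """Insert a fixity row; return False if the unique index rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO fixity (file_id, collection_version_id, value) "
                "VALUES (?, ?, ?)",
                (file_id, collection_version_id, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_fixity(1, 1, "aaaa")         # file 1 in version 1
next_version = add_fixity(1, 2, "bbbb")  # same file, new version: allowed
duplicate = add_fixity(1, 1, "cccc")     # same file and version: rejected
```

Note that the index as sketched also rejects a second fixity row for the same file and version even if it was computed with a different algorithm.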


Normalization Opportunities

The fixity table contains two columns which could be broken out into separate tables: algorithm and value. Each could be replaced by an id which joins to a separate table. The algorithm column specifically should reference a supported_algorithms table in which we store the message digest algorithms we support. For the value column, a fixity_value table could be created, which would reduce duplication of stored fixity values. This is less of a concern than the algorithms, but still something to consider.
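
A rough sketch of the algorithm half of this normalization, again with sqlite3: the table layouts, the record_fixity helper, and the single 'SHA-256' entry are all hypothetical and only show how a lookup table would constrain which algorithms can be recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supported_algorithms (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fixity (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL,
        algorithm_id INTEGER NOT NULL REFERENCES supported_algorithms(id),
        value TEXT NOT NULL
    );
    INSERT INTO supported_algorithms (name) VALUES ('SHA-256');
    """
)


def record_fixity(file_id: int, algorithm: str, value: str) -> int:
    """Store a fixity value, refusing algorithms that are not supported."""
    row = conn.execute(
        "SELECT id FROM supported_algorithms WHERE name = ?", (algorithm,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    with conn:
        cursor = conn.execute(
            "INSERT INTO fixity (file_id, algorithm_id, value) VALUES (?, ?, ?)",
            (file_id, row[0], value),
        )
    return cursor.lastrowid


fixity_id = record_fixity(1, "SHA-256", "abc123")
joined = conn.execute(
    "SELECT f.file_id, a.name, f.value FROM fixity f "
    "JOIN supported_algorithms a ON a.id = f.algorithm_id"
).fetchall()
```

A fixity_value table would follow the same pattern, with the fixity row holding a value_id in place of the digest string.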

Changes
  • Collection version database migration
    • Create collection_version table with required columns
    • Add a collection_version row for each collection, defaulting to version 1 and using the collection’s created_at as its created_at
  • Collection version database entities
    • Create CollectionVersion class under rest-entities as a JPA Entity
    • Update Collection class and add the relationship with the CollectionVersion
  • Collection version models
    • Create CollectionVersion class under rest-models
    • Expand Bag class to include current version information
  • File level versioning database migration
    • Update fixity table to add a collection_version_id column
    • Update ace_token table to add a collection_version_id column
    • Set default collection_version_ids to be the first collection_version for the collection they belong to
    • Create a unique index for the fixity and ace_token tables on (file_id, collection_version_id)
  • File level entity updates
    • Update Fixity class under rest-entities to include its relationship to the CollectionVersion
    • Update AceToken class under rest-entities to include its relationship to the CollectionVersion
  • File level query updates
    • Under ingest-rest, ensure queries use the current version of a package when looking for files
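
The backfill and the current-version query from the list above might look roughly like this. It is a sketch only: the simplified collection, file, and fixity schemas and the function names backfill_versions and current_fixities are assumptions, and the ace_token table would be handled the same way as fixity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE collection (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    CREATE TABLE file (id INTEGER PRIMARY KEY, collection_id INTEGER NOT NULL);
    CREATE TABLE fixity (id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL, value TEXT);
    INSERT INTO collection VALUES (1, '2019-01-01T00:00:00'), (2, '2019-02-01T00:00:00');
    INSERT INTO file VALUES (10, 1), (11, 2);
    INSERT INTO fixity (file_id, value) VALUES (10, 'aa'), (11, 'bb');
    """
)


def backfill_versions(conn: sqlite3.Connection) -> None:
    """Create version 1 for every collection and point existing fixity at it."""
    with conn:
        conn.execute(
            "CREATE TABLE collection_version ("
            "id INTEGER PRIMARY KEY, "
            "collection_id INTEGER NOT NULL REFERENCES collection(id), "
            "number INTEGER NOT NULL, "
            "created_at TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO collection_version (collection_id, number, created_at) "
            "SELECT id, 1, created_at FROM collection"
        )
        conn.execute("ALTER TABLE fixity ADD COLUMN collection_version_id INTEGER")
        conn.execute(
            "UPDATE fixity SET collection_version_id = ("
            "SELECT cv.id FROM collection_version cv "
            "JOIN file f ON f.collection_id = cv.collection_id "
            "WHERE f.id = fixity.file_id AND cv.number = 1)"
        )


def current_fixities(conn: sqlite3.Connection, collection_id: int) -> list[tuple]:
    """Return (file_id, value) pairs for the latest version of a collection."""
    return conn.execute(
        "SELECT fx.file_id, fx.value FROM fixity fx "
        "JOIN collection_version cv ON cv.id = fx.collection_version_id "
        "WHERE cv.collection_id = ? AND cv.number = ("
        "SELECT MAX(number) FROM collection_version WHERE collection_id = ?) "
        "ORDER BY fx.file_id",
        (collection_id, collection_id),
    ).fetchall()


backfill_versions(conn)
```

Once a second version exists for a collection, current_fixities returns only the rows tied to that newer version, which is the behaviour the ingest-rest query update asks for.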

Optional: Database - Content Addressing

Store the address of a file in the database. Possibly update the file table to add a content_address column and rename filename to logical_address.
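
As an illustration only, the sketch below assumes that a content address is derived from a digest of the file's bytes, so identical content stored under different logical addresses shares one content address. The page does not specify this; the function name content_address, the "algorithm:digest" format, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def content_address(data: bytes, algorithm: str = "sha256") -> str:
    """Derive an address from file content rather than from its name."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{algorithm}:{digest}"


# Two logical addresses (filenames) holding identical bytes.
files = {
    "data/a.txt": b"same bytes",
    "data/copy-of-a.txt": b"same bytes",
}
addresses = {logical: content_address(data) for logical, data in files.items()}
```

Under this assumption the file table would hold the logical_address as the depositor-facing name and the content_address as the pointer to the stored bytes.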

API Updates

In order to facilitate versioning of content, the REST API on the Ingest Server will need to be expanded. This will involve endpoints for creating Versions on Packages, querying Versions for Packages, and possibly retrieving File data for Versions.

Models

Before digging into the endpoints, it is important to look at the models which we will need to add to support Versioning.

PackageVersion

The same information we receive from the database: the id, version_number, created_at, and package_id (optionally package_name).

VersionManifest

A plain text listing of Files and their Fixity values. This was originally going to list just the Files, but since metadata files can change it is important to include the Fixity as well.

Example

db84dd4fb5dfc0cef5f0509e9e910ee6f416c2df  data/manifest.txt
d96f61bfe6f832dad3a73c09bb177967282c2400  data/content-properties.json
20c59b1614adc782c889301ba6f2bb46e998fab7  data/.collection-snapshot.properties
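
A minimal sketch of reading this format, assuming each non-empty line holds a fixity value, whitespace, and then a path (which may itself contain spaces). The function name parse_manifest and its error handling are hypothetical.

```python
def parse_manifest(text: str) -> list[tuple[str, str]]:
    """Parse '<fixity>  <path>' lines into (fixity, path) pairs."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # Split on the first run of whitespace only, so paths may contain spaces.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<fixity>  <path>'")
        fixity, path = parts
        entries.append((fixity, path.strip()))
    return entries


example = (
    "db84dd4fb5dfc0cef5f0509e9e910ee6f416c2df  data/manifest.txt\n"
    "d96f61bfe6f832dad3a73c09bb177967282c2400  data/content-properties.json\n"
    "20c59b1614adc782c889301ba6f2bb46e998fab7  data/.collection-snapshot.properties\n"
)
entries = parse_manifest(example)
```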

 

API Model Updates

  • Bag
    • Add package_layout
    • Rename to Collection
  • BagCreate rename to CollectionCreate
  • BagStatus rename to CollectionStatus
  • API Endpoint Updates
    • BagController (ingest-rest)
      • Rename to CollectionController
      • Update route to use collection
    • BagService (rest-models)
      • Update routes to use collection


PUT Collection Version

Description

In order to create a version, a Manifest can be uploaded to an endpoint for a given Package resource, which would save a new set of Files and Fixities for that version. As mentioned previously, this is essentially a cache of what is on disk, and should be able to be rebuilt from the data in the Package. Since this information comes from an update on disk, we know where the Version will be located prior to creating the resource in the Ingest Server.

This endpoint could replace the Create Files endpoint, which serves to populate a Package with a set of files from a given manifest.

Request/Response

  • Request: PUT /api/collections/<collection_id>/version/<version_id>
  • Request Parameters:
    • collection_id -
    • version_id -
  • Request Body: VersionManifest
  • Response Body: Undetermined
  • Response Codes:
    • 201 - Created
    • 400 - Bad Request
    • 401 - Unauthorized
    • 403 - Forbidden
    • 409 - Version Exists
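
The mapping from request to response code could be sketched as below. This is not the Ingest Server's controller: put_version and the in-memory versions dictionary are hypothetical stand-ins, authentication (401 and 403) is left out, and treating an empty or malformed manifest as a 400 is an assumption.

```python
def put_version(
    versions: dict[int, list[tuple[str, str]]], version_id: int, body: str
) -> int:
    """Store a manifest for a new version and return an HTTP status code."""
    if version_id in versions:
        return 409  # Version Exists
    entries = []
    for line in body.splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            return 400  # Bad Request: not a '<fixity>  <path>' line
        entries.append((parts[0], parts[1].strip()))
    if not entries:
        return 400  # Bad Request: empty manifest
    versions[version_id] = entries
    return 201  # Created


# Versions already known for one hypothetical collection.
collection_versions: dict[int, list[tuple[str, str]]] = {}
created = put_version(collection_versions, 1, "abc123  data/file.txt\n")
conflict = put_version(collection_versions, 1, "abc123  data/file.txt\n")
bad = put_version(collection_versions, 2, "not-a-manifest-line\n")
```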

GET Current Version

Description

Get the current version for a given Package.

Request/Response

  • Request: GET /api/collections/<collection_id>/version
  • Response Body: PackageVersion
  • Response Codes:
    • 200 - Ok
    • 401 - Unauthorized
    • 404 - Not Found

GET Current Manifest

Description

Get the manifest for the current version of a given Package.

Request/Response

  • Request: GET /api/collections/<collection_id>/manifest
  • Response Body: VersionManifest
  • Response Codes:
    • 200 - Ok
    • 401 - Unauthorized
    • 404 - Not Found

GET Version

Description

Get a specific version for a given Package.

Request/Response

  • Request: GET /api/collections/<collection_id>/version/<version_id>
  • Response Body: PackageVersion
  • Response Codes:
    • 200 - Ok
    • 401 - Unauthorized
    • 404 - Not Found

GET Version Manifest

Description

Get the manifest for a given Package Version.

Request/Response

  • Request: GET /api/collections/<collection_id>/version/<version_id>/manifest
  • Response Body: VersionManifest
  • Response Codes:
    • 200 - Ok
    • 401 - Unauthorized
    • 404 - Not Found
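
The four GET routes above follow one pattern, which a client might capture in a small helper. The function version_url is hypothetical; only the paths themselves come from the endpoint listings.

```python
from typing import Optional


def version_url(
    collection_id: int, version_id: Optional[int] = None, manifest: bool = False
) -> str:
    """Build the path for a version or manifest request."""
    path = f"/api/collections/{collection_id}"
    if version_id is None:
        # Current version shortcuts.
        return path + ("/manifest" if manifest else "/version")
    path += f"/version/{version_id}"
    return (path + "/manifest") if manifest else path


current_version = version_url(5)
current_manifest = version_url(5, manifest=True)
specific_version = version_url(5, 2)
specific_manifest = version_url(5, 2, manifest=True)
```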

Deprecated Endpoints

Some endpoints can be superseded by the versioning endpoints. These include:

GET /api/bags/<bag_id>/download

POST /api/bags/<bag_id>/files


Informational Endpoints

Various endpoints will also be needed in order to query for Versions, the Files and Fixity for a Version, and the ACE Tokens for a Version. These are mostly informational, and can include shortcuts where querying on the Package without a version_id returns content for the latest version. In addition, the VersionManifest could be used to download a simple plain text view of a Version. This would include the Filenames and possibly the Fixity for the Version, but not much else.
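
The latest-version shortcut can be sketched as a small resolution step shared by the informational endpoints. The function resolve_version is hypothetical, and returning None to signal a 404 is an assumption about how a caller would use it.

```python
from typing import Optional


def resolve_version(
    version_numbers: list[int], requested: Optional[int] = None
) -> Optional[int]:
    """Pick the version to serve; None means the caller should respond 404."""
    if not version_numbers:
        return None
    if requested is None:
        # No version_id supplied: fall back to the latest version.
        return max(version_numbers)
    return requested if requested in version_numbers else None


known = [1, 2, 3]
latest = resolve_version(known)
explicit = resolve_version(known, 2)
missing = resolve_version(known, 9)
```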

