The Ingest Server holds information about all collections in the Chronopolis Network. As such, it will need at least some updates to accommodate a new file layout, as well as additional quality-of-life updates for requesting information about collections.
The Ingest Server currently assumes that every incoming collection uses the BagIt format. Regardless of which packaging format we choose, we should limit the assumptions made by the Ingest Server and use a more generic name for the data being preserved. We should align the nomenclature of the Ingest Server with ACE-AM and refer to our grouped data as Collections. Additional information describing the packaging format will sit alongside each collection, which would allow for migration in the future if a better standard than the one we currently apply comes along. This information could also be used by our Intake and Restore services to return data to a depositor.
Create a collection_layout table pre-populated with known layouts
Add a collection_layout_id column on the bag table for the collection layout
Set the collection_layout_id column to the id for BagIt
Rename bag to collection
Rename bag_id_seq to collection_id_seq
Rename bag_distribution to collection_distribution
Rename bag_distribution_id_seq to collection_distribution_id_seq
Update the JPA entities (rest-entities):
Add a CollectionLayout entity
Rename Bag to Collection
Rename BagDistribution to CollectionDistribution
Rename BagDistributionStatus to DistributionStatus
Rename BagSerializer to CollectionSerializer
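The database steps listed above can be sketched as a single migration. This is a minimal illustration using Python's built-in sqlite3 module, not the Ingest Server's actual database or migration tooling: the `migrate` function, the `name` column on collection_layout, and the `OCFL` layout row are assumptions, the bag tables are cut down to a couple of columns, and the sequence renames (bag_id_seq, bag_distribution_id_seq) are left out because SQLite has no sequence objects.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: introduce collection layouts, then rename bag tables."""
    cur = conn.cursor()
    # Lookup table pre-populated with known layouts ("OCFL" is an assumed entry)
    cur.execute(
        "CREATE TABLE collection_layout (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    cur.executemany(
        "INSERT INTO collection_layout (name) VALUES (?)", [("BagIt",), ("OCFL",)]
    )
    # New column on bag pointing at its layout
    cur.execute(
        "ALTER TABLE bag ADD COLUMN collection_layout_id INTEGER "
        "REFERENCES collection_layout(id)"
    )
    # Every existing bag was ingested as BagIt
    cur.execute(
        "UPDATE bag SET collection_layout_id = "
        "(SELECT id FROM collection_layout WHERE name = 'BagIt')"
    )
    # Generic naming; sequence renames are omitted because SQLite has no sequences
    cur.execute("ALTER TABLE bag RENAME TO collection")
    cur.execute("ALTER TABLE bag_distribution RENAME TO collection_distribution")
    conn.commit()


# Usage against a toy copy of the old schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bag (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE bag_distribution (id INTEGER PRIMARY KEY, "
    "bag_id INTEGER REFERENCES bag(id), node TEXT)"
)
conn.execute("INSERT INTO bag (name) VALUES ('example-bag')")
migrate(conn)
rows = conn.execute(
    "SELECT c.name, l.name FROM collection c "
    "JOIN collection_layout l ON c.collection_layout_id = l.id"
).fetchall()
print(rows)  # [('example-bag', 'BagIt')]
```

Setting every existing row to BagIt in the same migration keeps the new column fully populated, so later code never has to treat a missing layout as an implicit BagIt.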
Some level of versioning will be needed in the database to better support a collection layout which is itself versioned. A collection will need to record which version it is on, as well as the files required for the collection.
The primary table used to track collections will need to be updated to contain a reference to the latest version. The collection_version table does not need to be large and should reflect the same information as the OCFL inventory file: id, created, and number. The OCFL inventory also has fields for a message and for the user who created the version, which could be added to the table as well. In general, these tables can be thought of as a cache of what is in Chronopolis, and it should be possible to rebuild them by iterating through what is on disk if necessary.
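Because these tables are a cache, a rebuild could read each collection's OCFL inventory.json. The Python sketch below (the function name and row shape are hypothetical) maps the inventory's versions block to candidate collection_version rows; it only parses the JSON and leaves the database writes out. The version keys, the created timestamp, and the optional message field follow the OCFL specification; the sample inventory is trimmed and omits required fields such as type, digestAlgorithm, manifest, and each version's state.

```python
import json
from datetime import datetime
from typing import List, Optional, Tuple


def versions_from_inventory(inventory: dict) -> List[Tuple[int, datetime, Optional[str]]]:
    """Turn the "versions" block of an OCFL inventory into (number, created, message) rows."""
    rows = []
    for key, version in inventory["versions"].items():
        # OCFL version keys are "v1", "v2", ... and may be zero-padded ("v001")
        number = int(key[1:])
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        created = datetime.fromisoformat(version["created"].replace("Z", "+00:00"))
        # message (and user) are optional in OCFL, so they may be absent
        rows.append((number, created, version.get("message")))
    return sorted(rows, key=lambda row: row[0])


# A trimmed-down inventory.json
inventory = json.loads("""
{
  "id": "urn:example:collection-1",
  "head": "v2",
  "versions": {
    "v1": {"created": "2019-01-01T12:00:00Z", "message": "Initial deposit"},
    "v2": {"created": "2019-06-01T08:30:00Z"}
  }
}
""")
rows = versions_from_inventory(inventory)
print([(number, created.date().isoformat(), message) for number, created, message in rows])
# [(1, '2019-01-01', 'Initial deposit'), (2, '2019-06-01', None)]
```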
New Table: collection_version
Change | Column | Type | Comment
+ | id | bigint | Primary key for the table
+ | collection_id | bigint | Id for joining version information to a collection
+ | number | bigint | Natural number, incremented with each new version of a collection
+ | created_at | datetime | Date at which the collection version was created
The Ingest Server currently tracks Files and their associated metadata in the form of Fixity values and ACE Tokens. If files are to be updated, they also need some form of version information. Both the Fixity and ACE Token tables will likely need to record which version of a File they point to, and likewise a File will need to know its latest version. Since Files are never updated separately from the Collections they belong to, piggybacking on the collection_version table is likely the best choice here. To do this, we can add a collection_version_id column to the fixity and ace_token tables.
Updated Table: fixity (the unique index might be removed)
Change | Column | Type | Comment
+ | collection_version_id | bigint | The collection version which this fixity (and by proxy its file) belongs to
+ | (file_id, collection_version_id) | Unique index | Index files and enforce their uniqueness within a collection version
Updated Table: ace_token (the unique index might be removed)
Change | Column | Type | Comment
+ | collection_version_id | bigint | The collection version which this ace_token belongs to
+ | (file_id, collection_version_id) | Unique index | Index files and enforce their uniqueness within a collection version
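One consequence of the proposed (file_id, collection_version_id) unique index is worth checking before deciding whether to keep it: it allows at most one fixity row per file per version, so a second digest algorithm for the same file and version is rejected along with true duplicates. A minimal sqlite3 sketch with simplified, hypothetical table definitions shows the behaviour; the same statements would apply to ace_token.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection_version (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL,
    number INTEGER NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE fixity (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL,
    algorithm TEXT NOT NULL,
    value TEXT NOT NULL
);
-- The proposed update to fixity
ALTER TABLE fixity ADD COLUMN collection_version_id INTEGER REFERENCES collection_version(id);
CREATE UNIQUE INDEX fixity_file_version_idx ON fixity (file_id, collection_version_id);
INSERT INTO collection_version VALUES (1, 10, 1, '2019-01-01'), (2, 10, 2, '2019-06-01');
""")


def add_fixity(file_id: int, version_id: int, algorithm: str, value: str) -> bool:
    """Insert a fixity row; return False if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO fixity (file_id, algorithm, value, collection_version_id) "
            "VALUES (?, ?, ?, ?)",
            (file_id, algorithm, value, version_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_fixity(1, 1, "SHA-256", "aaa"),  # file 1 in version 1
    add_fixity(1, 2, "SHA-256", "bbb"),  # same file, new version
    add_fixity(1, 2, "SHA-256", "bbb"),  # exact duplicate
    add_fixity(1, 2, "SHA-512", "ccc"),  # second algorithm, same version
]
print(results)  # [True, True, False, False]
```

If several algorithms per file are expected, including algorithm in the index would be one alternative to dropping it entirely.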
The fixity table contains two columns which could be broken out into separate tables: algorithm and value. Each could be referenced by an id and joined on a separate table. The algorithm column in particular should reference a supported_algorithms table which stores the message digest algorithms we support. For the value column, a fixity_value table could be created to reduce duplication of the fixity values stored. This is less of a concern than the algorithms, but still worth considering.
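The normalization described above could look roughly like this sqlite3 sketch. The column names (algorithm_id, value_id), the algorithm name `SHA-256`, and the `record_fixity` helper are hypothetical; the point is that an unsupported algorithm is refused up front and that identical digests share a single fixity_value row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supported_algorithms (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE fixity_value (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
CREATE TABLE fixity (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL,
    algorithm_id INTEGER NOT NULL REFERENCES supported_algorithms(id),
    value_id INTEGER NOT NULL REFERENCES fixity_value(id)
);
INSERT INTO supported_algorithms (name) VALUES ('SHA-256');
""")


def record_fixity(file_id: int, algorithm: str, value: str) -> None:
    """Store a fixity, reusing an existing fixity_value row when the digest repeats."""
    algo = conn.execute(
        "SELECT id FROM supported_algorithms WHERE name = ?", (algorithm,)
    ).fetchone()
    if algo is None:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    conn.execute("INSERT OR IGNORE INTO fixity_value (value) VALUES (?)", (value,))
    (value_id,) = conn.execute(
        "SELECT id FROM fixity_value WHERE value = ?", (value,)
    ).fetchone()
    conn.execute(
        "INSERT INTO fixity (file_id, algorithm_id, value_id) VALUES (?, ?, ?)",
        (file_id, algo[0], value_id),
    )


# Two files with identical content share one stored digest
record_fixity(1, "SHA-256", "abc123")
record_fixity(2, "SHA-256", "abc123")
rows = conn.execute("""
    SELECT f.file_id, a.name, v.value
    FROM fixity f
    JOIN supported_algorithms a ON a.id = f.algorithm_id
    JOIN fixity_value v ON v.id = f.value_id
    ORDER BY f.file_id
""").fetchall()
print(rows)  # [(1, 'SHA-256', 'abc123'), (2, 'SHA-256', 'abc123')]
```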
Create the collection_version table with the required columns
Create a collection_version for each collection which defaults to version 1 and uses the Collection's created_at for its created_at
Create a CollectionVersion class under rest-entities as a JPA Entity
Update the Collection class to add the relationship with the CollectionVersion
Create a CollectionVersion class under rest-models
Update the Collection class to include current version information
Update the fixity table to add a collection_version_id column
Update the ace_token table to add a collection_version_id column
Set each row's collection_version_id to the collection_version of the Collection it belongs to
Create a unique_index for the fixity and ace_token tables on (file_id, collection_version_id)
Update the Fixity class under rest-entities to include its relationship to the CollectionVersion
Update the AceToken class under rest-entities to include its relationship to the CollectionVersion
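The database half of these tasks (creating collection_version, back-filling version 1, and linking fixity rows to it) could look roughly like the following sqlite3 sketch. The table definitions are trimmed to the columns the migration touches and are assumptions rather than the real schema; ace_token would receive the same ALTER, UPDATE, and index statements as fixity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy copy of the existing tables (simplified, hypothetical columns)
conn.executescript("""
CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT NOT NULL);
CREATE TABLE file (id INTEGER PRIMARY KEY, collection_id INTEGER NOT NULL, filename TEXT);
CREATE TABLE fixity (id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL, value TEXT);
INSERT INTO collection VALUES (1, 'c1', '2018-03-01'), (2, 'c2', '2018-04-15');
INSERT INTO file VALUES (1, 1, 'data/a.txt'), (2, 2, 'data/b.txt');
INSERT INTO fixity VALUES (1, 1, 'aaa'), (2, 2, 'bbb');
""")

# The migration steps from the task list
conn.executescript("""
CREATE TABLE collection_version (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collection(id),
    number INTEGER NOT NULL,
    created_at TEXT NOT NULL
);
-- Every existing collection starts at version 1, created alongside the collection
INSERT INTO collection_version (collection_id, number, created_at)
    SELECT id, 1, created_at FROM collection;
ALTER TABLE fixity ADD COLUMN collection_version_id INTEGER REFERENCES collection_version(id);
-- Point each fixity row at the version of the collection its file belongs to
UPDATE fixity SET collection_version_id = (
    SELECT cv.id
    FROM file f JOIN collection_version cv ON cv.collection_id = f.collection_id
    WHERE f.id = fixity.file_id AND cv.number = 1
);
CREATE UNIQUE INDEX fixity_file_version_idx ON fixity (file_id, collection_version_id);
""")

rows = conn.execute("""
    SELECT fx.id, cv.collection_id, cv.number, cv.created_at
    FROM fixity fx JOIN collection_version cv ON cv.id = fx.collection_version_id
    ORDER BY fx.id
""").fetchall()
print(rows)  # [(1, 1, 1, '2018-03-01'), (2, 2, 1, '2018-04-15')]
```

Back-filling before creating the unique index means the index is built over fully populated rows rather than a column that is still NULL.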
In order to facilitate versioning of content, the REST API on the Ingest Server will need to be expanded. This will involve creating an endpoint to create Versions on collections, querying versions for collections, and possibly retrieving File data for Versions.
Before digging into the endpoints, it is important to look at the models which we will need to add to support Versioning.
CollectionVersion
The same information we receive from the database:
id
version_number
created_at
collection_id
VersionManifest
A plain text listing of Files and their Fixity values. It was originally going to list only the Files, but since metadata files can change, the Fixity values need to be included as well.
Example
db84dd4fb5dfc0cef5f0509e9e910ee6f416c2df data/manifest.txt
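A VersionManifest in the format of the example above could be parsed and rendered as in this Python sketch. The format rules are assumptions inferred from the single example line (a hexadecimal digest, whitespace, then a path relative to the collection root, one entry per line), and `parse_manifest` / `render_manifest` are hypothetical helpers.

```python
import re
from typing import Dict

# Assumed line format: hex digest, whitespace, path relative to the collection root
MANIFEST_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S.*)$")


def parse_manifest(text: str) -> Dict[str, str]:
    """Map each path in a VersionManifest to its fixity value."""
    entries: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = MANIFEST_LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '<digest> <path>'")
        digest, path = match.groups()
        if path in entries:
            raise ValueError(f"line {lineno}: duplicate path {path!r}")
        entries[path] = digest.lower()
    return entries


def render_manifest(entries: Dict[str, str]) -> str:
    """Serialize entries back to text, sorted by path so the output is stable."""
    return "".join(f"{digest} {path}\n" for path, digest in sorted(entries.items()))


text = "db84dd4fb5dfc0cef5f0509e9e910ee6f416c2df data/manifest.txt\n"
manifest = parse_manifest(text)
print(manifest)  # {'data/manifest.txt': 'db84dd4fb5dfc0cef5f0509e9e910ee6f416c2df'}
```

Sorting on output makes two manifests with the same content render identically, which keeps comparisons between versions simple.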
Add layout to the Collection model (ingest-rest, rest-models)
In order to create a version, a VersionManifest can be uploaded to an endpoint for a given collection resource, which would save a new set of Files and Fixities for that version. As mentioned previously, this is essentially a cache of what is on disk and should be possible to rebuild from the data in the collection. Since this information comes from an update on disk, we know where the Version will be located before the resource is created in the Ingest Server.
This endpoint could replace the Create Files endpoint, which serves to populate a collection with a set of files from a given manifest.
PUT /api/collections/<collection_id>/version/<version_id>
Path parameters: collection_id, version_id
Request body: VersionManifest
Responses: 201 - Created, 400 - Bad Request, 401 - Unauthorized, 403 - Forbidden, 409 - Version Exists
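How the PUT endpoint could choose among its response codes is sketched below as a hypothetical in-memory Python model, not the Ingest Server's implementation. It assumes authentication and authorization (401/403) happen before the handler runs, treats <version_id> as the version number, and adds one rule the endpoint description does not state: versions must be sequential with no gaps, as OCFL requires, so skipping a number yields 400.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CollectionVersion:
    id: int
    collection_id: int
    version_number: int
    created_at: datetime


class VersionService:
    """Hypothetical in-memory model of PUT /api/collections/<collection_id>/version/<version_id>."""

    def __init__(self) -> None:
        self.versions: Dict[int, List[CollectionVersion]] = {}
        self.manifests: Dict[int, Dict[str, str]] = {}  # keyed by CollectionVersion.id
        self._next_id = 1

    def put_version(self, collection_id: int, version_number: int, manifest_text: str) -> int:
        """Return the HTTP status code the endpoint would send."""
        existing = self.versions.setdefault(collection_id, [])
        if any(v.version_number == version_number for v in existing):
            return 409  # Version Exists
        # Assumption: versions are sequential with no gaps, as in OCFL
        if version_number != len(existing) + 1:
            return 400
        manifest: Dict[str, str] = {}
        for line in manifest_text.splitlines():
            parts = line.split(None, 1)  # "<digest> <path>"
            if len(parts) != 2:
                return 400  # malformed VersionManifest
            digest, path = parts
            manifest[path] = digest
        if not manifest:
            return 400  # an empty manifest creates nothing
        version = CollectionVersion(
            self._next_id, collection_id, version_number, datetime.now(timezone.utc)
        )
        self._next_id += 1
        existing.append(version)
        self.manifests[version.id] = manifest  # stands in for new File/Fixity rows
        return 201


service = VersionService()
statuses = [
    service.put_version(7, 1, "abc123 data/a.txt\n"),  # first version
    service.put_version(7, 1, "abc123 data/a.txt\n"),  # version 1 again
    service.put_version(7, 3, "abc123 data/a.txt\n"),  # skips version 2
    service.put_version(7, 2, "malformed\n"),          # no path on the line
    service.put_version(7, 2, "def456 data/a.txt\n"),  # next version
]
print(statuses)  # [201, 409, 400, 400, 201]
```

Because the client names the version in the URL, a repeated PUT of the same version is detected as a conflict (409) instead of silently creating a duplicate.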
Get the current version for a given collection
GET /api/collections/<collection_id>/version
Response body: CollectionVersion
Responses: 200 - OK, 401 - Unauthorized, 404 - Not Found
Get the manifest for the current version for a given collection
GET /api/collections/<collection_id>/manifest
Response body: VersionManifest
Responses: 200 - OK, 401 - Unauthorized, 404 - Not Found
Get a specific version of a given collection
GET /api/collections/<collection_id>/version/<version_id>
Response body: CollectionVersion
Responses: 200 - OK, 401 - Unauthorized, 404 - Not Found
Get the manifest for a given collection Version
GET /api/collections/<collection_id>/version/<version_id>/manifest
Response body: VersionManifest
Responses: 200 - OK, 401 - Unauthorized, 404 - Not Found
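The four GET endpoints differ only in whether a version is named explicitly; when none is given, the current version is resolved first. A minimal Python sketch of that lookup follows. It assumes that "current" means the highest version number, treats <version_id> in the path as the version number rather than a database id, and uses hypothetical in-memory data (VERSIONS, MANIFESTS) in place of the database; authentication (401) is left out.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CollectionVersion:
    id: int
    collection_id: int
    version_number: int
    created_at: str


# Hypothetical data: collection 7 has two versions
VERSIONS: Dict[int, List[CollectionVersion]] = {
    7: [CollectionVersion(1, 7, 1, "2019-01-01T12:00:00Z"),
        CollectionVersion(2, 7, 2, "2019-06-01T08:30:00Z")],
}
MANIFESTS: Dict[int, Dict[str, str]] = {  # keyed by CollectionVersion.id
    1: {"data/a.txt": "abc123"},
    2: {"data/a.txt": "def456", "data/b.txt": "0a1b2c"},
}


def find_version(collection_id: int, number: Optional[int]) -> Optional[CollectionVersion]:
    """Resolve a named version, or the current (highest-numbered) one when number is None."""
    versions = VERSIONS.get(collection_id, [])
    if not versions:
        return None
    if number is None:
        return max(versions, key=lambda v: v.version_number)
    return next((v for v in versions if v.version_number == number), None)


def get_version(collection_id: int, number: Optional[int] = None) -> Tuple[int, Optional[dict]]:
    """GET .../version and GET .../version/<version_id>."""
    version = find_version(collection_id, number)
    return (200, asdict(version)) if version else (404, None)


def get_manifest(collection_id: int, number: Optional[int] = None) -> Tuple[int, Optional[str]]:
    """GET .../manifest and GET .../version/<version_id>/manifest."""
    version = find_version(collection_id, number)
    if version is None:
        return 404, None
    entries = MANIFESTS[version.id]
    body = "".join(f"{digest} {path}\n" for path, digest in sorted(entries.items()))
    return 200, body


print(get_version(7)[1]["version_number"])  # 2
print(get_manifest(7, 1))  # (200, 'abc123 data/a.txt\n')
```

Routing both the "current" and "named" forms through one lookup keeps the four endpoints consistent: a missing collection and a missing version both surface as 404.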
Some endpoints can be superseded by the versioning endpoints. These include:
GET /api/bags/<bag_id>/download
POST /api/bags/<bag_id>/files