Current Work: Tom

OTM Bridge Spec

Review of Bridge API spec: OTM Bridge API Specification

  •  Assumptions
    • Bridge would have a user registered by the DDP so that the gateway can communicate with it; authentication can be discussed later on
    • file ids are unique per account – assuming that the gateway/repository has some way of maintaining its own uniqueness. file ids can collide between accounts.
    • some type of authentication on endpoints - can consider basic auth to start, potentially using tokens instead
      • some endpoints might not need auth depending on where they are (e.g. getting files from the gateway)
  • Register
    • what type of authentication to use - should we do something akin to aws with signed tokens?
    • basic auth to start?
    • otm-url to gateway or repository?
      • gateway
  • Version
    • specification API or application API?
      • specification? (tentative)
  • Deposit
    • is the deposit-id necessary?
      • no, the bridge could return it in a response. moves responsibility of uniqueness to the bridge + simplifies request body
    • have been avoiding the question of what constitutes a file, but must ask: what is a file?
      • a bytestream
    • are files being grouped in any way?
      • grouping implies we need to store extra information associated with the files and be able to track that
      • current spec is simple - just push file identifiers
      • generally have been trying to avoid describing any grouping abstraction (collection, work, bag, etc), might be best to continue to avoid it
      • possibly add idea of groupings later
    • checksum-type is defined at the top level - what about per file?
      • will expect that all files in a deposit use the same checksum type, but can see scenarios where per-file checksum types might exist
      • I don't remember where we landed, probably keep this for now and add on later
    • add endpoint for supported checksum types?
      • maybe extend the version api with more information
  • List Deposits
    • only lists deposits in progress, intended to provoke thought
    • deposits are seen as an ephemeral resource, not something we will be keeping forever
    • failed deposits would persist until aborted or restarted
    • this is something we should continue to dig into and think about
  • Deposit Status
  • Abort Deposit
    • only valid in certain states
    • need to discuss guarantees from the bridge about what actions it will take depending on where a deposit is
  • Restart Deposit
    • similar to abort, just needs to be livened up a bit
  • Delete Content
    • are we doing a delete or a purge?
      • what is the difference?
      • we can update the endpoint to delete for now, reserve purge for a future version when people have strong opinions about how they want to remove content
    • what are the error responses/conditions?
      • checksums not matching, files not existing
      • how much work do we expect the bridge to do on operations
        • transferring ten 1 TB files could mean certain operations are expensive
        • even procuring storage
        • needs more discussion, maybe insight from user stories
      • when a single file fails, does that fail the entire deletion?
        • I don't remember
    • what is the url referring to?
      • do we need the url? no
      • do we need the url in the deposit? no - just need to ensure that the gateway has a well defined way of hosting by file identifiers
  • Delete Status
    • needs discussion on what the response body is
    • give per file information?
  • Get Audit History
    • file-id not guaranteed to be unique across accounts, maybe we shouldn't use it
      • could update audit history endpoint to be POST with a request body indicating which files and events you want to receive
      • allows for good extensibility later on
      • aligns well with other endpoints
    • should this return a response body or an audit-id from which the audit events can be retrieved?
      • depending on how much is being asked the operation could take some time. could easily trigger timeouts, etc.
      • if we push to an audit-id, we need to have an idea of how long that resource would last
        • other events are ephemeral, is this too?
        • need additional endpoint if we do this
  • Restore Content
    • do we need the checksum?
      • maybe optional
      • gateway might not have any idea of what the checksum is
    • need additional endpoint to say where the file is located
      • restore/{restore-id}/{file-id}
        • if restore-id is unique, should not need to worry about collisions
  • Restore Status
    • what even is status
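
A few of the ideas above can be sketched in code to make them easier to discuss. This is only an illustration, not the spec: the names deposit-id, checksum-type, file-id, and restore-id come from the notes, while the "files", "events", and "status" keys, the "in-progress" value, the "sha256" default, and every function name are assumptions made up for the example. It shows a deposit body without a client-supplied deposit-id (the bridge assigns one), a POST-style audit history body, and the restore/{restore-id}/{file-id} path.

```python
import uuid
from urllib.parse import quote


def build_deposit_request(file_ids, checksum_type="sha256"):
    """Deposit body: no deposit-id, one top-level checksum-type, just file ids."""
    return {"checksum-type": checksum_type, "files": list(file_ids)}


def handle_deposit(account, request, deposits):
    """Toy stand-in for the bridge: it generates the deposit-id and returns it."""
    deposit_id = str(uuid.uuid4())
    # Key by (account, deposit-id) because file ids are only unique per account.
    deposits[(account, deposit_id)] = {"status": "in-progress", **request}
    return {"deposit-id": deposit_id, "status": "in-progress"}


def build_audit_request(file_ids, events=None):
    """POST body for audit history: which files, and optionally which events."""
    body = {"files": list(file_ids)}
    if events is not None:
        body["events"] = list(events)
    return body


def restore_file_path(restore_id, file_id):
    """Path for fetching one restored file; ids are percent-encoded."""
    return f"restore/{quote(restore_id, safe='')}/{quote(file_id, safe='')}"


if __name__ == "__main__":
    deposits = {}
    request = build_deposit_request(["file-a", "file-b"])
    response = handle_deposit("account-1", request, deposits)
    print(response["deposit-id"], response["status"])
    print(build_audit_request(["file-a"], events=["deposit"]))
    print(restore_file_path("restore-1", "file-a"))
```

Because the bridge owns the deposit-id, two accounts can deposit the same file id without colliding, and a unique restore-id does the same job on the restore path.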

Next steps

Tom - work on Repository Gateway Spec

Bill - updates to OTM Bridge API Specification from discussion

Mike - work on DDP/Bridge interaction + put on wiki

All - deep thinking about life, liberty, and the pursuit of storing data forever

