Glacier doesn't really interact with your data as files per se: a Glacier archive is effectively a set of 1 MiB blocks. You can request these in byte ranges (aligned to megabyte boundaries), so if you had a way to track offsets you could compose an archive out of a number of smaller files. In that case, it's most economical to fetch them in contiguous ranges to minimize retrieval calls.
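A minimal sketch of that offset-tracking idea (the file sizes below are hypothetical): pack small files contiguously, then merge the spans you want back into megabyte-aligned ranges so adjacent files cost one retrieval call instead of several.

```python
MIB = 1024 * 1024

def pack_offsets(sizes):
    """Assign each file a (start, end-exclusive) offset in the archive."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append((pos, pos + size))
        pos += size
    return offsets

def retrieval_ranges(wanted, archive_size):
    """Round each wanted (start, end) span out to 1 MiB block boundaries,
    then coalesce overlapping or touching spans into contiguous ranges."""
    ranges = []
    for start, end in sorted(wanted):
        a = (start // MIB) * MIB                     # round down to a block
        b = min(-(-end // MIB) * MIB, archive_size)  # round up, clamp to end
        if ranges and a <= ranges[-1][1]:
            ranges[-1][1] = max(ranges[-1][1], b)    # merge into previous
        else:
            ranges.append([a, b])
    return [(a, b) for a, b in ranges]
```

For example, three 300 KiB files packed back-to-back all land in the first block, so fetching all three collapses to a single range request.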

Retrieving data works in two steps:

1. You POST a request to AWS, creating a retrieval job and receiving a job ID in response

2. You GET the content with the job ID when it's ready
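The two steps above can be sketched with boto3. The helper below only builds the `jobParameters` payload; its name, the vault name, and the hypothetical `aid` variable are assumptions, while the payload fields (`Type`, `ArchiveId`, `RetrievalByteRange`, `SNSTopic`) come from the Glacier initiate-job API.

```python
def archive_retrieval_params(archive_id, byte_range=None, sns_topic=None):
    """Build the jobParameters payload for an archive-retrieval job."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if byte_range is not None:
        start, end = byte_range  # end-exclusive offsets
        # Glacier expects an inclusive "start-end" byte range string
        params["RetrievalByteRange"] = f"{start}-{end - 1}"
    if sns_topic is not None:
        params["SNSTopic"] = sns_topic  # notified when the job completes
    return params

# With real credentials, the two steps look roughly like:
#   import boto3
#   glacier = boto3.client("glacier")
#   job = glacier.initiate_job(vaultName="my-vault",     # step 1: POST
#                              jobParameters=archive_retrieval_params(aid))
#   out = glacier.get_job_output(vaultName="my-vault",   # step 2: GET
#                                jobId=job["jobId"])
```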

You can find out when the content is ready by:

1. Including an Amazon SNS topic in the POST, which Glacier notifies on completion

2. Sending a GET to the job description service with the job ID 
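For option #2, the describe-job response includes `Completed` and `StatusCode` fields; a small helper (an assumption, not part of any SDK) can turn that into a ready/not-ready decision:

```python
def job_ready(describe_response):
    """Interpret a Glacier describe-job response dict.

    Returns True when the job output can be fetched, False while it's
    still in progress, and raises if the job failed outright."""
    if describe_response.get("StatusCode") == "Failed":
        raise RuntimeError(describe_response.get("StatusMessage", "job failed"))
    return bool(describe_response.get("Completed"))
```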

These both provide the same information, but #2 is annoying and eats into your API call allotment. If you were building a product like, say, Fedora around Glacier, you'd need both to implement the notification endpoint and to keep a log of outstanding jobs, so that you could recover unfulfilled requests after downtime. Job IDs apparently remain valid for at least 24 hours after completion, which implies they may eventually be recycled; your recovery log therefore needs the original request time, so you can make a reasonable guess about whether a successful description via GET (#2 above) still refers to your request.
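The recovery-log check described above can be sketched as follows (the names and the assumed standard-tier retrieval time are mine; the 24-hour output window is the documented minimum):

```python
from datetime import datetime, timedelta

RETRIEVAL_TIME = timedelta(hours=5)   # typical standard-tier wait (assumed)
OUTPUT_WINDOW = timedelta(hours=24)   # documented minimum output availability

def plausibly_fresh(requested_at, now):
    """Could a job requested at `requested_at` still have fetchable output?

    Past the retrieval time plus the availability window, a job ID that
    describes as succeeded may be a recycled ID rather than our original
    request, so treat it as stale and re-initiate instead."""
    return now - requested_at <= RETRIEVAL_TIME + OUTPUT_WINDOW
```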