From Last Meeting
- Lynette - Follow up in Slack about some of the remaining questions.
- Vitus - provide examples of types that would be used with entities and types that would be used with authorities
- Review updated activity streams document
- Appendices for 3 primary use cases (i.e. full cache, partial cache, notifications)
- Object types
- Activity Streams - Extensions for Authoritative Data Change Management (min extensions + instruments)
Notes were taken directly in the Activity Streams document as comments with general discussions captured here.
Alternate approach for full cache:
- take full dump
- one patch file for all changes since last dump
- new changes are appended to the end of the patch file as they occur
- apply a patch file to the last full dump
Might work well for smaller authorities. Or if it is small, they may just do a full replace instead of applying patches. Complexity of following a series of patch documents may be more overhead than processing a full download.
Could be useful for the first time grab of data. Take the last full dump. Apply the current full patch. From then on process the Activity Stream to keep it up to date.
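The dump-plus-patch approach above can be sketched roughly as follows. This is only an illustration, not something the group agreed on: the patch entry shape (operation, URI, record), the `apply_patch` helper, and the example.org URIs are all assumptions, and the operation names are borrowed from the Add / Update / Remove vocabulary discussed for notifications.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical patch entry: (operation, uri, record).
PatchEntry = Tuple[str, str, dict]


def apply_patch(dump: Dict[str, dict], patch: Iterable[PatchEntry]) -> Dict[str, dict]:
    """Apply patch entries, in order, to a copy of the last full dump."""
    cache = dict(dump)  # leave the original dump untouched
    for op, uri, record in patch:
        if op in ("Add", "Update"):
            cache[uri] = record
        elif op == "Remove":
            cache.pop(uri, None)  # removing an unknown URI is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return cache


# Last full dump, keyed by URI (made-up data).
full_dump = {
    "http://example.org/auth/1": {"label": "Apple"},
    "http://example.org/auth/2": {"label": "Pear"},
}

# Single patch file; new changes are appended to the end as they occur.
patch_file = [
    ("Update", "http://example.org/auth/1", {"label": "Apples"}),
    ("Add", "http://example.org/auth/3", {"label": "Plum"}),
    ("Remove", "http://example.org/auth/2", {}),
]

cache = apply_patch(full_dump, patch_file)
```

Because entries are applied in order, a later change to the same URI simply overwrites an earlier one, which is why appending to the end of the patch file is enough.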
What is the value of the detailed information in the Activity Stream for a full cache?
- for a large cache, the ingest is very time consuming
- the detailed information (e.g. object type, etc.) may not be used by the full cache model
Alternative: The provider periodically produces a large patch file that can be applied. The consumer could keep applying each in order.
- would these happen often enough?
- Activity Streams can update really frequently and possibly at transaction time. Common to release activity streams daily. LOC is looking at daily releases. Lynette will check with Getty to see how often they update.
Example from Notification Stream for LOC
- Add + uri
- Update + uri
- Remove + uri
- Not developed yet
- Looking at making this ordered to be able to track specific changes
- May include the text of the new label value (TBD)
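Since the LOC notification stream is not developed yet, the following is only a guess at what such entries might look like. It borrows the Add, Update, and Remove activity types and the OrderedCollection type from the Activity Streams 2.0 vocabulary; the `notification` helper, the use of `name` for the optional label text, and the example.org URIs are assumptions for illustration.

```python
import json

ALLOWED_TYPES = {"Add", "Update", "Remove"}


def notification(activity_type, uri, published, label=None):
    """Build one minimal notification entry (hypothetical shape)."""
    if activity_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported activity type: {activity_type}")
    entry = {"type": activity_type, "object": {"id": uri}, "published": published}
    if label is not None:
        # Optional text of the new label value (marked TBD in the notes).
        entry["object"]["name"] = label
    return entry


entries = [
    notification("Update", "http://example.org/auth/1", "2020-05-02T09:00:00Z", label="Apples"),
    notification("Add", "http://example.org/auth/3", "2020-05-01T12:00:00Z"),
    notification("Remove", "http://example.org/auth/2", "2020-05-03T08:30:00Z"),
]

# Keep the stream ordered so specific changes can be tracked.
# Timestamps in one UTC format sort correctly as plain strings.
entries.sort(key=lambda e: e["published"])

stream = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "OrderedCollection",
    "totalItems": len(entries),
    "orderedItems": entries,
}

print(json.dumps(stream, indent=2))
```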
The notification approach might work well for the Partial Cache use case as well, but that may depend on what is being cached. The consumer knows which URIs are cached and what data about each URI is cached. They can dereference the URI and grab the new data to update the local cache.
If grabbing the entire set of data associated with a URI, will need to have a way to determine the edges of the graph for removing the outdated triples and adding the triples of the new full graph for the URI.
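A minimal sketch of the remove-then-add step, assuming the simplest possible graph edge: every triple whose subject is the URI itself. Real data with blank nodes or nested resources would need a wider boundary, which the notes leave open. Triples are plain tuples here, and `replace_entity` and the example.org URIs are hypothetical.

```python
def replace_entity(cache, uri, fresh):
    """Swap out all cached triples about `uri` for freshly dereferenced ones.

    Assumes the edge of the graph for `uri` is "all triples with `uri`
    as subject"; this is an illustrative choice, not a settled rule.
    """
    kept = {t for t in cache if t[0] != uri}       # drop outdated triples
    incoming = {t for t in fresh if t[0] == uri}   # accept only triples about uri
    return kept | incoming


u1 = "http://example.org/auth/1"
u2 = "http://example.org/auth/2"

local_cache = {
    (u1, "prefLabel", "Apple"),
    (u1, "altLabel", "Malus"),
    (u2, "prefLabel", "Pear"),
}

# Data obtained by dereferencing u1 after an Update notification (made up).
fresh_triples = {(u1, "prefLabel", "Apples")}

updated_cache = replace_entity(local_cache, u1, fresh_triples)
```

Removing before adding means that a property dropped upstream (the `altLabel` above) also disappears from the local cache, which a plain merge would not achieve.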
The description should explain the difference between full cache and notifications, noting that both are activity streams but the second includes fewer details.
Consumers will need to be familiar with the shape of the data they are consuming. If they want the full graph of a URI, they will need to know this shape.
The producer may only push up the Activity Stream once a day. The only thing that is known is the date-time stamp of the last update.
This may mean that if something got updated multiple times during the day, it may not matter. The consumer will need to know the last time they crawled the data and will need to start again at that date.
Consumers may just make a list of URIs that change and then bring in the changes through dereferencing and grabbing the data they need.
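One way a consumer might build that list of changed URIs, sketched under assumptions: each activity is a dict with `type`, `uri`, and a timezone-aware `published` datetime, and `changed_uris` is a hypothetical helper. Activities at exactly the last crawl time are processed again, on the assumption that re-dereferencing a URI is harmless.

```python
from datetime import datetime, timezone


def changed_uris(activities, last_crawl):
    """Return {uri: most recent activity type} for activities since last_crawl.

    Several changes to one URI collapse into a single entry, so the
    consumer dereferences each URI only once.
    """
    latest = {}
    for act in sorted(activities, key=lambda a: a["published"]):
        if act["published"] >= last_crawl:
            latest[act["uri"]] = act["type"]  # later activity wins
    return latest


def ts(day, hour):
    """Helper for made-up UTC timestamps."""
    return datetime(2020, 5, day, hour, tzinfo=timezone.utc)


activities = [
    {"type": "Add", "uri": "http://example.org/auth/3", "published": ts(1, 12)},
    {"type": "Update", "uri": "http://example.org/auth/1", "published": ts(2, 9)},
    {"type": "Update", "uri": "http://example.org/auth/1", "published": ts(2, 15)},
    {"type": "Remove", "uri": "http://example.org/auth/2", "published": ts(2, 16)},
]

last_crawl = ts(2, 0)
to_refresh = changed_uris(activities, last_crawl)
```

Here the two updates to the same URI on one day result in a single refresh, matching the point that intermediate changes may not matter to the consumer.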
Can feeds come too frequently?
Beyond knowing the frequency, the consumer will need to be familiar with the service when it comes to types/modeling.
- Notification to a single institution when a requested entity that was missing becomes available. This will likely be outside the main change management stream.
- Value of diagramming to express changes as a means of conveying information around where data is being produced, where it goes, etc.
- Partial splits and partial merges.