From Last Meeting
- Lynette - Update the Activity Streams document to more closely follow the standard.
- Activity Streams - Extensions for Authoritative Data Change Management (minimal extensions)
- Mock SPARQL Update Queries for Common Changes (working document)
- Existing Change Management Approaches (working document)
- Types of Changes (working document)
Recording: Activity Streams - Activity Types: Moving closer to the standard (2021-08-16)
Notes were taken directly in the Activity Streams document as comments, with general discussion captured here.
From the audience perspective…
- triplestore maintainers would like rdf_patch information so they can quickly keep their triplestores in sync
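To show why patch data helps triplestore maintainers, the sketch below applies the add (`A`) and delete (`D`) rows of an RDF Patch to an in-memory set of N-Triples statements. It assumes one statement per line with consistent whitespace, and it skips header, transaction, and prefix rows. The function name `apply_rdf_patch` is hypothetical. A real consumer would send the same operations through its triplestore's update interface.

```python
def apply_rdf_patch(triples: set[str], patch_text: str) -> set[str]:
    """Return a new set of statements with the patch rows applied in order.

    Sketch only: handles "A" (add) and "D" (delete) rows and ignores
    header (H), transaction (TX/TC/TA) and prefix (PA/PD) rows.
    """
    result = set(triples)
    for raw in patch_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        op, _, rest = line.partition(" ")
        statement = rest.strip()  # e.g. '<s> <p> "o" .'
        if op == "A":
            result.add(statement)
        elif op == "D":
            result.discard(statement)
    return result


store = {'<http://example.org/e1> <http://example.org/label> "Old" .'}
patch = """TX .
D <http://example.org/e1> <http://example.org/label> "Old" .
A <http://example.org/e1> <http://example.org/label> "New" .
TC ."""
store = apply_rdf_patch(store, patch)
```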
Is there an audience that would prefer a notification only?
For LC, the stream has just the URI and the fact that the entity was updated.
- The provider may not even know what changed.
- Is there a way to provide more information about the specific change?
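For comparison, a notification-only entry of the kind described for LC might look roughly like the sketch below, which uses the Activity Streams 2.0 `Update` type. The helper `make_update_notification` is hypothetical, and the fields an actual feed uses may differ. Nothing in the entry says what changed, only which entity and when.

```python
import json
from datetime import datetime, timezone


def make_update_notification(entity_uri: str, changed_at: datetime) -> dict:
    """Build a notification-only Activity Streams 2.0 'Update' activity.

    It carries only the entity URI and the time of the change; it does not
    describe what changed. changed_at must be timezone-aware.
    """
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Update",
        "published": changed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "object": {"id": entity_uri},
    }


note = make_update_notification(
    "https://id.loc.gov/authorities/names/n2008054754",
    datetime(2021, 8, 16, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(note, indent=2))
```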
How do the end users know what was changed?
Are change management documents…
- notification system providing awareness that changes happened
- a document describing changes
With notification, the consumer can decide what to do with it.
Is it acceptable to expand the activity stream from a notification-only stream into an activity stream that includes change documents?
If it’s there and you don’t want it, just ignore it.
If it’s not there and you need it, extra steps are required. What would those extra steps be?
If we include rdf_patch, what happens if the changes are very large? (e.g. https://id.loc.gov/authorities/names/n2008054754.madsrdf.nt)
What if we included a parameter on the activity stream request, verbose=true|false, where rdf_patch is included only when verbose=true?
It doesn’t seem like a big deal to take two steps to get each change: 1) read the notification in the activity stream, 2) follow the relationship URL to get the rdf_patch.
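A consumer could support both the verbose option and the two-step option with one lookup, as sketched below. The field names are assumptions. `rdf_patch` stands for inline patch text returned when verbose=true. `patch_url` is a hypothetical link to a separately retrievable patch. `fetch` is any caller-supplied function that retrieves a URL, such as a thin wrapper around an HTTP client.

```python
from typing import Callable, Optional


def get_patch(activity: dict, fetch: Callable[[str], str]) -> Optional[str]:
    """Return RDF Patch text for one activity, or None if none is offered.

    Hypothetical fields:
      rdf_patch -- patch text embedded in the activity (verbose=true)
      patch_url -- link to the patch (step 2 of the two-step approach)
    """
    inline = activity.get("rdf_patch")
    if inline is not None:
        return inline  # verbose response: no second request needed
    url = activity.get("patch_url")
    if url is not None:
        return fetch(url)  # follow the link to retrieve the patch
    return None  # notification only: re-fetch the whole entity instead


# Usage with a stand-in fetcher instead of real HTTP requests.
fake_server = {"https://example.org/patches/42": "A <s> <p> <o> ."}
verbose = get_patch({"rdf_patch": "D <s> <p> <o> ."}, fake_server.__getitem__)
linked = get_patch({"patch_url": "https://example.org/patches/42"}, fake_server.__getitem__)
bare = get_patch({"object": "https://example.org/e1"}, fake_server.__getitem__)
```

Passing `fetch` in as a parameter leaves the choice of HTTP library to the consumer and keeps the function testable without a network.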
Complete History vs. Recent Notifications
Complete history =
- baseline download + all changes since baseline as they happen
- multiple changes to an entity means the entity appears in the activity stream multiple times
- changes are dated to indicate when the change occurred
- includes patch if available
Recent Notifications =
- possibly a baseline download
- may include all entities that have changed since the baseline was created
- if an entity has changed more than once, it is collapsed into a single notification with the latest change date
- no patch data
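The difference between the two models can be sketched as a transformation. A complete history keeps every dated change, with its patch if any. Collapsing it by entity, keeping only the latest date and dropping patch data, yields the recent-notifications view. The `Change` record and the `collapse_to_recent` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    entity_uri: str
    changed: date
    patch: Optional[str] = None  # only present in the complete history


def collapse_to_recent(history: list[Change]) -> list[Change]:
    """Keep one entry per entity with its latest change date and no patch."""
    latest: dict[str, date] = {}
    for change in history:
        seen = latest.get(change.entity_uri)
        if seen is None or change.changed > seen:
            latest[change.entity_uri] = change.changed
    # Order notifications by change date, oldest first.
    return [Change(uri, when) for uri, when in sorted(latest.items(), key=lambda kv: kv[1])]


history = [
    Change("https://example.org/a", date(2021, 7, 1), "patch-1"),
    Change("https://example.org/b", date(2021, 7, 5)),
    Change("https://example.org/a", date(2021, 8, 1), "patch-2"),
]
recent = collapse_to_recent(history)
```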
The authoritative patch sequence needs to be provided. If a record was modified on July 1 and I then take a patch on Aug 1, how do the changes in the later patch refer to data that was already adjusted by the July patch?
Follow this process:
- go to the newest patch (most recent date),
- step back through the patches until you reach the first one published since your last processing,
- then step forward and process the patches in order.
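The three steps above can be sketched as follows. This assumes the feed is available as a list of (patch id, publication time) pairs ordered newest first. A real Activity Streams feed would be paged, so walking back would mean following links to earlier pages. The function name `patches_to_apply` is hypothetical.

```python
from datetime import datetime


def patches_to_apply(feed_newest_first: list[tuple[str, datetime]],
                     last_processed: datetime) -> list[str]:
    """Return ids of unprocessed patches in the order they should be applied."""
    pending: list[str] = []
    # Steps 1-2: start at the newest patch and walk back until reaching
    # patches that were already handled at the last processing time.
    for patch_id, published in feed_newest_first:
        if published <= last_processed:
            break
        pending.append(patch_id)
    # Step 3: reverse so patches are applied oldest first.
    pending.reverse()
    return pending


feed = [
    ("p3", datetime(2021, 8, 10)),
    ("p2", datetime(2021, 8, 1)),
    ("p1", datetime(2021, 7, 1)),
]
todo = patches_to_apply(feed, last_processed=datetime(2021, 7, 15))
```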
What happens if a site gets out of order? Is there a way to replay from a known state?
Periodic wholesale replacement. Doing this regularly gives confidence in data integrity, whereas a long-running chain of patches might cause problems over time.
Medline does a baseline replacement once a year. After that, you have to apply patches in order.
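Replaying from a known state could work as sketched below. Start from a baseline snapshot that records the last patch it includes. Skip any patches already folded into that snapshot and apply the rest in sequence order. If a patch is missing, stop with an error rather than silently continuing. The integer sequence numbers and the (operation, statement) tuples are assumptions for illustration. They do not describe how Medline numbers its files.

```python
def replay_from_baseline(baseline: set[str], baseline_seq: int,
                         patches: list[tuple[int, list[tuple[str, str]]]]) -> set[str]:
    """Rebuild state from a baseline plus numbered patches.

    Each patch is (sequence_number, [(op, statement), ...]) where op is
    "A" (add) or "D" (delete). Patches may arrive unsorted.
    """
    state = set(baseline)
    expected = baseline_seq + 1
    for seq, ops in sorted(patches, key=lambda p: p[0]):
        if seq <= baseline_seq:
            continue  # already included in the baseline snapshot
        if seq != expected:
            raise ValueError(f"missing patch {expected}; reload a newer baseline")
        for op, statement in ops:
            if op == "A":
                state.add(statement)
            elif op == "D":
                state.discard(statement)
            else:
                raise ValueError(f"unknown operation {op!r}")
        expected += 1
    return state


rebuilt = replay_from_baseline(
    {"s1"}, 10,
    [(12, [("A", "s3")]), (11, [("D", "s1"), ("A", "s2")]), (9, [("A", "old")])],
)
```

Sorting first tolerates patches that arrive out of order. The gap check catches a patch that never arrived, which is the point at which a fresh baseline is needed.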
Providing RDF patch could be optional. For authorities that can provide a patch, we would provide a spec for what it should look like. For authorities that do not track low-level change information, this would be no worse than what they currently have.
- Notification to a single institution when a requested entity that was missing becomes available. This will likely be outside the main change management stream.
- Value of diagramming changes as a means of conveying where data is produced, where it goes, etc.
- Partial splits and partial merges.