
Reference we can use as a starting point:  Change Management NOTES from LODLAM 2020

Overview

This document describes the types of changes that can occur in authoritative data.  For each type, the document will:

  • provide a basic description of the change
  • include a discussion of the nuances and challenges for providers and consumers
  • suggest potential approaches for representing the change
  • give an example data stream as it would appear in the change document

NOTE: At least initially, the example data will be shown in JSON, JSON-LD, or possibly some other format.  Final recommendations for format will be in the Deliverable documents.

Types of Changes


New

Description:

This type provides information on a completely new entity.

Discussion:

If the data for the entity is included in the data stream, how would the edges of the graph be defined?  For some authorities, the entity may be defined as just the data where the new entity URI is the subject (e.g. ontology is limited to SKOS).  For others, data for an entity can include data several layers away from the subject URI of the new entity (e.g. works described in BIBFRAME).  Some may include blank nodes.

The wider the graph, the more chance there is that data expressed in the NEW data would overlap with existing data in the cache.
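
A minimal sketch of how a consumer might check for this kind of overlap, assuming both the NEW data and the cache are available as sets of (subject, predicate, object) tuples. The function name, the tuple representation, and the example URIs are illustrative assumptions, not part of any proposed format.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all plain strings here


def overlapping_subjects(new_data: Set[Triple], cache: Set[Triple]) -> Set[str]:
    """Return subjects that appear in both the NEW data and the existing cache."""
    new_subjects = {s for s, _, _ in new_data}
    cached_subjects = {s for s, _, _ in cache}
    return new_subjects & cached_subjects


# Hypothetical data: a new work whose graph also describes an already-cached agent.
cache = {("https://example.org/agent/1", "rdfs:label", "Agent One")}
new_data = {
    ("https://example.org/work/9", "ex:creator", "https://example.org/agent/1"),
    ("https://example.org/agent/1", "rdfs:label", "Agent One"),
}
overlap = overlapping_subjects(new_data, cache)
```

A narrow graph (only triples whose subject is the new entity URI) would never trigger this check; the wider the graph, the more often a consumer has to decide how to reconcile the overlapping subjects.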

Examples:


Approach:

Two possible approaches:

  • OPTION 1: send the change type (i.e. NEW) and the URI of the new entity; downstream consumers use the URI to fetch the new entity's data

  • OPTION 2: send all data related to the entity along with the information in OPTION 1.  There is a question about where the edges of the graph will be for the entity data.

Example Data Stream:

For Option 1:

{
  "type": "NEW",
  "URI": "https://uri.of.new.entity"
}


For Option 2:

{
  "type": "NEW",
  "URI": "https://uri.of.new.entity",
  "entity": { full entity as JSON-LD }
}
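
As a rough consumer-side sketch of the two options, the function below accepts either shape of NEW entry. It assumes entries have already been parsed into Python dicts with the `type`, `URI`, and optional `entity` keys shown in the examples. The `fetch` callable and `fake_fetch` are hypothetical stand-ins for however a consumer dereferences a URI.

```python
from typing import Callable, Dict


def apply_new(entry: dict, cache: Dict[str, dict], fetch: Callable[[str], dict]) -> None:
    """Add a NEW entity to the cache, using embedded data when present."""
    if entry.get("type") != "NEW":
        raise ValueError("not a NEW entry")
    uri = entry["URI"]
    if "entity" in entry:
        # Option 2: the entity data travels with the change entry.
        cache[uri] = entry["entity"]
    else:
        # Option 1: only the URI is sent, so the consumer dereferences it.
        cache[uri] = fetch(uri)


def fake_fetch(uri: str) -> dict:
    """Stand-in for a request to the authority (hypothetical)."""
    return {"@id": uri, "fetched": True}


cache: Dict[str, dict] = {}
apply_new({"type": "NEW", "URI": "https://uri.of.new.entity"}, cache, fake_fetch)
apply_new(
    {"type": "NEW", "URI": "https://uri.of.other.entity",
     "entity": {"@id": "https://uri.of.other.entity"}},
    cache,
    fake_fetch,
)
```

Option 1 costs the consumer one extra request per entity; Option 2 avoids the request but leaves the question of graph edges to the provider.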



Deleted

Description:

This type provides information on an entity that was completely removed from the authority.  See also Deprecated.

Discussion:

For a deleted entity, the URI will no longer resolve.


Approach:


Example Data Stream:



Deprecated

Description:

This type provides information on an entity that still exists in the authority but is marked as deprecated, meaning it should no longer be used.  For deprecated entities, the URI will continue to resolve.  See also Split and Merge.

Discussion:

Deprecation typically happens when:

  • an entity is no longer needed
  • an entity is being replaced by another entity with its own URI
  • an entity was split with each of the new entities having their own new URIs
  • two or more entities were merged and the new entity with the merged data has its own new URI

In all cases, the entity remains in the authority so that its URI still resolves.  This supports preservation and backward compatibility, and gives downstream consumers time to update their references to the entity.

It is common practice for the deprecated entity to include information on what should be used instead.
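
A minimal sketch of how a consumer might use such replacement information, assuming it has been collected into a simple mapping from a deprecated URI to a single replacement URI. The mapping, the function name, and the example URIs are hypothetical; a split would need a list of replacements rather than one.

```python
from typing import Dict


def resolve_current(uri: str, replaced_by: Dict[str, str]) -> str:
    """Follow replacement pointers until reaching a URI with no replacement."""
    seen = {uri}
    while uri in replaced_by:
        uri = replaced_by[uri]
        if uri in seen:
            # Guard against malformed data that points back to itself.
            raise ValueError(f"replacement cycle detected at {uri}")
        seen.add(uri)
    return uri


# Hypothetical chain: a was replaced by b, which was later replaced by c.
replaced_by = {
    "https://example.org/a": "https://example.org/b",
    "https://example.org/b": "https://example.org/c",
}
current = resolve_current("https://example.org/a", replaced_by)
```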

Examples:


Approach:


Example Data Stream:



Split

Description:

This type provides information on an entity that was split into two or more separate entities.

Discussion:

This commonly results in a new entity for each part of the split.  The original entity becomes deprecated or deleted.  In some cases, the original entity continues to exist with a different set of data.

This may be a sub-class of deleted or deprecated since the original entity is typically no longer valid under the original URI.

Examples:


Approach:


Example Data Stream:



Merge

Description:

This type provides information on two or more entities that were merged into a single entity.

Discussion:

This commonly results in a new entity with the data coming from each of the merged entities.  The original entities become deprecated or deleted.  In some cases, no new entity is created and the data is merged into one of the existing entities instead.

This may be a sub-class of deleted or deprecated since the original entities are typically no longer valid under the original URIs.

Examples:

  • In MeSH, a common name is merged into the scientific name. 
  • Two works can merge into one. 
  • Wikidata can get redundant entities for the same person that are then merged into one entity.
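
To illustrate what a merge means for a downstream consumer, here is a small sketch that repoints stored references from the merged URIs to the surviving URI. It assumes the consumer keeps, per local record, a list of authority URIs; that structure, the function name, and the example URIs are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def apply_merge(records: Dict[str, List[str]], merged: Iterable[str], target: str) -> None:
    """Point every reference to a merged URI at the target URI, in place."""
    merged_set = set(merged)
    for record_id, refs in records.items():
        updated: List[str] = []
        for ref in refs:
            new_ref = target if ref in merged_set else ref
            if new_ref not in updated:  # avoid duplicate references after the merge
                updated.append(new_ref)
        records[record_id] = updated


# Hypothetical records: rec1 cites two redundant person entities.
records = {
    "rec1": ["https://example.org/p1", "https://example.org/p2"],
    "rec2": ["https://example.org/other"],
}
apply_merge(
    records,
    merged=["https://example.org/p1", "https://example.org/p2"],
    target="https://example.org/p3",
)
```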

Approach:


Example Data Stream:



Changed

Description:

This type provides information on an existing entity with changed data.

Discussion:


Examples:


Approach:


Example Data Stream:



Label Change Only

Description:

This type provides information on an existing entity with changed label data.

Discussion:

This specifically meets the need of applications that cache labels.  Open question: should downstream consumers cache labels at all?  Several indicate that this is common practice.

Examples of use cases for caching labels in applications:

  • Sinopia caches labels to avoid having to dereference URIs when viewing the data. 
  • Cache primary and variant labels when creating indices for searching.

Discussion on Primary Label:

  • The primary label may not be a human-understandable label.  For example, the ISNI number is the primary label in the ISNI authority.
  • Although it is common for the primary label to be a single value, some authorities may allow multiple values for the primary label.  For example, Wikidata has multiple primary labels for different languages, and labels may be missing for some languages.
  • In some authorities, the primary label is critical; in others, it may be optional or used for convenience.

  • The predicates used to identify primary and variant labels can, and commonly do, differ between authorities.
  • If DELETE_LABEL is supported, is there a concept of minimally viable data required for a valid entity, and would this include a primary label?  That is, when the primary label is removed, does it require a replacement label?  What happens to the primary label if there is a delete but no add?  This is probably not a problem, because a primary label is likely to be part of the minimal required data for a valid record.

Discussion on Variant Labels:

  • It is common for there to be multiple variant labels.  In some cases, variants may represent different languages.

  • If DELETE_LABEL is supported, is it ok for a variant label to be deleted and not have a corresponding add?  This seems ok. 

Examples:

  • LOC is looking at using a feed that provides information on authoritative labels.  This is mostly used for name changes (e.g. person died and the death date is added to the label).

Approach:

Minimally need to include:

  • URI
  • NEW_LABEL - the new label to use
  • PREDICATE (or some other identifier) - identifies which type of label is being replaced

NOTE: This can be represented as a triple.  <URI> <PREDICATE> "new label"@en


To be able to replace a label, also need:

  • OLD_LABEL - the value of the literal that is being replaced

NOTE: This would remove the triple.  <URI> <PREDICATE> "old label"@en


OPTION 1:  Single type LABEL_CHANGE - all change information is in a single change entry

OPTION 2: Two change entries, a DELETE_LABEL for the label being replaced, followed by an ADD_LABEL for the new label.  Question: Will this provide an adequate indicator to downstream consumers allowing them to update cached values?
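
The two options carry the same information, which the following sketch tries to make concrete by rewriting a single LABEL_CHANGE entry as a DELETE_LABEL/ADD_LABEL pair. It assumes entries are parsed into Python dicts with the field names listed above and treats labels as opaque values. The function name is an illustrative assumption.

```python
from typing import List


def expand_label_change(entry: dict) -> List[dict]:
    """Rewrite one LABEL_CHANGE entry as a DELETE_LABEL followed by an ADD_LABEL."""
    if entry.get("type") != "LABEL_CHANGE":
        raise ValueError("not a LABEL_CHANGE entry")
    common = {"URI": entry["URI"], "PREDICATE": entry["PREDICATE"]}
    return [
        {"type": "DELETE_LABEL", **common, "LABEL": entry["OLD_LABEL"]},
        {"type": "ADD_LABEL", **common, "LABEL": entry["NEW_LABEL"]},
    ]


pair = expand_label_change({
    "type": "LABEL_CHANGE",
    "URI": "https://uri.of.changing.entity",
    "PREDICATE": "skos:prefLabel",
    "NEW_LABEL": "new value",
    "OLD_LABEL": "old value",
})
```

Going the other way is harder: a consumer reading Option 2 has to notice that a delete and a following add belong together, which is the concern raised in the question above.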

Example Data Stream:

For Option 1:

{
  "type": "LABEL_CHANGE",
  "URI": "https://uri.of.changing.entity",
  "PREDICATE": "skos:prefLabel",
  "NEW_LABEL": { "@value": "new value", "@language": "en" },
  "OLD_LABEL": { "@value": "old value", "@language": "en" }
}


For Option 2:

{
  "type": "DELETE_LABEL",
  "URI": "https://uri.of.changing.entity",
  "PREDICATE": "skos:prefLabel",
  "LABEL": { "@value": "old value", "@language": "en" }
}
{
  "type": "ADD_LABEL",
  "URI": "https://uri.of.changing.entity",
  "PREDICATE": "skos:prefLabel",
  "LABEL": { "@value": "new value", "@language": "en" }
}
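
A sketch of how a label-caching consumer might apply these entries, under the assumption that the cache is keyed by (URI, predicate) and holds a set of labels. Labels may arrive either as plain strings or as JSON-LD value objects (`@value` / `@language`); the cache layout and the function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Optional, Set, Tuple, Union

Label = Tuple[str, Optional[str]]  # (text, language tag or None)
LabelCache = DefaultDict[Tuple[str, str], Set[Label]]


def to_label(raw: Union[str, dict]) -> Label:
    """Normalize a plain string or a JSON-LD value object into a hashable tuple."""
    if isinstance(raw, dict):
        return (raw["@value"], raw.get("@language"))
    return (raw, None)


def apply_label_entries(entries: Iterable[dict], cache: LabelCache) -> None:
    """Apply label change entries, in stream order, to a consumer's label cache."""
    for entry in entries:
        labels = cache[(entry["URI"], entry["PREDICATE"])]
        kind = entry["type"]
        if kind == "DELETE_LABEL":
            labels.discard(to_label(entry["LABEL"]))
        elif kind == "ADD_LABEL":
            labels.add(to_label(entry["LABEL"]))
        elif kind == "LABEL_CHANGE":
            labels.discard(to_label(entry["OLD_LABEL"]))
            labels.add(to_label(entry["NEW_LABEL"]))
        else:
            raise ValueError(f"unsupported change type: {kind}")


uri, pred = "https://uri.of.changing.entity", "skos:prefLabel"
cache: LabelCache = defaultdict(set)
cache[(uri, pred)].add(("old value", "en"))
apply_label_entries(
    [
        {"type": "DELETE_LABEL", "URI": uri, "PREDICATE": pred,
         "LABEL": {"@value": "old value", "@language": "en"}},
        {"type": "ADD_LABEL", "URI": uri, "PREDICATE": pred,
         "LABEL": {"@value": "new value", "@language": "en"}},
    ],
    cache,
)
```

Because `discard` ignores labels that are not cached, a DELETE_LABEL without a matching cached value is harmless here; a stricter consumer might treat that as a sign its cache is out of date.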



Other Considerations and Questions

Are the following handled differently when managing change?

  • High velocity changes vs. Low velocity changes
    • labels typically change less frequently
  • High impact changes vs. Low impact changes
    • Importance of change


Notification when an initial search fails to match and later a match becomes available.  Same for partial match.

  • Null to actual value.
  • How to distinguish between a missing subject vs. a typo?
  • Notification of missed value to a specific user. This would likely be outside the change management stream.
  • Notification of submitted entities that were accepted and those that were not accepted. 
    • Ex. Cataloger adds record to OCLC, but it is not immediately available.  Get notification when it is available or rejected.