This page documents an analysis comparing and contrasting data patterns found in the SVDE PCC data with decisions made in the Sinopia PCC Templates in March 2021.

Context and Caveats on the Analysis

  1. Stanford programmatically created a spreadsheet from the Sinopia PCC Templates according to the DCMI Tabular Application Profile (DCTAP) specification. A copy of that spreadsheet, titled March 2021 Comparison of PCC Sinopia Templates and SVDE Data, was created to evaluate row by row whether the PCC SVDE data has triples conforming to the templates. Note that no systematic attempt has been made to check whether every pattern in SVDE is captured in the PCC templates.
  2. Every attempt was made to account for different patterns for bf:Works in the SVDE data, but those patterns may be too different to reconcile neatly with the Sinopia PCC Works patterns.
  3. The bflc:Relationship pattern used to capture relationships (between Works and other Works, and between Instances and other Instances) in the templates and in the SVDE data is not intuitive. Further analysis is needed to make sure the bflc:Relationship pattern is used consistently under similar circumstances.
  4. bf:issuance is not applied consistently for Serials, and perhaps not for Monographs. We need to clear up which pattern the community should use to confidently signal one or the other; the PCC templates use http://id.loc.gov/vocabulary/issuance/mono and http://id.loc.gov/vocabulary/issuance/serl for monographs and serials, respectively. In some cases, this inconsistency made it difficult to say confidently that the SVDE data complied.
  5. Analysis of the SVDE PCC data was performed using SPARQL queries on the LD4P cache available at: http://services.ld4l.org/fuseki/dataset.html?tab=query&ds=/PCC
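The bf:issuance caveat in item 4 can be illustrated with a minimal sketch. It is not the query used in the analysis; it assumes the triples have already been pulled out of the cache as plain (subject, predicate, object) tuples, and the function name, the sample subjects, and the "unclear" label are all hypothetical. Only the two issuance URIs come from the PCC templates as described above.

```python
from collections import defaultdict

# BIBFRAME property URI for bf:issuance, plus the two values the PCC templates use.
BF_ISSUANCE = "http://id.loc.gov/ontologies/bibframe/issuance"
MONO = "http://id.loc.gov/vocabulary/issuance/mono"
SERL = "http://id.loc.gov/vocabulary/issuance/serl"


def classify_issuance(triples):
    """Label each subject that has bf:issuance by how its values compare
    with the template URIs. Subjects without bf:issuance are not reported."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == BF_ISSUANCE:
            values[subject].add(obj)

    report = {}
    for subject, objs in values.items():
        if objs == {MONO}:
            report[subject] = "monograph"
        elif objs == {SERL}:
            report[subject] = "serial"
        else:
            # Anything else (a blank node, a literal, or mixed values)
            # cannot be matched to the template pattern with confidence.
            report[subject] = "unclear"
    return report


# Hypothetical sample data, not taken from SVDE.
sample_triples = [
    ("ex:instance1", BF_ISSUANCE, MONO),
    ("ex:instance2", BF_ISSUANCE, SERL),
    ("ex:instance3", BF_ISSUANCE, "_:b0"),
]
issuance_report = classify_issuance(sample_triples)
```

Subjects labelled "unclear" are the cases where a human would still need to inspect the data, which is the difficulty the caveat describes.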


General Comment/Questions

There are significant differences between the Sinopia PCC templates and SVDE data. In March 2021 Comparison of PCC Sinopia Templates and SVDE Data, Column L (with the header "SVDE Differences") captures compatibility and differences between the data and templates. Areas of compatibility are highlighted green, and differences are highlighted yellow; it may be that not all differences require reconciliation. We need to clearly define what actions we expect to be able to perform on the SVDE data in the Sinopia environment and vice versa. Depending on the answers to the following questions, we may need to consider changes to the tooling, data shapes, and/or templates.

  1. Are we only committed to deriving new Sinopia descriptions from SVDE data?
    1. If so, is it ok to only map in the SVDE data that aligns with the Sinopia templates? Data that doesn't map neatly could also be returned and visible to the cataloger so that values can be copied and pasted to applicable Sinopia fields.
      1. Alternatively, data that doesn't map neatly could be transformed to the Sinopia template shape where possible, but this would require QA and other lookup tools to understand both the source data and the template shapes. This would be a remarkable amount of work.
  2. Do we expect to be able to edit SVDE data from within Sinopia?
    1. If so, is it ok if not all parts of the SVDE data are editable from within Sinopia? Are we ok with open shapes ("extra" data beyond what the PCC templates address)?
    2. If so, (and assuming changes will be sent back to SVDE) is it ok if Sinopia templates afford additional patterns not represented in the SVDE data? Is SVDE ok with open shapes ("extra" data beyond what their tooling may be set up to interpret)?
  3. Are there interactions 


More Broadly

In the absence of community shared shapes, it is understandable that tools attempting to consume/edit/link to/derive from external data sources will struggle to account for differences in shape assumptions. The attempt to consume SVDE data in Sinopia according to a first pass at PCC templates is a great opportunity to test both the templates and modeling decisions being made in the SVDE community. Ideally, PCC's nascent attempt to create and maintain a PCC MAP will be informed by this learning opportunity. As the templates are further vetted, changes can be made through official processes, and communicated through official channels.

Further, SVDE intends to make changes to their data based on feedback from the SVDE community this summer. Another round of analysis will need to be performed when those changes are made. Rather than relying on bespoke SPARQL queries that require human interpretation for this analysis, resources should be allocated to develop more expertise in validation and reporting using ShEx or SHACL. We should consider whether Justin Littman's (Stanford) proof-of-concept Sinopia RDF validator using DCTAP profiles is fit for this purpose, or if other tooling is needed. The PCC, if committed to a spreadsheet representation of MAPs, may want to consider conforming to the DCTAP specification to more easily generate ShEx and/or SHACL for validation.
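As a rough illustration of what spreadsheet-driven reporting could look like, the sketch below reads a DCTAP-style CSV and lists the mandatory properties that a description lacks. It is not ShEx or SHACL validation and it is not the Sinopia RDF validator mentioned above. The column names shapeID, propertyID, and mandatory follow the DCTAP specification; the rows, the shape name ex:Instance, and the function name are invented for the example, and the sketch assumes every row repeats its shapeID and writes mandatory as "true" or "false".

```python
import csv
import io

# Hypothetical DCTAP fragment: real column names, invented rows.
DCTAP_CSV = """shapeID,propertyID,mandatory
ex:Instance,bf:title,true
ex:Instance,bf:issuance,true
ex:Instance,bf:note,false
"""


def missing_mandatory(dctap_text, shape_id, described_properties):
    """Return the mandatory properties of shape_id that are absent from
    described_properties, in spreadsheet row order."""
    rows = csv.DictReader(io.StringIO(dctap_text))
    return [
        row["propertyID"]
        for row in rows
        if row["shapeID"] == shape_id
        and row["mandatory"].strip().lower() == "true"
        and row["propertyID"] not in described_properties
    ]


# Properties found on one hypothetical description.
missing = missing_mandatory(DCTAP_CSV, "ex:Instance", {"bf:title", "bf:note"})
```

A report built this way only answers whether a property is present; checking value types, cardinality, and nested shapes is where ShEx or SHACL tooling would be needed.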
