Problem: the existing ARK inflections '?' and '??' have not been adopted widely, for reasons that include:

  • It can be hard for servers to detect a terminal '?' as different from the absence of a query string. It is in fact impossible in Tomcat, and requires rewrite rules in Apache.
    • unlike '...??' (legal URL), '...?' is not a "legal" URL, so software libraries don't pass it through
  • Although '?' is intuitive and language-agnostic, it can also be puzzling to some people.
  • The metadata to be returned was only vaguely defined (mostly by example).
  • The metadata syntax (ANVL) was non-standard and largely defined by example.
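
The first reason above can be illustrated with a small sketch. It assumes Python's standard `urllib.parse` as a stand-in for "a software library"; other libraries and servers may behave differently, and the ARK used is the example identifier that appears later on this page.

```python
from urllib.parse import urlsplit

bare = urlsplit("https://n2t.net/ark:/12345/x408001")
brief = urlsplit("https://n2t.net/ark:/12345/x408001?")
full = urlsplit("https://n2t.net/ark:/12345/x408001??")

# A terminal '?' parses to the same empty query string as no '?' at all,
# so code that only sees the parsed URL cannot tell the two apart.
terminal_q_is_lost = (bare.query == brief.query == "")

# With '??', the second '?' survives as the query string itself.
double_q_survives = (full.query == "?")
```

Code that only receives the parsed components therefore has no way to see a terminal '?', while '??' remains detectable.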

Proposed solution discussions, in reverse chronological order

2019.11.30 draft "info" inflection specification

The "info inflection" is a string, "?info", that may be appended to an ARK before resolving it. It requests not the identified object itself but human- and machine-readable metadata describing the object and the commitment its provider has made to it.

For the sake of discussion, we assume that resolution of a given ARK may be a multi-stage process starting with the resolver (HTTP server) appearing in the URL form of the ARK when it is submitted for resolution. That first resolver may actually return content directly, making resolution a one-stage process. If not, the first resolver forwards to a second resolver, which may in turn forward to another, and so forth. The last resolver is the HTTP server that returns content without forwarding. The first resolver may in some cases be the last resolver.
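
A minimal sketch of this multi-stage model follows. It uses an in-memory routing table instead of real HTTP requests. The host names other than n2t.net, the `ROUTES` table, and the `resolve` function are hypothetical and exist only to illustrate the "first resolver" and "last resolver" terminology.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table: each resolver either forwards to another
# resolver or returns content itself.
ROUTES: Dict[str, Tuple[str, str]] = {
    "n2t.net": ("forward", "resolver.example.org"),
    "resolver.example.org": ("forward", "repo.example.org"),
    "repo.example.org": ("content", "<html>...object...</html>"),
}

def resolve(first_resolver: str, max_hops: int = 10) -> Tuple[List[str], str]:
    """Follow forwards until a resolver returns content.

    Returns the chain of resolvers visited and the content from the last one.
    """
    chain: List[str] = []
    current = first_resolver
    for _ in range(max_hops):
        chain.append(current)
        action, value = ROUTES[current]
        if action == "content":
            return chain, value  # 'current' is the last resolver
        current = value          # forwarded to the next resolver
    raise RuntimeError("too many forwards")

chain, body = resolve("n2t.net")
one_stage_chain, _ = resolve("repo.example.org")
```

In the second call the first resolver returns content directly, so it is also the last resolver and the chain has length one.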

The response may take several forms:

  1. Unsupported: THUMP header xxx "unsupported inflection"
  2. THUMP header

The result of resolving an ARK without an inflection is indistinguishable from an HTTP response from the last resolver.


Returns HTML (with embedded JSON-LD) containing
a) embedded GeoJSON, which allows foreign members from JSON-LD
  why? because it integrates well with widespread tools such as Google search, and instant map integration is visually powerful
b) embedded HTML meta tags
  why? because not everyone extracts JSON-LD
c) metadata elements formatted for human reading, per provider preference

<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Dataset",
"@id": "https://n2t.net/ark:/12345/x408001.v2",

"who": "National Cancer Institute; ICPSR - Interuniversity Consortium for Political and Social Research",
"what": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
"when": "1984-05-03",
"where": "https://n2t.net/ark:/12345/x408001.v2",
"how": "(:mtype data) Dataset",

"kids": [
"https://n2t.net/ark:/12345/x408001.v2/file.xsl",
"https://n2t.net/ark:/12345/x408001.v2/file.csv",
"https://n2t.net/ark:/12345/x408001.v2/file.pdf"
],
"parent": "https://n2t.net/ark:/12345/x408001",
"cite-as": "https://n2t.net/ark:/12345/x408001.v2",
"stickiness": [
"_see: https://datascience.codata.org/articles/10.5334/dsj-2017-039/",
"indefinite", "keeping", "intraversioned", "standard", "NR", "OP"
],

"name": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
"author": "National Cancer Institute",
"publisher": "ICPSR - Interuniversity Consortium for Political and Social Research",
"datePublished": "1984-05-03",
"dateModified": "2015-08-06T11:20:58Z",
"version": "v2",
"description": "This dataset was produced as part of the Surveillance, Epidemiology, and End Results (SEER) Program to monitor the incidence of cancer and cancer survival rates in the United States, thus carrying out the mandates of the National Cancer Act. The SEER Program had several objectives: to estimate the annual cancer incidence in the United States, to examine trends in cancer patient survival, to identify cancer etiologic factors, and to monitor trends in the incidence of cancer in selected geographic areas with respect to demographic and social characteristics..."}
</script>

<!-- why? because not everyone recognizes JSON script metadata -->
<meta name="DC.identifier" content="ark:/12345/x408001.v2" scheme="DCTERMS.URI"/>
<meta name="DC.title" content="Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977"/>
<meta name="DC.creator" content="National Cancer Institute"/>
<meta name="DC.publisher" content="ICPSR - Interuniversity Consortium for Political and Social Research"/>
<meta name="DC.date" content="1984-05-03" scheme="DCTERMS.W3CDTF"/>
<meta name="DC.type" content="Dataset"/>
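
As a rough illustration of how a client might read the embedded JSON-LD from such a response, the sketch below uses only Python's standard library. The `JsonLdExtractor` class is hypothetical, and the sample page is an abbreviated version of the record above, not a real response.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False

# Abbreviated sample page (an assumption for illustration only).
page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset",
 "@id": "https://n2t.net/ark:/12345/x408001.v2",
 "who": "National Cancer Institute", "when": "1984-05-03"}
</script>
</head><body></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(page)
record = extractor.records[0]
```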


2019.11.26 strawdog JSON

Returns HTML with
a) embedded GeoJSON, which allows foreign members from JSON-LD
  why? because it integrates well with widespread tools such as Google search, and instant map integration is visually powerful
b) embedded HTML meta tags
  why? because not everyone extracts JSON-LD
c) metadata elements formatted for human reading, per provider preference

<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Dataset",
"@id": "https://n2t.net/ark:/12345/x408001.v2",

"who": "National Cancer Institute; ICPSR - Interuniversity Consortium for Political and Social Research",
"what": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
"when": "1984-05-03",
"where": "https://n2t.net/ark:/12345/x408001.v2",
"how": "(:mtype data) Dataset",

"kids": [
"https://n2t.net/ark:/12345/x408001.v2/file.xsl",
"https://n2t.net/ark:/12345/x408001.v2/file.csv",
"https://n2t.net/ark:/12345/x408001.v2/file.pdf"
],
"parent": "https://n2t.net/ark:/12345/x408001",
"cite-as": "https://n2t.net/ark:/12345/x408001.v2",
"stickiness": [
"_see: https://datascience.codata.org/articles/10.5334/dsj-2017-039/",
"indefinite", "keeping", "intraversioned", "standard", "NR", "OP"
],

"name": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
"author": "National Cancer Institute",
"publisher": "ICPSR - Interuniversity Consortium for Political and Social Research",
"datePublished": "1984-05-03",
"dateModified": "2015-08-06T11:20:58Z",
"version": "v2",
"description": "This dataset was produced as part of the Surveillance, Epidemiology, and End Results (SEER) Program to monitor the incidence of cancer and cancer survival rates in the United States, thus carrying out the mandates of the National Cancer Act. The SEER Program had several objectives: to estimate the annual cancer incidence in the United States, to examine trends in cancer patient survival, to identify cancer etiologic factors, and to monitor trends in the incidence of cancer in selected geographic areas with respect to demographic and social characteristics..."}
</script>

<!-- why? because not everyone recognizes JSON script metadata -->
<meta name="DC.identifier" content="ark:/12345/x408001.v2" scheme="DCTERMS.URI"/>
<meta name="DC.title" content="Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977"/>
<meta name="DC.creator" content="National Cancer Institute"/>
<meta name="DC.publisher" content="ICPSR - Interuniversity Consortium for Political and Social Research"/>
<meta name="DC.date" content="1984-05-03" scheme="DCTERMS.W3CDTF"/>
<meta name="DC.type" content="Dataset"/>
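
For consumers that do not read JSON-LD, the meta-tag fallback can be harvested with a few lines of code. The sketch below assumes Python's standard `html.parser`. The `DublinCoreMeta` class is hypothetical, and the sample uses a subset of the tags shown above.

```python
from html.parser import HTMLParser

class DublinCoreMeta(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.startswith("DC."):
            self.elements[name[len("DC."):]] = attr.get("content")

sample = """
<meta name="DC.identifier" content="ark:/12345/x408001.v2" scheme="DCTERMS.URI"/>
<meta name="DC.creator" content="National Cancer Institute"/>
<meta name="DC.date" content="1984-05-03" scheme="DCTERMS.W3CDTF"/>
<meta name="DC.type" content="Dataset"/>
"""

parser = DublinCoreMeta()
parser.feed(sample)
dc = parser.elements
```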



2019.11.04 a different proposal for the new ?info inflection

Proposed: for any ARK X, X?info should lead to an HTML-formatted "landing" document (page) with metadata embedded as JSON-LD. The metadata, in human- and machine-readable form, includes

  1. The ARK X
  2. Descriptive metadata:
    1. who
    2. what
    3. when
    4. where
    5. how (metatype, similar to resourcetype)
    6. domain-specific elements (eg, publications vs physical samples vs vocabulary terms)
  3. PIDs to first-level variants (versions, formats, change history) and components of X, if any
  4. PID to the first (immediate) logical ancestor of X
    1. eg, if X is a PDF variant of a document object, this points to the logical object ARK listing X along with its sibling HTML and MSWord forms
  5. PID to the last (root) logical ancestor of X
    1. eg, if X is a section of a chapter of a book, this points to the book logical object
  6. Change history, if any
  7. Licensing and accessibility information
  8. How to cite, including "cite-as" header
  9. Persistence statement
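
One way to picture such a landing document is sketched below. The function `landing_page` is hypothetical, and the key names ("who", "kids", "parent", "cite-as", "stickiness") are borrowed from the strawdog JSON elsewhere on this page, not from any agreed vocabulary. Only a few of the nine items are filled in.

```python
import json
from html import escape

def landing_page(record: dict) -> str:
    """Render a minimal HTML landing page with the record as JSON-LD."""
    title = escape(record.get("what", record["@id"]))
    # Escape "</" so the JSON cannot close the script element early.
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f"<title>{title}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        "</head><body>\n"
        f"<h1>{title}</h1>\n"
        "</body></html>\n"
    )

record = {
    "@context": "http://schema.org",
    "@id": "https://n2t.net/ark:/12345/x408001.v2",            # item 1
    "who": "National Cancer Institute",                        # item 2
    "what": "Cancer Surveillance and Epidemiology",
    "when": "1984-05-03",
    "kids": ["https://n2t.net/ark:/12345/x408001.v2/file.csv"],  # item 3
    "parent": "https://n2t.net/ark:/12345/x408001",            # item 4
    "cite-as": "https://n2t.net/ark:/12345/x408001.v2",        # item 8
    "stickiness": ["indefinite"],                              # item 9
}

page = landing_page(record)
```

The human-readable part here is only a heading; a real page would presumably render each element for readers according to provider preference.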

A great example to follow would be "A data citation roadmap for scholarly data repositories".

2019.09.16 proposal for a new, explicit word-based inflection: ?info

  • ?info requests metadata
  • ?info is required, but the spec continues to reserve '?' and '??' as optional synonyms
  • ?info requests anvl/erc, but the spec permits (as always) alternate formats
    • continues to use THUMP conventions with parenthesized args
    • ?info equivalent to ?info()

This is a small adjustment to the spec that doesn't quite specify how to request alternate formats, but cracks open the door to work that we can complete, not in the spec, but in the AITO context. An example of that might be the THUMP request:
                  ?info()as(application/json)
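
To make the shape of such a request concrete, here is a small parsing sketch. It handles only the "name(args)" pattern visible in the example above and is not a full THUMP grammar. The function `parse_thump` and its return format are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# One step: a name optionally followed by a parenthesized argument string.
_STEP = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)(?:\(([^()]*)\))?")

def parse_thump(query: str) -> List[Tuple[str, str]]:
    """Split a query such as 'info()as(application/json)' into steps."""
    steps: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(query):
        match = _STEP.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse request at offset {pos}: {query!r}")
        # A missing argument list is treated like an empty one.
        steps.append((match.group(1), match.group(2) or ""))
        pos = match.end()
    return steps

with_format = parse_thump("info()as(application/json)")
bare_info = parse_thump("info")
empty_args = parse_thump("info()")
```

Because a missing argument list is treated as empty, `?info` and `?info()` parse to the same result, matching the equivalence stated in the list above.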

2019.08.05 more discussion of collapsing existing ? and ?? into just ??

2019.07.15 Proposed: suppress '?' inflection (let it be optional), leaving just the '??' inflection

  • as before, '??' requests kernel elements plus any persistence statement
  • '??' easier to implement than '?' (the latter being impossible to detect in Tomcat)
  • '?' may be supported by older implementations (briefer record)
  • ... or should '?' be made identical to '??'?
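
Where a server does have access to the raw request target, before any URL library has normalized it, the two inflections can be told apart by simple suffix checks. The sketch below assumes that access. The function name and the labels "full" and "brief" are hypothetical.

```python
from typing import Optional

def classify_inflection(raw_target: str) -> Optional[str]:
    """Classify a raw, unparsed request target by its trailing inflection."""
    # Check '??' first, since a target ending in '??' also ends in '?'.
    if raw_target.endswith("??"):
        return "full"   # kernel elements plus any persistence statement
    if raw_target.endswith("?"):
        return "brief"  # the briefer record of older implementations
    return None         # no inflection: ordinary resolution

full = classify_inflection("/ark:/12345/x408001??")
brief = classify_inflection("/ark:/12345/x408001?")
plain = classify_inflection("/ark:/12345/x408001")
```

If '?' were made identical to '??', the two branches would simply return the same label.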

