...

  • It can be hard for servers to detect a terminal '?' as different from the absence of a query string. It is in fact impossible in Tomcat, and requires rewrite rules in Apache.
    • unlike '...??' (legal URL), '...?' is not a "legal" URL, so software libraries don't pass it through
  • Although '?' is intuitive and language-agnostic, it can also be puzzling to some people.
  • The metadata to be returned was only vaguely defined (mostly by example).
  • The metadata syntax (ANVL) was non-standard and largely defined by example.
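The first difficulty above can be illustrated with a short sketch. It uses Python's standard `urllib.parse` purely as an example of a URL library; it says nothing about how Tomcat or Apache behave internally. The example.org hostname is a placeholder.

```python
from urllib.parse import urlsplit

# Placeholder ARK URL used only for illustration.
bare = "https://example.org/ark:/12345/x408001"
terminal = bare + "?"    # the old single '?' inflection
double = bare + "??"     # the old '??' inflection

# Collect the parsed query component for each form.
queries = [urlsplit(u).query for u in (bare, terminal, double)]

# The bare URL and the terminal-'?' URL both report an empty query,
# so code that looks only at parsed components cannot tell them apart.
# The '??' form is distinguishable because its query is the string '?'.
print(queries)
```

A named inflection such as "?info" avoids this ambiguity because the query component is then non-empty.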

Proposed solution discussions, in reverse chronological order

2019.11.30 Draft Inflection Spec: "?info"

Requirements and Desiderata

Karen/Bertrand

  1. At minimum, ?info must resolve to a human readable landing page, and should provide a gateway to machine-readable metadata
  2. It is strongly recommended that meta tags with [something like] DC are implemented (I’m suggesting this since they are simple html, and all orgs should be able to do something with those)
  3. Secondary to this, organizations are encouraged to use whatever data format[s] are appropriate in their context for the machine-readable version of ?info, and are further encouraged to:
    1. use an established metadata standard (like DC) where possible
    2. use an established serialization for their metadata, such as XML, JSON, or an RDF serialization such as JSON-LD or Turtle
    3. express the document type via the “Content-Type:” HTTP header
    4. use either content negotiation or a query parameter of the form “&format=[json|xml]” to request alternative formats

    Karen: I added c) as a suggestion. I don’t know if you want to indicate a preferred serialization/standard beyond this, or specify minimal metadata fields (the who, what, etc.), or keep it very loose. We could then provide examples that lay out different flavors that are acceptable – I would be willing to contribute an example. 
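As a rough sketch of item 3.4, the function below picks a response media type from either a "format" query parameter or an Accept header. The function name, the mapping of "json" and "xml" to media types, the HTML default, and the rule that the query parameter wins over the header are all assumptions made for illustration; the requirements above do not specify them. The Accept handling is deliberately naive and ignores q-values.

```python
from urllib.parse import parse_qs

# Assumed mapping from the "format" values named in the text to media types.
SUPPORTED = {"json": "application/json", "xml": "application/xml"}
DEFAULT = "text/html"  # assumed default: the human-readable landing page


def choose_content_type(query: str, accept: str = "") -> str:
    """Pick a media type from '&format=...' first, then from Accept."""
    params = parse_qs(query, keep_blank_values=True)
    fmt = params.get("format", [""])[0].lower()
    if fmt in SUPPORTED:
        return SUPPORTED[fmt]
    # Naive content negotiation: take the first listed type we support.
    for item in accept.split(","):
        media = item.split(";")[0].strip().lower()
        if media in SUPPORTED.values():
            return media
    return DEFAULT


print(choose_content_type("info&format=json"))
print(choose_content_type("info", "application/xml, text/html;q=0.9"))
print(choose_content_type("info"))
```

Whatever type is chosen would then be reported back in the “Content-Type:” header, as item 3.3 asks.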

John

  1. Some continuity with past
    1. human-readable metadata returned
    2. machine-readable metadata returned
    3. including persistence statements
    4. who/what/when/where paradigm (ERC)
    5. THUMP-like request protocol -- ?info(X,Y) vs ?info&arg1=X&arg2=Y
  2. Never RDF
    1. unfortunately, JSON-LD is RDF; see tweet https://twitter.com/justin_littman/status/1206944465027584001
    2. however, the widely used schema.org borrows element names from JSON-LD and uses them in meta tags, which aren't at risk of RDF complexity

2019.12.15 Draft Inflection Spec: "?info"

The info inflection is a string, "?info", that may be added to an ARK before resolving it in order to request the return of human- and machine-readable metadata describing the identified object and the commitment made to it by its provider. A successful response returns metadata content as HTML intended for human consumption, along with embedded JSON intended for machine consumption. Future extensions are expected that will permit the request and return of alternate formats. Embedded HTML meta tags that repeat some of the metadata using schema.org element names are recommended because not all processors recognize JSON metadata. It is acceptable in the short term also to recognize the older "?" and "??" inflections and to treat them as synonymous with "?info", but their behavior may change in future versions of the ARK specification.
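A minimal sketch of how a resolver might recognize the inflection is given below. It treats the older "?" and "??" forms as synonyms of "?info", which the paragraph above permits in the short term. The function name and return shape are hypothetical, and the sketch does not handle additional arguments after "?info".

```python
# Suffixes treated as the info inflection, longest first so that
# "?info" and "??" are matched before the single "?".
INFO_SYNONYMS = ("?info", "??", "?")


def split_inflection(url: str):
    """Return (url_without_inflection, 'info') or (url, None)."""
    for suffix in INFO_SYNONYMS:
        if url.endswith(suffix):
            return url[: -len(suffix)], "info"
    return url, None


base = "https://n2t.net/ark:/12345/x408001.v2"
for candidate in (base, base + "?info", base + "?", base + "??"):
    print(split_inflection(candidate))
```

Note that this works on the raw request string; as discussed earlier, a parsed URL may already have lost the terminal "?".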

For the sake of discussion, we define some new terms. Resolution of a given ARK (or any URL) may be a multi-stage process starting with the first resolver, which is the server at the hostname appearing in the URL form of the ARK when it is submitted for resolution. Examples are n2t.net and ark.bnf.fr. The first resolver may forward (HTTP redirect) to a second resolver, which may in turn forward to another, and so forth. The content resolver is the HTTP server that normally returns object content directly (i.e., without forwarding). The metadata resolver is the HTTP server that, in response to the info inflection, returns metadata content directly. For a given ARK, the metadata resolver may be on a different host from the content resolver. (On the other hand, all three resolvers might also be on the same host.) For example, the N2T.net resolver stores a preservation copy of object metadata and can be configured on a per-ARK basis to respond to the info inflection directly or to forward it.
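The forwarding chain can be sketched without any network access by modelling redirects as a lookup table. Everything in the table below other than n2t.net is a made-up hostname, and the forwarding behaviour shown is purely illustrative; it does not describe how N2T is actually configured.

```python
from urllib.parse import urlsplit

# Hypothetical forwarding table: each host maps to the host it redirects
# to, or to None if it answers the request directly.
FORWARDS = {
    "n2t.net": "resolver.example.org",
    "resolver.example.org": "content.example.org",
    "content.example.org": None,
}


def resolver_chain(url: str, forwards: dict, max_hops: int = 10) -> list:
    """Return the hosts visited, from first resolver to final responder."""
    host = urlsplit(url).hostname
    chain = []
    while host is not None:
        if host in chain or len(chain) >= max_hops:
            raise ValueError("redirect loop or too many hops")
        chain.append(host)
        host = forwards.get(host)  # unknown hosts are treated as final
    return chain


chain = resolver_chain("https://n2t.net/ark:/12345/x408001", FORWARDS)
print("first resolver:", chain[0])
print("final responder:", chain[-1])
```

With a second table describing how the same hosts treat "?info" requests, the same function would yield the metadata resolver, which may differ from the content resolver.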

The object metadata returned in response to the info inflection depends not only on the object's immediate descriptive attributes but also on the object type and its place in a constituent cluster. For example, an ARK identifying a published article could have immediate attributes such as author (who), title (what), and date (when), but also, because it is a publication, additional core attributes such as publisher and length (number of pages). The article, one of eight in a particular issue of a journal, might also have multiple versions, in multiple formats, and might contain logical parts such as Abstract, Article, Appendices, and References. These represent its constituent cluster, which is a set of objects with which a given object has any of the following relationships: hasPart, isPartOf, isSiblingOf, hasFormat, hasVersion. For example, our article isPartOf a journal issue, hasPart References, and hasFormat(s) PDF and HTML. Because of its place in the cluster, the article's metadata should contain a link to the issue of which it is part. Link relationships within the constituent cluster exist independently of whether the related identifiers are ARKs, or are ARKs that use the reserved '/' and '.' characters.
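One way to picture a constituent cluster is as a small record of typed links. The sketch below models the article example from the paragraph above. The class name and all identifiers are hypothetical, and only the five relationship names come from the text.

```python
from dataclasses import dataclass, field

# The five relationships named in the text.
RELATIONS = ("hasPart", "isPartOf", "isSiblingOf", "hasFormat", "hasVersion")


@dataclass
class ClusterNode:
    """An object together with its links into the constituent cluster."""
    identifier: str
    links: dict = field(default_factory=dict)

    def relate(self, relation: str, target: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relationship: {relation}")
        self.links.setdefault(relation, []).append(target)


# Hypothetical identifiers for the article example.
article = ClusterNode("ark:/12345/article")
article.relate("isPartOf", "ark:/12345/issue")
article.relate("hasPart", "ark:/12345/article/references")
article.relate("hasFormat", "ark:/12345/article.pdf")
article.relate("hasFormat", "ark:/12345/article.html")

# The link that the article's metadata should carry to its parent issue.
print(article.links["isPartOf"])
```

Link targets are stored as plain strings, so they need not be ARKs at all.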

<script type="application/ld+json">
{
  "id_requested": "...",
  "id_normalized": "...",
  "id_surrogate": "...", # (optional) different id for digital surrogate, implies requested id is about physical object
  "id_up1": "...", # (optional) different id for 1st interesting landing page "above" this level
  "id_up2": "...", # (optional) different id for 2nd interesting landing page "above" the up1 level
  "report": {
    "_comment": "The next 5 elements are for very broad cross-domain interoperation.",
    "who": "National Cancer Institute; ICPSR - Interuniversity Consortium for Political and Social Research",
    "what": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
    "when": "1984-05-03",
    "where": "https://n2t.net/ark:/12345/x408001.v2",
    "how": "(:mtype data) Dataset",
    "thumbnail": "...",

    "_comment": "metatype-dependent core",
    ...,

    "persistence": {
      "_comment": "for these terms, see https://datascience.codata.org/articles/10.5334/dsj-2017-039/",
      "object": [ "indefinite", "standard" ],
      "content": [ "keeping", "waxing" ],
      "identifier": [ "single_use", "opaque", "intraversioned", "unbranded" ],
      "provider": [ "mission", "nonprofit" ]
    },
    "cite-as": "https://n2t.net/ark:/12345/x408001.v2",

    "_comment": "domain-dependent metadata",
    "name": "Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977",
    "producer": "National Cancer Institute",
    "archive": "ICPSR - Interuniversity Consortium for Political and Social Research",
    "datePublished": "1984-05-03",
    "dateModified": "2015-08-06T11:20:58Z",
    "version": "v2"
  }
}
</script>

<!-- why? because not everyone recognizes JSON script metadata -->
<meta name="DC.identifier" content="ark:/12345/x408001.v2" scheme="DCTERMS.URI"/>
<meta name="DC.title" content="Cancer Surveillance and Epidemiology in the United States and Puerto Rico, 1973–1977"/>
<meta name="DC.creator" content="National Cancer Institute"/>
<meta name="DC.publisher" content="ICPSR - Interuniversity Consortium for Political and Social Research"/>
<meta name="DC.date" content="1984-05-03" scheme="DCTERMS.W3CDTF"/>
<meta name="DC.type" content="Dataset"/>
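To show how a consumer might read both forms of metadata from an info page, here is a minimal sketch using Python's standard `html.parser`. The class name is hypothetical. The sample page is a simplified, strictly valid stand-in: a draft record that still contains annotations or "..." placeholders would not pass `json.loads`.

```python
import json
from html.parser import HTMLParser


class InfoPageParser(HTMLParser):
    """Collect embedded JSON records and named meta tags from a page."""

    JSON_TYPES = ("application/ld+json", "application/json")

    def __init__(self):
        super().__init__()
        self.meta = {}       # meta name -> content
        self.records = []    # parsed JSON script bodies
        self._in_json = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") in self.JSON_TYPES:
            self._in_json = True
            self._buf = []
        elif tag == "meta" and "name" in attr:
            self.meta[attr["name"]] = attr.get("content", "")

    def handle_data(self, data):
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self._in_json = False
            self.records.append(json.loads("".join(self._buf)))


# Simplified sample page; values are taken from the example above.
SAMPLE = """<html><head>
<script type="application/ld+json">
{"who": "National Cancer Institute", "when": "1984-05-03"}
</script>
<meta name="DC.creator" content="National Cancer Institute"/>
<meta name="DC.date" content="1984-05-03" scheme="DCTERMS.W3CDTF"/>
</head></html>"""

parser = InfoPageParser()
parser.feed(SAMPLE)
parser.close()
print(parser.records[0]["who"], parser.meta["DC.date"])
```

A processor that does not understand the script block can still fall back on the meta tags, which is the reason given above for repeating the metadata there.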

...