02 Mar 2020

Attendees

Discussion items

Time	Item	Who	Notes

arks-forum issue: upper case normalization of % encoding

Hello Mario, you bring up an issue that has not been noticed before. Section 6.2.2.1 of RFC 3986 is talking about case normalization for the purpose of determining equivalence, but section 2.1 of the same gives a more general recommendation for using uppercase: "For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
The ARKs In The Open project has a technical working group that is looking at changes to the ARK specification, and we will take up your suggestion as a topic.
I'm afraid I don't quite follow your suggestion that, e.g., "/" can be transformed to "_" to avoid encoding issues. The premise of encoding is that the ARK producer desires or needs to use certain characters. If the producer were forced to change the characters they use, there would be no need for an encoding mechanism at all.
-Greg
> On Feb 3, 2020, at 8:13 AM, John Kunze <jak@ucop.edu> wrote:
>
> Did you all get this? If not, please make sure you're subscribed and it's not going to spam.
>
> ---------- Forwarded message ---------
> From: Mario Xerxes Castelán Castro <marioxcc.MT@yandex.com>
> Date: Sat, Feb 1, 2020 at 5:27 PM
> Subject: [arks] Case of percent-encoded character
> To: ARKs <arks-forum@googlegroups.com>
>
>
> There exists a soft conflict between the ARK specification, where lowercase percent-encoded characters are preferred, and the URI specification (RFC 3986 §6.2.2.1), where uppercase percent-encoded characters are preferred. This is not a strict contradiction because both documents allow both lowercase and uppercase. However, it leaves users who generate ARKs with percent-encoded characters, and authors of software that generates them, having to choose between contradictory recommendations.
>
> My personal suggestion is to change the ARK document to state that uppercase percent-encoded characters are preferred, to be consistent with the URI specification. The current rationale for lowercase characters, “Lower case hex digits are preferred to reduce the chances of false acronym recognition;”, applies only when generating ARKs based on previously existing identifiers where some reserved characters appear in the previously existing identifier; in that case, another option is to encode the previously existing identifiers into the alphabet of unreserved ARK characters using some specific scheme that avoids percent-encoded characters altogether. For example, if the previously existing identifier is known to use the alphabet “0-9a-zA-Z/”, the character “/” can be transformed into “_”, thus avoiding %2f or %2F.
>
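
To make the case question in this thread concrete, here is a minimal Python sketch of normalizing the hex digits of percent-encoded triplets to one case, as RFC 3986 section 2.1 recommends for uppercase. The function name normalize_percent_case and the sample ARK (with NAAN 12345) are hypothetical, chosen for illustration only; nothing here comes from the ARK specification or any existing ARK tool.

```python
import re

# Matches one well-formed percent-encoded triplet: "%" plus two hex digits.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_percent_case(identifier: str, upper: bool = True) -> str:
    """Return identifier with the hex digits of every percent-encoded
    triplet set to one case; all other characters are left untouched.

    Hypothetical helper for illustration, not part of any ARK library.
    """
    def fix_case(match: re.Match) -> str:
        triplet = match.group(0)
        return triplet.upper() if upper else triplet.lower()

    return _PCT_TRIPLET.sub(fix_case, identifier)


# Hypothetical ARK containing mixed-case percent-encodings.
sample = "ark:/12345/x5%2fab%c3%A9"
print(normalize_percent_case(sample))               # uppercase, RFC 3986 style
print(normalize_percent_case(sample, upper=False))  # lowercase, current ARK preference
```

Only the two hex digits after each "%" change, so letters in the rest of the identifier keep their case. Comparing two identifiers after applying the same normalization to both treats %2f and %2F as equivalent, whichever case a producer chose.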

Action items

announcements

new NAAN Registry working group should take some pressure off this group

CNI to have a panel on AITO

intro to version 2 of Networked Entity Model

GJ: how do we tell which resolver should deal with inflections
TC: what is the purpose of this Nentity Model?
JK: to clean up the mess and confusion of the diversity of networked objects that we all deal with
TC: unique spelling may not work, using regular words might be better; it would help searchability if keywords appeared reliably and frequently as JSON keywords
CM: consider more normal spellings and making new words upper case to make them stand out more
GJ: I have trouble with the odd spellings
CM: they're hard to remember
KH: I also find spellings hard, but the new concepts could be helpful
TC: this is quite relevant to what we do with physical/digital objects and transcriptions; relations are probably covered in specs like linkrel
MP: we should look into overlap with other initiatives, e.g., OAI, IIIF, signposting, Portland common data model

Action items

All: suggest alternative spellings instead of the odd spellings
Mark Phillips will investigate overlap with things like OAI, IIIF, signposting, Portland common data model
