...

Discussion items

Time | Item | Who | Notes
 upper

arks-forum issue: uppercase normalization of percent-encoding

Hello Mario, you bring up an issue that has not been noticed before. Section 6.2.2.1 of RFC 3986 discusses case normalization for the purpose of determining equivalence, but Section 2.1 of the same RFC gives a more general recommendation to use uppercase: "For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
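For reference, the RFC 3986 case rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the ARK or URI specifications: the function name normalize_pct_case is hypothetical, and it assumes that only the two hex digits of each "%XX" triplet should change case while every other character in the identifier is preserved exactly.

```python
import re

# Matches one percent-encoded octet: "%" followed by exactly two hex digits.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_pct_case(s: str, upper: bool = True) -> str:
    """Rewrite the hex digits of every percent-encoded triplet in s to one case.

    Hypothetical helper for illustration. Characters outside "%XX" triplets
    are left untouched, so case-sensitive parts of an identifier are not altered.
    upper=True follows RFC 3986 (e.g. "%3a" -> "%3A"); upper=False produces
    lowercase hex digits instead.
    """
    convert = str.upper if upper else str.lower
    # "%" is unaffected by upper()/lower(), so converting the whole match is safe.
    return _PCT_TRIPLET.sub(lambda m: convert(m.group(0)), s)


if __name__ == "__main__":
    print(normalize_pct_case("ark:/12345/x%2fy%3a"))               # ark:/12345/x%2Fy%3A
    print(normalize_pct_case("ark:/12345/x%2Fy", upper=False))     # ark:/12345/x%2fy
```

Normalizing two identifiers with the same setting before comparing them is one way to apply the equivalence test of Section 6.2.2.1; which case a producer should emit in the first place is the open question for the working group.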

The ARKs In The Open project has a technical working group that is looking at changes to the ARK specification, and we will take up your suggestion as a topic.

I'm afraid I don't quite follow your suggestion that, e.g., "/" can be transformed to "_" to avoid encoding issues.  The premise of encoding is that the ARK producer desires or needs to use certain characters.  If the producer were forced to change the characters they use, there would be no need for an encoding mechanism at all.

-Greg



> On Feb 3, 2020, at 8:13 AM, John Kunze <jak@ucop.edu> wrote:
>
> Did you all get this? If not, please make sure you're subscribed and it's not going to spam.
>
> ---------- Forwarded message ---------
> From: Mario Xerxes Castelán Castro <marioxcc.MT@yandex.com>
> Date: Sat, Feb 1, 2020 at 5:27 PM
> Subject: [arks] Case of percent-encoded character
> To: ARKs <arks-forum@googlegroups.com>
>
>
> There exists a soft conflict between the ARK specification, where lowercase percent-encoded characters are preferred, and the URI specification (RFC 3986 §6.2.2.1), where uppercase percent-encoded characters are preferred. This is not a strict contradiction, because both documents allow both lowercase and uppercase. However, it leaves users who generate ARKs with percent-encoded characters, and authors of software that generates them, having to choose between contradictory recommendations.
>
> My personal suggestion is to change the ARK document to state that uppercase percent-encoded characters are preferred, for consistency with the URI specification. The current rationale for lowercase characters, “Lower case hex digits are preferred to reduce the chances of false acronym recognition;”, applies only when generating ARKs from previously existing identifiers in which some reserved characters appear. In that case, another option is to encode the previously existing identifiers into the alphabet of unreserved ARK characters using some specific scheme that avoids percent-encoded characters altogether. For example, if the previously existing identifier is known to use the alphabet “0-9a-zA-Z/”, the character “/” can be transformed into “_”, thus avoiding %2f or %2F.
>
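Mario's alternative can be illustrated with a short Python sketch. It assumes, as in his example, that the pre-existing identifier uses only the alphabet "0-9a-zA-Z/"; the function names encode_identifier and decode_identifier are hypothetical. The mapping is reversible only because "_" never occurs in that assumed alphabet, and it works only when the producer is willing to change the characters that appear in the ARK, which is the point Greg's reply raises.

```python
import re

# Hypothetical source alphabet from Mario's example: digits, ASCII letters, "/".
_SOURCE_ALPHABET = re.compile(r"[0-9A-Za-z/]*")


def encode_identifier(ident: str) -> str:
    """Map "/" to "_" so no percent-encoding is needed (illustrative only).

    Raises ValueError if ident uses any character outside the assumed alphabet,
    because the mapping would no longer be guaranteed to be reversible.
    """
    if not _SOURCE_ALPHABET.fullmatch(ident):
        raise ValueError(f"identifier outside assumed alphabet: {ident!r}")
    return ident.replace("/", "_")


def decode_identifier(encoded: str) -> str:
    """Invert encode_identifier; safe only because "_" is not in the source alphabet."""
    return encoded.replace("_", "/")


if __name__ == "__main__":
    enc = encode_identifier("abc/123")
    print(enc)                      # abc_123
    print(decode_identifier(enc))   # abc/123
```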



Action items

  •