...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.
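A minimal sketch of checking a candidate string against this character repertoire is shown here. The function name is hypothetical, and the hyphen is accepted as well because, as explained next, hyphens may appear in ARKs.

```python
import re

# Permitted characters, per the list above: ASCII digits and letters plus
# = ~ * + @ _ $ . /   The hyphen is also accepted (it is identity inert).
_ALLOWED = re.compile(r"[0-9A-Za-z=~*+@_$./-]+")


def has_only_allowed_chars(name: str) -> bool:
    """Return True if `name` (the part after 'ark:/') uses only permitted characters."""
    return _ALLOWED.fullmatch(name) is not None
```

For example, has_only_allowed_chars("12345/cd456f9_87") is true, while a string containing a space or a letter with a diacritic is rejected.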

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.
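One way a server might apply this rule is to strip hyphens before comparing or looking up ARKs. The sketch below assumes only the ASCII hyphen needs handling; the function names are hypothetical.

```python
def normalize_hyphens(ark: str) -> str:
    """Remove ASCII hyphens, which carry no identity in an ARK."""
    return ark.replace("-", "")


def same_ark(a: str, b: str) -> bool:
    """Compare two ARK strings, ignoring any hyphens they contain."""
    return normalize_hyphens(a) == normalize_hyphens(b)
```

With this, the two example strings above compare as equal.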

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 50-year-old assumption that identifiers are case-insensitive. You might also consider using the character repertoire of the Noid tool, which creates transcription-safe strings using the strongest mainstream identifier check digit algorithm; it uses only digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz
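To illustrate how a check character over this repertoire can work, here is a minimal sketch. It assumes the computation commonly described for Noid: each character's ordinal value in the 29-character repertoire (zero for characters outside it) is multiplied by its position, and the sum modulo 29 selects the check character. The function name is hypothetical, and the Noid documentation remains the authority on the actual algorithm.

```python
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"  # digits plus consonants minus 'l'


def check_char(s: str) -> str:
    """Compute a position-weighted check character for `s` (assumed Noid-style)."""
    total = 0
    for position, ch in enumerate(s, start=1):
        value = ALPHABET.index(ch) if ch in ALPHABET else 0
        total += position * value
    return ALPHABET[total % len(ALPHABET)]
```

Because the repertoire has a prime number of characters (29), swapping two adjacent, different characters changes the result, which is what makes the strings transcription-safe.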

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ARK ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too. For example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.
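As a sketch of the first step in handling an inflection, a server could separate a trailing '?' from the incoming ARK before deciding whether to return content or metadata. The function name is hypothetical, and this assumes the server can see the raw request string.

```python
from typing import Tuple


def split_inflection(target: str) -> Tuple[str, bool]:
    """Return (ark, wants_metadata) for an incoming ARK string.

    A trailing '?' is treated as a request for metadata about the ARK.
    """
    if target.endswith("?"):
        return target.rstrip("?"), True
    return target, False
```

A request for ark:/12345/cd456f9_87? would then yield the bare ARK plus a flag telling the server to respond with metadata.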

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. A common approach is to map the ARK to the URL using a software table that you update whenever the URL changes. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
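The table-based approach can be sketched as a lookup from ARK to current URL. Everything named here is an assumption for illustration: the table contents, the example.org target, and the function names. A real server would keep the table in a database and issue an HTTP redirect to the result.

```python
from typing import Optional

# Hypothetical mapping table; update an entry whenever the target URL changes.
# Keys are stored without hyphens so lookups can ignore them.
ARK_TO_URL = {
    "ark:/12345/cd456f9_87": "https://example.org/specimens/cd456f9_87",
}


def find_ark(request_path: str) -> Optional[str]:
    """Extract the 'ark:/...' portion of a request path, if present."""
    start = request_path.find("ark:/")
    return request_path[start:] if start >= 0 else None


def resolve(request_path: str) -> Optional[str]:
    """Return the current URL for the ARK in `request_path`, or None if unknown."""
    ark = find_ark(request_path)
    if ark is None:
        return None
    return ARK_TO_URL.get(ark.replace("-", ""))
```

Keeping the mapping in one place means the advertised ARK never changes; only the table entry does.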

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.
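A small sketch of producing the advertised forms from a compact ARK follows. The function names are hypothetical, and the resolver parameter defaults to n2t.net; pass your own hostname if you advertise through a local resolver.

```python
from html import escape


def ark_url(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return the actionable URL form of a compact ARK."""
    return f"{resolver.rstrip('/')}/{ark}"


def ark_link(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return an HTML hyperlink that displays the compact ARK."""
    url = ark_url(ark, resolver)
    return f'<a href="{escape(url, quote=True)}">{escape(ark)}</a>'
```

For instance, ark_url("ark:/99166/w66d60p2") gives the n2t.net URL shown above.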

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
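The two qualifiers can be pulled apart mechanically, as in this sketch. It assumes the ARK has at least a NAAN and a base name, and that any '.' in the final component introduces a variant suffix; the function name and the returned dictionary layout are hypothetical.

```python
def split_qualifiers(ark: str) -> dict:
    """Split an ARK into NAAN, base name, contained parts, and variant suffix.

    Raises ValueError if the ARK lacks a NAAN or a base name.
    """
    body = ark[len("ark:/"):] if ark.startswith("ark:/") else ark
    segments = body.split("/")
    # The variant qualifier, if any, follows the first '.' in the last segment.
    stem, _, variant = segments[-1].partition(".")
    segments[-1] = stem
    naan, name, *parts = segments
    return {"naan": naan, "name": name, "parts": parts, "variant": variant or None}
```

Applied to ark:/12148/btv1b8449691v/f29.pdf, this yields the base name btv1b8449691v, the contained part f29, and the variant pdf.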

...

Namespaces and sub-namespaces work on the often-used principle that every time a prefix defines a namespace, it only needs to be "extended" (adding characters to the end of the prefix) to create a new sub-namespace containing an infinite number of possible ARKs. There is potentially a sub-namespace associated with every prefix.

Set of all ARKs starting with    Namespace defined                       Example ARK in that namespace
ark:/                            All ARKs                                ark:/99999/fk4gt2m
ark:/12345/                      ARKs under the NAAN 12345               ark:/12345/p987654
ark:/12345/x5                    ARKs under the 12345/x5 shoulder        ark:/12345/x5wf6789
ark:/12345/x5wf6789/             ARKs under the 12345/x5wf6789 object    ark:/12345/x5wf6789/c2/s4.pdf

Examples like those in the above table are quite common. The first is for all ARKs and the second is for all ARKs under ark:/12345/. The third is the shoulder concept, described below, which is the next subdivision under the NAAN. The fourth, a complete ARK-as-prefix example, shows that an object ARK itself is a namespace, with potentially an infinite number of "sub-ARKs" that descend from it to name object parts and variants.
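Since each namespace is simply the set of ARKs that start with a given prefix, membership is a string-prefix test. The sketch below uses the four prefixes from the table; the function name is hypothetical.

```python
# The four example prefixes from the table, from broadest to narrowest.
NAMESPACES = [
    "ark:/",
    "ark:/12345/",
    "ark:/12345/x5",
    "ark:/12345/x5wf6789/",
]


def namespaces_containing(ark: str) -> list:
    """Return every listed namespace prefix that `ark` falls under."""
    return [prefix for prefix in NAMESPACES if ark.startswith(prefix)]
```

An ARK such as ark:/12345/x5wf6789/c2/s4.pdf belongs to all four namespaces, while ark:/99999/fk4gt2m belongs only to the first.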

...

The practice of creating new namespaces by adding information to an existing namespace is very widely used and pre-dates the Internet (eg, a neighbor who lives at flat #3B receives mail at 1234 Main Street, #3B, Springfield, IL, USA).

What is a shoulder?

A shoulder is a sub-NAAN namespace used to help organize the ARKs under a NAAN. It is the set of all ARKs starting with a short, fixed sequence of characters after the NAAN. For example, in

ark:/12345/x5wf6789/c2/s4.pdf

the shoulder can be designated by /x5 if the context is clear, but it is often best to use its fully qualified, globally unique designation of ark:/12345/x5. Note that while a shoulder starts with "/", it is not separated (neither by "/" nor by any other character) from the rest of the ARK. The term "shoulder" is borrowed from the locksmith profession, which understands sets of keys to be defined by fixed (unvarying) shoulders that precede the varying shapes ("blades") that end at the tip of the key.

Shoulders allow ARK assignment operations under a NAAN to be delegated to autonomous projects or divisions, just as NAANs do under the overall ARK namespace. Even if an organization initially plans to use ARKs for only one project, plans may change. If other needs for ARKs arise later, setting aside a new shoulder for each new project or division makes it easy to ensure that autonomous assignment streams – present, past, or future – won't conflict with each other, thanks to non-overlapping namespaces. (Shoulders can also ease the namespace splitting problem.)

...

In other words, in an ARK such as ark:/12345/x5wf6789, why is the shoulder "/x5" not followed by a "/"? According to ARK rules, if you had published

(please don't do this)     ark:/12345/x5/wf6789/c2/s4.pdf

...

Both are likely untrue, at least in any way that can be easily explained to a user. It may seem natural to add a "/" because it makes the shoulder boundary obvious to in-house ARK administrators, but they are specialists who can tolerate the non-obvious. Moreover, it doesn't help the end user, who is either uninterested in and confused by your internal operational boundaries, or so very interested that they may try to hold you to account for their inferences (eg, about consistent support levels across objects sharing the apparent containing object). Less transparency about administrative structure hides messy details and can save user-support time in the end.

In fact ARK administrators always know where the shoulder ends, provided it was chosen using the "first-digit convention". A primordinal shoulder is a sequence of one or more "betanumeric" letters ending in a digit. This means the shoulder is all letters (often just one) after the NAAN up to and including the first digit encountered after the NAAN. An advantage of primordinal shoulders is that there is an infinite number of possible shoulders under any NAAN.
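The first-digit convention makes the shoulder recoverable by pattern matching, as this sketch shows. It assumes "betanumeric letters" means the consonants of the Noid repertoire and that the NAAN is whatever sits between "ark:/" and the next "/"; the function name is hypothetical.

```python
import re
from typing import Optional

# NAAN, then one or more betanumeric letters, then the first digit.
_SHOULDER = re.compile(r"ark:/([^/]+)/([bcdfghjkmnpqrstvwxz]+[0-9])")


def shoulder_of(ark: str) -> Optional[str]:
    """Return the fully qualified shoulder of `ark`, or None if it does not
    follow the first-digit convention."""
    match = _SHOULDER.match(ark)
    if match is None:
        return None
    return f"ark:/{match.group(1)}/{match.group(2)}"
```

For ark:/12345/x5wf6789/c2/s4.pdf this returns ark:/12345/x5. An ARK whose name begins with a digit has no shoulder of this kind, so the function returns None for it.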

Are there restrictions on the use of NAANs and shoulders?

Yes. First, it is important never to invent or use a NAAN that is not listed in the public NAAN registry. There are, however, four special NAANs that are for shared use:

  • 99999, for "test", "development", or experimental ARKs,
  • 12345, for non-functional ARKs appearing in documentation,
  • 99152, for controlled vocabulary (metadata) terms, and
  • 99166, for people, groups, and institutions as agents.

For people with enough training, it is easy to recognize ARKs with these NAANs (eg, to not worry about 99999 ARKs in broken link reports). These NAANs can be a useful guide when dealing with objects having certain immutable properties. For example, despite providers' best efforts, test ARKs frequently "escape into the wild", where their potential to end up confusing users and link checkers is mitigated by the fixed semantics above.
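Recognizing these NAANs can also be automated, for example to drop test ARKs from a broken link report. The sketch below hard-codes the four shared NAANs listed above; the function names are hypothetical.

```python
from typing import List, Optional

# The four shared NAANs and their fixed meanings, as listed above.
SHARED_NAANS = {
    "99999": "test, development, or experimental ARKs",
    "12345": "non-functional ARKs appearing in documentation",
    "99152": "controlled vocabulary (metadata) terms",
    "99166": "people, groups, and institutions as agents",
}


def naan_of(identifier: str) -> Optional[str]:
    """Return the NAAN of an ARK given in compact or URL form, else None."""
    start = identifier.find("ark:/")
    if start < 0:
        return None
    rest = identifier[start + len("ark:/"):]
    return rest.split("/", 1)[0] or None


def shared_naan_purpose(identifier: str) -> Optional[str]:
    """Return the fixed meaning of the ARK's NAAN if it is a shared one."""
    return SHARED_NAANS.get(naan_of(identifier))


def drop_test_arks(broken_links: List[str]) -> List[str]:
    """Remove 99999 (test) ARKs from a list of broken links."""
    return [link for link in broken_links if naan_of(link) != "99999"]
```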

To create ARKs that exploit these shared NAAN-based semantics without conflict, there needs to be a way to reserve namespaces under the NAANs, and that requires a public shoulder registry.

Can I make changes to a NAAN or a shared shoulder?

You can change the registry entry for a NAAN by filling out the same online form used for requesting a new NAAN. For security purposes requests are processed manually. Example reasons for a change may include

  • notifying N2T of a change in your organization's contact person or resolver URL,
  • updating your organization's name assignment policy (sample policy),
  • requesting an additional NAAN, eg, to support a significant new body of ARKs or new organizational division, and
  • transitioning your NAAN to another organization that will carry on your work and future use of your NAAN.

NAANs are portable. If your organization transitions into or out of a vendor relationship, there is no impediment to taking your NAAN with you.

ARKs and other identifiers

...