...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.
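As a rough illustration of the character rule above, the following sketch checks that a string uses only the permitted characters. The function name is hypothetical, and the check is assumed to apply to the part of the ARK after the NAAN. Hyphens are accepted as well, since they may appear in ARKs.

```python
import re

# Digits, ASCII letters, the characters = ~ * + @ _ $ . / listed above,
# and the hyphen (allowed to appear, but not significant for identity).
_ARK_NAME_CHARS = re.compile(r"[0-9A-Za-z=~*+@_$./-]+")


def has_only_ark_characters(name: str) -> bool:
    """Return True if every character of `name` is permitted in an ARK string."""
    return _ARK_NAME_CHARS.fullmatch(name) is not None
```

A check like this could be run on legacy identifiers to spot the ones that need altering before they are reused in ARKs.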

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.
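A server that wants to honor this rule can strip hyphens before comparing or looking up ARKs. The sketch below is one minimal way to do that; the function names are hypothetical, and it handles only the ASCII hyphen ('-').

```python
def normalize_ark(ark: str) -> str:
    """Return the ARK with identity-inert ASCII hyphens removed."""
    return ark.replace("-", "")


def same_ark(a: str, b: str) -> bool:
    """Return True if two ARK strings differ at most by hyphens."""
    return normalize_ark(a) == normalize_ark(b)
```

With these helpers, the two example strings above compare as equal.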

...

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. This restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 1960s-era view that identifiers are case-insensitive (one sign of which is the prominence of the Caps Lock key on most computer keyboards).

Alphanumeric characters (letters and digits) are generally adequate, but it is recommended to use the betanumeric subset, consisting only of digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz

This happens to be the repertoire produced by minters (unique string generators) supported by the Noid tool and N2T.net (used by ezid.cdlib.org and the Internet Archive), which create transcription-safe strings using the strongest mainstream identifier check digit algorithm. When generating unique strings automatically, the absence of vowels helps avoid accidentally creating words that users can misconstrue.
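To make the betanumeric repertoire concrete, here is a small sketch that draws random strings from it and checks whether a string stays within it. This is not the Noid minter: it adds no check digit and does not guarantee uniqueness, and both function names are hypothetical.

```python
import secrets

# The betanumeric repertoire: digits plus consonants, minus the letter 'l'.
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"


def random_betanumeric(length: int) -> str:
    """Return a random string drawn only from the betanumeric repertoire."""
    return "".join(secrets.choice(BETANUMERIC) for _ in range(length))


def is_betanumeric(s: str) -> bool:
    """Return True if `s` is non-empty and uses only betanumeric characters."""
    return bool(s) and all(ch in BETANUMERIC for ch in s)
```

A real minter would also need to record the strings it has issued so that none is assigned twice.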

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ARK ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. A second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too; for example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 
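As one illustration of minting with a UUID generator, the sketch below builds an opaque ARK from a random UUID. The function name is hypothetical, and 12345 is the shared NAAN for documentation examples, standing in for your own NAAN. UUIDs use hexadecimal letters, so the result is opaque but not betanumeric.

```python
import uuid


def mint_uuid_ark(naan: str) -> str:
    """Return a new ARK whose name is a random (version 4) UUID."""
    return f"ark:/{naan}/{uuid.uuid4()}"


# Example call; the NAAN 12345 is the shared NAAN for documentation examples.
example_ark = mint_uuid_ark("12345")
```

Because hyphens are not significant in ARKs, the UUID's hyphens can be kept for readability or dropped for brevity without changing what the ARK identifies.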

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. A common approach is to map the ARK to the URL using a software table that you update whenever the URL changes. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
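The table-based approach can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the example.org URL, and the function name are hypothetical, and how metadata is actually produced for a '?' request is left to your server.

```python
from typing import Optional, Tuple

# Hypothetical mapping table; a real one would live in a database or config
# file and be updated whenever a target URL changes. Keys are stored without
# hyphens, since hyphens are not significant in ARKs.
ARK_TO_URL = {
    "ark:/12345/cd456f9_87": "https://example.org/specimens/cd456f9_87",
}


def resolve(incoming: str) -> Optional[Tuple[str, str]]:
    """Map an incoming ARK to ("content" or "metadata", target URL).

    Returns None when the ARK is not in the table.
    """
    wants_metadata = incoming.endswith("?")  # the '?' inflection
    ark = incoming.rstrip("?").replace("-", "")
    url = ARK_TO_URL.get(ark)
    if url is None:
        return None
    return ("metadata" if wants_metadata else "content", url)
```

A web server would call something like `resolve` on the ARK portion of the request path, then either redirect to the returned URL or respond with a metadata record for it.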

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.
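The URL form and the compact hyperlink form described above can be generated from an ARK with a few lines of code. In this sketch the function names are hypothetical, and the resolver defaults to n2t.net; pass your own resolver's base URL to advertise locally based ARKs instead.

```python
from html import escape


def ark_url(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return the actionable URL form of an ARK at the given resolver."""
    return f"{resolver.rstrip('/')}/{ark}"


def ark_link(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return an HTML hyperlink that displays the compact ARK."""
    url = ark_url(ark, resolver)
    return f'<a href="{escape(url, quote=True)}">{escape(ark)}</a>'
```

For example, `ark_url("ark:/99166/w66d60p2")` yields the n2t.net URL shown earlier, and the same ARK can be based at a local resolver by supplying a different `resolver` argument.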

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
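The two qualifiers can be read mechanically. The sketch below splits an ARK into its NAAN, base name, containment parts, and variant suffix. It rests on simplifying assumptions: the ARK is in the "ark:/" form used in these examples, and a variant suffix appears only on the last component. All names are hypothetical.

```python
from typing import List, NamedTuple


class ParsedArk(NamedTuple):
    naan: str
    base_name: str
    parts: List[str]  # containment qualifiers after the base name
    variant: str      # variant suffix without the leading '.', or ""


def parse_ark(ark: str) -> ParsedArk:
    """Split an ARK into NAAN, base name, containment parts, and variant."""
    if not ark.startswith("ark:/"):
        raise ValueError(f"not an ARK: {ark!r}")
    segments = ark[len("ark:/"):].split("/")
    if len(segments) < 2 or not all(segments):
        raise ValueError(f"malformed ARK: {ark!r}")
    naan, *names = segments
    # Everything after the first '.' in the last component is the variant.
    names[-1], _, variant = names[-1].partition(".")
    return ParsedArk(naan, names[0], names[1:], variant)


def contains(outer: str, inner: str) -> bool:
    """Return True if `inner` is nested under `outer` via '/' qualifiers."""
    a, b = parse_ark(outer), parse_ark(inner)
    return (
        (a.naan, a.base_name) == (b.naan, b.base_name)
        and len(b.parts) > len(a.parts)
        and b.parts[: len(a.parts)] == a.parts
    )


def same_thing_different_form(x: str, y: str) -> bool:
    """Return True if two ARKs differ only by their variant suffix."""
    a, b = parse_ark(x), parse_ark(y)
    return a[:3] == b[:3] and a.variant != b.variant
```

Applied to the examples above, `contains` reports that ark:/12148/btv1b8449691v encloses ark:/12148/btv1b8449691v/f29, and `same_thing_different_form` reports that the .pdf and .html ARKs are two forms of the same document.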

...

There are different ways to implement a shoulder. Fundamentally, a shoulder is a deliberate practice based on a decision you make to assign ARKs that start with a particular extension to your NAAN. A shoulder can then "emerge" as your practices consciously observe and incorporate that extension as a prefix in ARKs that you create.

Having said that, there are a couple of cases where shoulder implementation does involve a kind of "creation" step. A system such as ezid.cdlib.org supports the kinds of purely decision-based shoulders above (eg, Smithsonian), as well as the creation of system-recognized shoulders with accompanying minter services and registered API access points. In contrast, implementing a decision-based shoulder requires no explicit shoulder creation step, but does involve the creation of one or more ARKs that start with that shoulder. As a special case that works with the N2T.net resolver, it is possible to create a short ARK, such as ark:/99152/p0, that is an identifier even though it looks and acts like a shoulder due to suffix passthrough.

A completely different kind of shoulder "creation" step is needed to implement a shoulder under one of the few shared NAANs (below).

Are there restrictions on the use of NAANs and shoulders?

...

Each shared NAAN has certain immutable connotations that software (and people with enough training) can recognize and benefit from.

| Shared NAAN | Purpose, meaning, or connotation of ARKs with this NAAN | Expect to resolve? | OK for long term reference? |
|---|---|---|---|
| 12345 examples | Example ARKs appearing in documentation. They might resolve, but no link checker need be concerned if they don't. They should never be used for long term reference. | maybe | no |
| 99152 terms | ARKs for controlled vocabulary and ontology terms, such as metadata element names and pick-list values. They should resolve to term definitions and are suitable for long term reference. | yes | yes |
| 99166 agents | ARKs for people, groups, and institutions as "agents". They should resolve to agent definitions and are suitable for long term reference. | yes | yes |
| 99999 test | ARKs for test, development, or experimental purposes, often at scale. They might resolve, but no link checker need be concerned if they don't. They should never be used for long term reference. | maybe | no |

The 99999 and 12345 ARKs are especially useful if you are responsible for reviewing broken link reports, because they can all safely be ignored. Despite providers' best efforts, these test and example ARKs frequently "escape into the wild" for all to see. While both NAANs may have ARKs that do actually resolve, some for a long time, 99999 ARKs are meant for test use, often at scale, and 12345 ARKs are meant for small-scale use as examples in documentation. Recipients (eg, people and link checkers) that would normally be concerned with broken links have only to recognize these two special NAANs in order to not become distracted by such ARKs.
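A link checker could apply this rule with a small filter like the one sketched here. The function name is hypothetical, and the pattern assumes the "ark:/" form used throughout this page, whether bare or embedded in a URL.

```python
import re

# Shared NAANs whose ARKs need not concern a link checker.
IGNORABLE_NAANS = {"12345", "99999"}

# Captures the NAAN that follows "ark:/" in a bare ARK or a URL.
_ARK_NAAN = re.compile(r"ark:/([^/]+)/")


def can_ignore_broken_link(url_or_ark: str) -> bool:
    """Return True if the string is a test or example ARK (NAAN 99999 or 12345)."""
    match = _ARK_NAAN.search(url_or_ark)
    return match is not None and match.group(1) in IGNORABLE_NAANS
```

Entries in a broken link report for which this returns True can be dropped before anyone reviews the report.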

Is there a quick way to get started creating test ARKs?

Yes. Instead of reserving a 99999 shoulder, if your organization already has its own NAAN, you can immediately create and use a "quick test ARK". This is an ARK that starts with ark:/99999/9NNNNN_, where NNNNN represents the NAAN (preceded by '9' and followed by '_'). There is no need to register a quick test namespace since it is automatically set aside for each NAAN. As with any prefix, there is an infinite number of possible test ARKs that start with each NAAN's quick test namespace. Two versions of an example quick test ARK belonging to the BnF (NAAN 12148) are

https://ark.bnf.fr/ark:/99999/912148_testxyz

https://n2t.net/ark:/99999/912148_testxyz

Note that N2T.net is configured to forward any quick test ARK it receives (second version above) to the appropriate local resolver (first version).
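The quick test pattern is simple enough to generate in code. In this sketch the function name is hypothetical; it follows the ark:/99999/9NNNNN_ pattern described above.

```python
def quick_test_ark(naan: str, suffix: str) -> str:
    """Return a quick test ARK in the namespace set aside for `naan`."""
    return f"ark:/99999/9{naan}_{suffix}"


# The BnF example above: NAAN 12148 with the suffix "testxyz".
bnf_test_ark = quick_test_ark("12148", "testxyz")
n2t_url = "https://n2t.net/" + bnf_test_ark
```

Prefixing the result with a resolver's base URL, as in `n2t_url`, gives the actionable form.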

Can I make changes to a NAAN or a shared shoulder?

...