...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.
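
As a rough illustration of the character rule above, the following Python sketch checks a candidate string against the permitted repertoire. The name `uses_allowed_characters` is hypothetical, and accepting hyphens is an assumption based on the fact that hyphens may appear in ARKs; this is not an official validator.

```python
import string

# Digits, ASCII letters, and the listed punctuation characters.
# Assumption: hyphens are accepted as well, since they may appear in ARKs
# (a stricter minting policy might choose to exclude them).
ALLOWED = set(string.ascii_letters + string.digits + "=~*+@_$./" + "-")

def uses_allowed_characters(name: str) -> bool:
    """Return True if `name` is non-empty and uses only permitted characters."""
    return bool(name) and all(ch in ALLOWED for ch in name)

print(uses_allowed_characters("cd456f9_87"))  # True
print(uses_allowed_characters("café_01"))     # False: 'é' carries a diacritic
```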

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.
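
A server that wants to honor this rule could strip hyphens before comparing or looking up ARKs. The sketch below shows one way to do that; the function names are illustrative and not part of any ARK tool.

```python
def normalize_ark(ark: str) -> str:
    """Drop hyphens, which carry no identity in an ARK."""
    return ark.replace("-", "")

def same_ark(a: str, b: str) -> bool:
    """Compare two ARKs while ignoring any hyphens."""
    return normalize_ark(a) == normalize_ark(b)

print(same_ark(
    "ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152",
    "ark:/12345/141e86dcd3964e59bbc24c3bf5326152",
))  # True
```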

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 50-year-old assumption that identifiers are case-insensitive. You might also consider using the character repertoire of the Noid tool, which creates transcription-safe strings using the strongest mainstream identifier check digit algorithm; it uses only digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz
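
For readers who want to experiment with this repertoire, here is a minimal sketch that draws random strings from it. It is not the Noid tool and it does not compute a check digit; the function name and the default length are assumptions made for illustration.

```python
import secrets

# The repertoire shown above: digits plus consonants, without 'l'.
NOID_ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"

def mint_opaque_name(length: int = 8) -> str:
    """Return a random string drawn from the Noid character repertoire.

    Uniqueness is NOT guaranteed; a real minter would also record which
    names have already been assigned.
    """
    return "".join(secrets.choice(NOID_ALPHABET) for _ in range(length))

print(mint_opaque_name())  # output differs on every run
```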

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ARK ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too; for example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. A common approach is to map the ARK to the URL using a software table that you update whenever the URL changes. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
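
The table-based approach described above can be sketched in a few lines of Python. Everything here is hypothetical (the table contents, the metadata record, and the function names), and a production server would sit behind a real web framework; the point is only to show the lookup and the '?' inflection side by side.

```python
# Hypothetical mapping table: ARK -> current URL of the content.
ARK_TO_URL = {
    "ark:/12345/x54xz321": "https://example.org/files/x54xz321.pdf",
}

# Hypothetical metadata records, returned when an ARK arrives with '?'.
ARK_METADATA = {
    "ark:/12345/x54xz321": {"what": "Example document"},
}

def respond(incoming: str):
    """Return ("metadata", record) for a '?' inflection, else ("redirect", url).

    The second element is None when the ARK is unknown to this server.
    """
    if incoming.endswith("?"):
        ark = incoming[:-1]
        return ("metadata", ARK_METADATA.get(ark))
    return ("redirect", ARK_TO_URL.get(incoming))

def update_location(ark: str, new_url: str) -> None:
    """Record that the content behind `ark` has moved to `new_url`."""
    ARK_TO_URL[ark] = new_url

print(respond("ark:/12345/x54xz321"))
print(respond("ark:/12345/x54xz321?"))
```

Because only the table changes when content moves, the advertised ARK stays the same while `update_location` points it at the new URL.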

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>
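
If you generate citations programmatically, both forms can be produced from the bare ARK. The following is a small sketch; the function names are illustrative, and the default resolver is simply the n2t.net hostname used in the example above.

```python
from html import escape

def ark_url(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return the actionable URL form of an ARK at the given resolver."""
    return f"{resolver.rstrip('/')}/{ark}"

def ark_html_link(ark: str, resolver: str = "https://n2t.net") -> str:
    """Return a compact HTML hyperlink that displays only the bare ARK."""
    url = ark_url(ark, resolver)
    return f'<a href="{escape(url, quote=True)}">{escape(ark)}</a>'

print(ark_url("ark:/99166/w66d60p2"))
# https://n2t.net/ark:/99166/w66d60p2
print(ark_html_link("ark:/99166/w66d60p2"))
# <a href="https://n2t.net/ark:/99166/w66d60p2">ark:/99166/w66d60p2</a>
```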

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.

...

 ARK ANATOMY                  
                        
      Resolver Service   Base Object Name    Qualifiers
     __________________  _________________  _____________
    /                  \/         ...     \/             \
    https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff
            \_________/ \__/ \___/ \______/\____/\_______/
                 |       |     |  ...  |     |       |
                 |     Label   |   |   | Sub-parts  Variants
                 |             |   |   |
 Name Mapping Authority (NMA)  |   |  Assigned Name      ...
                               |   +---------- Shoulder: /x5
                Name Assigning Authority Number (NAAN)
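
To make the anatomy concrete, here is a hedged Python sketch that splits the example URL into the labeled components. The regular expression and the dictionary keys are assumptions made for illustration. In particular, it treats everything from the first '.' onward as variants, which is a simplification, and it does not try to detect the shoulder because a shoulder is not delimited within the assigned name.

```python
import re
from urllib.parse import urlsplit

# Assumed simplification: NAAN, then an assigned name without '/' or '.',
# then optional '/'-introduced sub-parts, then optional '.'-introduced variants.
ARK_PATTERN = re.compile(
    r"^ark:/?(?P<naan>[^/]+)/(?P<name>[^/.]+)"
    r"(?P<sub_parts>(?:/[^.]*)?)(?P<variants>(?:\..*)?)$"
)

def parse_ark_url(url: str) -> dict:
    """Split a URL-form ARK into the components named in the diagram."""
    parts = urlsplit(url)
    match = ARK_PATTERN.match(parts.path.lstrip("/"))
    if match is None:
        raise ValueError(f"not a recognizable ARK URL: {url}")
    return {
        "nma": parts.netloc,                 # Name Mapping Authority
        "label": "ark:",
        "naan": match["naan"],               # Name Assigning Authority Number
        "assigned_name": match["name"],
        "sub_parts": match["sub_parts"],
        "variants": match["variants"],
    }

print(parse_ark_url("https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff"))
```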

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
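
The two qualifier rules lend themselves to simple string tests. The sketch below is one possible reading of them, with hypothetical function names; it assumes the variant suffix begins at the first '.' in the final path element.

```python
def contains(parent: str, child: str) -> bool:
    """True if `child` names something inside `parent` (the '/' qualifier)."""
    return child.startswith(parent.rstrip("/") + "/")

def strip_variant(ark: str) -> str:
    """Remove any '.suffix' variant qualifier from the last path element."""
    head, sep, last = ark.rpartition("/")
    return head + sep + last.split(".", 1)[0]

def are_variants(a: str, b: str) -> bool:
    """True if two distinct ARKs differ only by their variant suffix."""
    return a != b and strip_variant(a) == strip_variant(b)

print(contains("ark:/12148/btv1b8449691v", "ark:/12148/btv1b8449691v/f29"))
# True
print(are_variants("ark:/12148/btv1b8449691v/f29.pdf",
                   "ark:/12148/btv1b8449691v/f29.html"))
# True
```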

...

By obtaining a NAAN, an organization gets the exclusive right to create ARKs using the NAAN as a kind of prefix. Your NAAN is part of a prefix in front of all your ARKs. The set of ARKs you can create is infinite and is known as your NAAN's namespace, and your NAAN namespace is a sub-namespace (subset) of the ARK namespace (the set of all possible ARKs). For example, the Internet Archive's NAAN namespace is all ARKs starting with "ark:/13960/". NAANs effectively subdivide the ARK namespace into non-overlapping sub-namespaces, each one holding an infinite number of possible ARKs. Since organizations can only create ARKs in their own namespaces, it is impossible for ARK assignments between organizations to conflict.

NAANs play a key role in resolution. For example, if a resolver cannot find an incoming ARK in its database, it looks at the incoming NAAN and redirects the ARK to the local resolver registered with the NAAN. This is precisely what the N2T.net resolver does. Any local resolver could be configured to return the favor for any incoming ARKs with NAANs that it doesn't know about, simply by redirecting them to N2T.
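
The NAAN-based fallback can be sketched as follows. The tables, hostnames other than n2t.net, and function names are all hypothetical; real resolvers keep this information in their own registries and databases.

```python
# Hypothetical data: ARKs this resolver knows directly, and a registry that
# maps each NAAN to the local resolver registered for it.
KNOWN_ARKS = {"ark:/99999/fk4gt2m": "https://content.example.org/items/fk4gt2m"}
NAAN_REGISTRY = {"12345": "https://arks.example.org"}
GLOBAL_RESOLVER = "https://n2t.net"

def naan_of(ark: str) -> str:
    """Extract the NAAN, the path element right after the 'ark:/' label."""
    return ark.split("/")[1]

def resolve_globally(ark: str):
    """Global-resolver behavior: own database first, then the NAAN registry."""
    if ark in KNOWN_ARKS:
        return KNOWN_ARKS[ark]
    local_resolver = NAAN_REGISTRY.get(naan_of(ark))
    return f"{local_resolver}/{ark}" if local_resolver else None

def resolve_locally(ark: str, my_naans: set, my_table: dict):
    """Local-resolver behavior: serve own NAANs, forward everything else."""
    if naan_of(ark) in my_naans:
        return my_table.get(ark)
    return f"{GLOBAL_RESOLVER}/{ark}"

print(resolve_globally("ark:/12345/x5wf6789"))
print(resolve_locally("ark:/13960/t1", {"12345"}, {}))
```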

Namespaces and sub-namespaces work on the often-used principle that every time a prefix defines a namespace, it only needs to be "extended" (adding characters to the end of the prefix) to create a new sub-namespace containing an infinite number of possible ARKs. There is potentially a sub-namespace associated with every prefix.

Set of all ARKs starting with | Namespace defined                                 | Example ARK in that namespace
ark:/                         | All ARKs                                          | ark:/99999/fk4gt2m
ark:/12345/                   | Namespace of ARKs under the NAAN 12345            | ark:/12345/p987654
ark:/12345/x5                 | Namespace of ARKs under the 12345/x5 shoulder     | ark:/12345/x5wf6789
ark:/12345/x5wf6789/          | Namespace of ARKs under the 12345/x5wf6789 object | ark:/12345/x5wf6789/c2/s4.pdf

Examples like those in the above table are quite common. The first is for all ARKs and the second for all ark:/12345/ ARKs. The third is described below (the shoulder concept). The fourth, a complete ARK-as-prefix example, shows that an object ARK itself is a namespace, with potentially an infinite number of "sub-ARKs" that descend from it to name object parts and variants.
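
Since each namespace is defined by a prefix, membership reduces to a prefix test. The sketch below encodes the four example prefixes from the table; the list and function names are illustrative only.

```python
# The four example prefixes, each paired with a short description.
NAMESPACES = [
    ("ark:/", "all ARKs"),
    ("ark:/12345/", "NAAN 12345"),
    ("ark:/12345/x5", "shoulder 12345/x5"),
    ("ark:/12345/x5wf6789/", "object 12345/x5wf6789"),
]

def namespaces_containing(ark: str) -> list:
    """List every example namespace whose prefix the ARK starts with."""
    return [label for prefix, label in NAMESPACES if ark.startswith(prefix)]

print(namespaces_containing("ark:/12345/x5wf6789/c2/s4.pdf"))
# ['all ARKs', 'NAAN 12345', 'shoulder 12345/x5', 'object 12345/x5wf6789']
print(namespaces_containing("ark:/12345/p987654"))
# ['all ARKs', 'NAAN 12345']
```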

Can I make changes to a NAAN?

You can change the registry entry for a NAAN by filling out the same online form used for requesting a new NAAN. For security purposes requests are processed manually. Example reasons for a change may include

  • notifying N2T of a change in your organization's contact person or resolver URL,
  • updating your organization's name assignment policy (sample policy),
  • requesting an additional NAAN, eg, to support a significant new body of ARKs or new organizational division, and
  • transitioning your NAAN to another organization that will carry on your work and take over future use of your NAAN.

By the way, NAANs are portable. If your organization transitions into or out of a vendor relationship, there is no impediment to taking your NAAN with you.

What is a shoulder?

A shoulder is a sub-NAAN namespace used to help organize a NAAN's namespace. It is the set of all ARKs starting with a small, fixed sequence of characters after the NAAN. For example, in

ark:/12345/x5wf6789/c2/s4.pdf

the shoulder is referred to as /x5 or (the fully qualified designation) ark:/12345/x5. Note that while a shoulder starts with "/", it is not separated (by "/" or by any other character) from the rest of the ARK.
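
Because nothing in the string marks where a shoulder ends, software cannot simply split on a delimiter; it has to know which shoulders exist. The sketch below assumes a hypothetical list of registered shoulders and finds the one an ARK falls under by prefix matching.

```python
# Hypothetical shoulders set up under NAAN 12345, in fully qualified form.
KNOWN_SHOULDERS = ["ark:/12345/x5", "ark:/12345/b2"]

def shoulder_of(ark: str):
    """Return the longest known shoulder that prefixes `ark`, or None."""
    matches = [s for s in KNOWN_SHOULDERS if ark.startswith(s)]
    return max(matches, key=len) if matches else None

print(shoulder_of("ark:/12345/x5wf6789/c2/s4.pdf"))  # ark:/12345/x5
print(shoulder_of("ark:/12345/p987654"))             # None
```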

A shoulder usually consists of a letter followed by a digit. Shoulders allow ARK assignment operations under a NAAN namespace to be delegated to autonomous projects or divisions, just as NAANs do under the overall ARK namespace. Even if an organization initially plans to use ARKs for only one project, plans may change, and if new needs for ARKs arise later, setting up a new project or division under its own shoulder makes it easy to ensure that autonomous assignment streams, whether present, past, or future, won't conflict with each other, thanks to non-overlapping namespaces. (Shoulders can also ease the namespace splitting problem.)

Why is there no "/" to mark the end of a shoulder?

...