...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.
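The character rule above can be checked mechanically. The following is a minimal sketch, not an official validator; the function name `invalid_chars` is hypothetical, and the hyphen is accepted here only because this FAQ says hyphens may appear in ARKs.

```python
import string

# Repertoire stated above: ASCII letters, digits, and = ~ * + @ _ $ . /
# The hyphen is added because it may appear in ARKs (it is identity inert).
ALLOWED = set(string.ascii_letters + string.digits + "=~*+@_$./" + "-")

def invalid_chars(name: str) -> list:
    """Return the sorted characters of `name` that fall outside the repertoire."""
    return sorted({ch for ch in name if ch not in ALLOWED})

print(invalid_chars("cd456f9_87"))  # []
print(invalid_chars("café#1"))      # ['#', 'é']
```

An empty list means the string uses only permitted characters; anything else lists what would need to be altered.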

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.
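A server that wants to honor this rule can strip hyphens before comparing or looking up an ARK. This is a minimal sketch; the function name `normalize_ark` is an assumption for illustration.

```python
def normalize_ark(ark: str) -> str:
    """Remove hyphens, which are identity inert in ARKs."""
    return ark.replace("-", "")

a = "ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152"
b = "ark:/12345/141e86dcd3964e59bbc24c3bf5326152"
print(normalize_ark(a) == normalize_ark(b))  # True
```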

...

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 1960s-era view that identifiers are case-insensitive (one sign of which is the prominence of the Caps Lock key on most computer keyboards).

Alphanumeric characters (letters and digits) are generally adequate, but it is recommended to use the betanumeric subset, consisting only of digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz

This happens to be the repertoire produced by minters (unique string generators) supported by the Noid tool and N2T.net (used by ezid.cdlib.org and the Internet Archive), which create transcription-safe strings using the strongest mainstream identifier check digit algorithm. When generating unique strings automatically, the absence of vowels helps avoid accidentally creating words that users can misconstrue.
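To make the idea concrete, here is a small sketch of a random betanumeric string generator with a trailing check character. It is not the Noid tool: the function names are hypothetical, and the check character is a simple position-weighted sum modulo 29, an illustrative scheme that is not verified here to reproduce Noid's output.

```python
import secrets

# The betanumeric repertoire listed above (29 characters).
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"

def mint(length: int = 8) -> str:
    """Return a random betanumeric string of the given length."""
    return "".join(secrets.choice(BETANUMERIC) for _ in range(length))

def check_char(text: str) -> str:
    """Illustrative check character: sum of (position * character value) mod 29.

    Characters outside the repertoire (such as '/') count as zero.
    """
    total = 0
    for position, ch in enumerate(text, start=1):
        value = BETANUMERIC.index(ch) if ch in BETANUMERIC else 0
        total += position * value
    return BETANUMERIC[total % len(BETANUMERIC)]

base = "12345/" + mint()
print("ark:/" + base + check_char(base))
```

Because each character is weighted by its position, changing one character or swapping two adjacent, different characters alters the check character in this scheme.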

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ARK ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (e.g., innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risks too: for example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (e.g., brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters.

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings somehow address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. The term "map" here refers to a generic web server software process that associates the incoming URL with content such as a particular file or a database entry. The process varies greatly across servers, but can be thought of abstractly as a lookup in a two-column table: column 1 for each incoming URL and column 2 for the corresponding file, database entry, or another URL.

Unfortunately, this mapping table description is abstract because the details depend on your web server software. On the other hand, the idea of mapping is very basic to how the web has worked since the 1990s, so doing your own resolution is quite feasible. For example, most server configuration files can easily accommodate 100,000 mapping table rows with lines that look like "Redirect <incoming ARK> <URL on this or other server>" (columns 1 and 2, after you replace what is in <>'s). A common approach with ARKs is to map each incoming ARK (column 1) to the kind of URL that your web server already knows how to deal with, and you are done. With this approach, to keep the ARKs in column 1 stable you only need to keep the URLs in column 2 updated when they change. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
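The two-column table can be sketched in a few lines. This is only an illustration of the idea, not any particular server's configuration: the table contents, the example.org URLs, the `resolve` function, and the choice of a 302 status are all assumptions. It also shows one way a trailing '?' inflection could select a metadata record instead of the content.

```python
from typing import Optional, Tuple

# Hypothetical mapping table: column 1 is the incoming ARK, column 2 holds
# URLs this server already knows how to serve (content and metadata record).
ARK_TABLE = {
    "ark:/12345/cd456f9_87": {
        "content": "https://example.org/moths/cd456f9_87",
        "metadata": "https://example.org/moths/cd456f9_87/metadata",
    },
}

def resolve(request_target: str) -> Tuple[int, Optional[str]]:
    """Map a raw request target such as '/ark:/12345/cd456f9_87?' to
    (HTTP status, redirect URL)."""
    ark = request_target.lstrip("/")
    wants_metadata = ark.endswith("?")       # the '?' inflection
    ark = ark.rstrip("?").replace("-", "")   # hyphens are identity inert
    row = ARK_TABLE.get(ark)
    if row is None:
        return 404, None
    return 302, row["metadata" if wants_metadata else "content"]

print(resolve("/ark:/12345/cd456f9_87"))
print(resolve("/ark:/12345/cd456f9_87?"))
```

Keeping the ARK in column 1 fixed and editing only the URLs in column 2 is what lets the identifier stay stable while content moves.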

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.
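Producing the advertised forms is simple string assembly. In this sketch the function names and the local hostname arks.example.org are hypothetical; only n2t.net and the example ARK come from the text above.

```python
from html import escape

def ark_url(ark: str, resolver: str = "n2t.net") -> str:
    """Return the actionable https form of an ARK at the given resolver."""
    return f"https://{resolver}/{ark}"

def ark_link(ark: str, resolver: str = "n2t.net") -> str:
    """Return a compact HTML hyperlink that displays the bare ARK."""
    return f'<a href="{escape(ark_url(ark, resolver))}">{escape(ark)}</a>'

print(ark_url("ark:/99166/w66d60p2"))
print(ark_url("ark:/99166/w66d60p2", resolver="arks.example.org"))
print(ark_link("ark:/99166/w66d60p2"))
```

Keeping the resolver hostname as a parameter makes the choice between a local resolver and N2T a one-line decision.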

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
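The two qualifiers can be separated with ordinary string splitting. This is a rough sketch under the assumption that '/' and '.' are used only as described above; the `ParsedArk` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ParsedArk:
    naan: str
    base: str            # base object name
    parts: List[str]     # containment qualifiers, introduced by '/'
    variants: List[str]  # variant suffixes, introduced by '.'

def parse_ark(ark: str) -> ParsedArk:
    """Split an ARK into NAAN, base name, contained parts, and variants."""
    if not ark.startswith("ark:/"):
        raise ValueError("expected a string starting with 'ark:/'")
    *path, last = ark[len("ark:/"):].split("/")
    last, *variants = last.split(".")   # suffixes on the final component
    components = path + [last]
    if len(components) < 2 or not all(components):
        raise ValueError("expected at least a NAAN and a base name")
    naan, base, *parts = components
    return ParsedArk(naan, base, parts, variants)

def same_thing_different_form(a: str, b: str) -> bool:
    """True when two ARKs differ only by their variant suffixes."""
    pa, pb = parse_ark(a), parse_ark(b)
    return (pa.naan, pa.base, pa.parts) == (pb.naan, pb.base, pb.parts)

print(parse_ark("ark:/12148/btv1b8449691v/f29.pdf"))
```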

...

The main purpose is to prevent assignment conflicts. By obtaining a NAAN, an organization gets the exclusive right to create ARKs "under" that NAAN. Your NAAN is part of a prefix in front of all your ARKs. The set of ARKs you can create is infinite and is known as your NAAN's namespace, and your NAAN namespace is a sub-namespace (subset) of the ARK namespace (the set of all possible ARKs). For example, the Internet Archive's NAAN namespace is all ARKs starting with "ark:/13960/". NAANs effectively subdivide the ARK namespace into non-overlapping sub-namespaces, each one holding an infinite number of possible ARKs. Since organizations only create ARKs in their own namespaces, ARK assignments between organizations will never "collide".

NAANs also play a key role in resolution. For example, if the N2T.net resolver cannot find an incoming ARK in its database, it looks at the incoming NAAN and redirects the ARK to the local resolver registered with the NAAN. Any local resolver could be configured to return the favor for incoming ARKs containing NAANs that it doesn't know about, simply by redirecting them to N2T.
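The NAAN-based fallback in both directions can be sketched as follows. This is not N2T's actual code: the tables, the registry entry, the example hostnames, and all function names are assumptions made for illustration.

```python
from typing import Optional

N2T = "https://n2t.net"

def naan_of(ark: str) -> str:
    """Return the NAAN, the component right after 'ark:/'."""
    return ark.split("/")[1]

# Central-style resolver: its own table plus a registry of local resolvers.
CENTRAL_TABLE = {"ark:/99999/fk4gt2m": "https://example.com/demo/fk4gt2m"}
NAAN_REGISTRY = {"12345": "https://arks.example.org"}  # hypothetical entry

def central_resolve(ark: str) -> Optional[str]:
    if ark in CENTRAL_TABLE:
        return CENTRAL_TABLE[ark]
    local = NAAN_REGISTRY.get(naan_of(ark))
    return f"{local}/{ark}" if local else None

# Local resolver: knows only its own NAAN, hands everything else to N2T.
OUR_NAAN = "12345"
LOCAL_TABLE = {"ark:/12345/x5wf6789": "https://example.org/objects/x5wf6789"}

def local_resolve(ark: str) -> Optional[str]:
    if naan_of(ark) != OUR_NAAN:
        return f"{N2T}/{ark}"
    return LOCAL_TABLE.get(ark)

print(central_resolve("ark:/12345/x5wf6789"))
print(local_resolve("ark:/13960/abc"))
```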

...

They work much the same way that all namespaces work. Whenever a name prefix is associated with a namespace, it only needs to be "extended" (adding characters to the end of the prefix) to start talking about a new sub-namespace (directly under it) associated with the extended prefix. If the extended prefixes don't conflict, nor will the names in the associated namespaces. There is a namespace associated with any prefix you can think of, and each with a potentially infinite number of names (ARKs) that start with it.

Set of all ARKs starting with   Associated namespace                     Example ARK in that namespace
ark:/                           All ARKs                                 ark:/99999/fk4gt2m
ark:/12345/                     ARKs under the NAAN 12345                ark:/12345/p987654
ark:/12345/x5                   ARKs under the 12345/x5 shoulder         ark:/12345/x5wf6789
ark:/12345/x5wf6789/            ARKs under the 12345/x5wf6789 object     ark:/12345/x5wf6789/c2/s4.pdf

The above table shows examples of four common namespace/sub-namespace levels. The first is for all ARKs and the second is for all ARKs under ark:/12345/. The third is the shoulder concept, described below, which is the next subdivision under the NAAN; note that it has no "/" after it.

The fourth, a complete ARK-as-prefix example, shows that an object ARK is itself also a namespace, with an infinite number of "sub-ARKs" that could descend from it to name object parts and variants. Creating new namespaces to avoid naming conflicts is an ancient practice. For example, a family may refer to someone as Sam, the community as Sam Smith, the government as Sam Smith, 4321 Main Street, Springfield, and history as Sam Smith, 4321 Main Street, Springfield, 1888-1997.
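Since each namespace is just the set of ARKs starting with a prefix, membership is a prefix test. A minimal sketch using the four example prefixes (the function name is hypothetical):

```python
# The four example prefixes, from broadest to narrowest.
PREFIXES = [
    "ark:/",                 # all ARKs
    "ark:/12345/",           # the NAAN 12345
    "ark:/12345/x5",         # the shoulder (no trailing "/")
    "ark:/12345/x5wf6789/",  # parts and variants of one object
]

def namespaces_containing(ark: str) -> list:
    """Return every example prefix whose namespace contains this ARK."""
    return [prefix for prefix in PREFIXES if ark.startswith(prefix)]

print(namespaces_containing("ark:/12345/x5wf6789/c2/s4.pdf"))  # all four
print(namespaces_containing("ark:/12345/p987654"))             # first two
```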

...

the shoulder, /x5, extends the NAAN, 12345. The short designation, /x5, is not globally unique, so it is often best to use its fully qualified, globally unique designation of ark:/12345/x5. In the classic namespace tradition, the shoulder is the set of all possible ARKs starting with the long shoulder name. Our use of this term is borrowed from locksmithing, which understands sets of keys to be defined by fixed (unvarying) "shoulders" that precede the "blades" (shapes that differ among keys sharing the same shoulder) that follow them.

Shoulders help organize a NAAN namespace for the long term. Just because a namespace contains an infinite number of possible ARKs does not mean that finding an unassigned ARK is easy, especially when over time there are – or were, or may be – different independent ARK assignment operations under it. Just as the ARK community sets aside organizations' NAAN namespaces, each organization is encouraged to set aside shoulder sub-namespaces. If you don't use shoulders from the beginning, even for one simple stream of assignments, you risk creating a mild but permanent chaos in your NAAN namespace, and you may end up requesting an additional NAAN (which is discouraged) for future assignment streams.

...

A shoulder is analogous to a guest room in your house. Imagine a colleague, Sally, who takes in a long term lodger, Larry. Although her home is extremely spacious (in fact it is infinite), Sally complains that Larry leaves things permanently lying around in random spots all over the house: his coat on the kitchen chair, glasses on the dining table, book on Sally's desk, slippers next to the sofa, coffee cup on the bathroom sink, etc. By the terms of his lodging agreement, Larry's things, once placed, cannot be moved. But Sally, who also needs places for her things and might – one never knows – later take on new lodgers, is stuck forever noticing and trying not to disturb Larry's stuff in parts of the house that she uses often. You shake your head in sympathy, but quietly vow that any lodger you might take on would have to agree to place things only in their guest room (their shoulder). Under such an agreement, not only would Sally's home have been minimally disturbed by Larry's stuff, but also she would be able to take on any number of new lodgers (new assigning operations) under similar agreements.

So shoulders allow ARK assignment operations under a NAAN to be delegated to autonomous projects or divisions, just as NAANs do under the overall ARK namespace. Even if an organization initially only needs to create ARKs for one project, plans may change. If other needs for ARKs arise later, setting aside a new shoulder for each new project or division makes it easy to ensure that independent assignment streams – present, past, or future – won't conflict with each other, thanks to non-overlapping namespaces. (Shoulders can also ease the namespace splitting problem.) If you would like to learn more about shoulders, please see the brief ARK Shoulders FAQ.
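One way to keep assignment streams from overlapping is to record each shoulder and refuse any new one that conflicts with an existing one. The sketch below is an assumption about how an organization might track this, not a prescribed tool; the class, its methods, and the project names are hypothetical.

```python
from typing import Optional

class ShoulderRegistry:
    """Tracks shoulders under one NAAN and keeps their namespaces disjoint."""

    def __init__(self, naan: str) -> None:
        self.naan = naan
        self.owners = {}  # shoulder -> project or division

    def add(self, shoulder: str, owner: str) -> str:
        """Set aside a shoulder; return its fully qualified prefix."""
        for existing in self.owners:
            # A shoulder that is a prefix of another would share ARKs with it.
            if existing.startswith(shoulder) or shoulder.startswith(existing):
                raise ValueError(f"{shoulder!r} conflicts with {existing!r}")
        self.owners[shoulder] = owner
        return f"ark:/{self.naan}/{shoulder}"

    def owner_of(self, ark: str) -> Optional[str]:
        """Return the project whose shoulder this ARK falls under, if any."""
        for shoulder, owner in self.owners.items():
            if ark.startswith(f"ark:/{self.naan}/{shoulder}"):
                return owner
        return None

registry = ShoulderRegistry("12345")
print(registry.add("x5", "manuscripts project"))  # ark:/12345/x5
print(registry.add("k7", "moth specimens"))       # ark:/12345/k7
print(registry.owner_of("ark:/12345/x5wf6789"))   # manuscripts project
```

Because no registered shoulder is a prefix of another, every ARK belongs to at most one assignment stream.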

...