Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, breaking links to any server that treats hyphens as significant.

...

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 1960's1960s-era view that identifiers are case-insensitive (one sign of which is the prominence of the Caps Lock key on most computer keyboards).

Alphanumeric characters (letters and digits) are generally adequate, but it is recommended to use the betanumeric subset, consisting only of digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz

This happens to be the repertoire produced from minters (unique string generators) supported by the Noid tool and N2T.net (used by ezid.cdlib.org and the Internet Archive), which creates transcription-safe strings using the strongest mainstream identifier check digit algorithm. When generating unique strings automatically, the absence of vowels helps avoid accidentally creating words that users can misconstrue.

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised under the ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both). 

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected, word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too, for example, numbers assigned sequentially reveal timing information and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire). 

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 

Opaque strings are "mute" and therefore challenging to manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes in to your server with the '?' inflection should be able to talk about itself.

Anchor
servingARKs
servingARKs
How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.

Otherwise, serving ARKs is like serving URLs. Normally incoming URL strings somehow addresses address (gets get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. The term "map" here refers to a generic web server software process that associates the incoming URL with, for example, a particular file or a database entry. The process varies greatly across servers, but can be thought of abstractly as a lookup in a two-column table: column 1 for each incoming URL and column 2 for the corresponding file, database entry, or another URL.

Unfortunately, this mapping table description is abstract because the details depend on your web server software. Fortunately, however, it is very basic to how the web has worked since the 1990's and so doing your own resolution quite feasible. For example, most server configuration files can easily accommodate 100,000 mapping table rows with lines that look like "Redirect <incoming ARK> <URL on this or other server>" (columns 1 and 2, after you replace what is in <>'s). A common approach with ARKs is to map each incoming ARK (column 1) to the kind of URL that your web server already knows how to deal with, and you are done. With this approach, to keep the ARKs in column 1 stable you only need to keep the URLs in column 2 updated when they change. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2"> ark:/99166/w66d60p2 </a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.

...

N2T has a different kind of stored data for each pattern. First, it stores individual records for about 50 million object identifiers (eg, ARKs, DOIs) that it obtains from three sources: EZID.cdlib.org, Internet Archive, and YAMZ.net. When such records include a redirection URL (target) and descriptive metadata131533174, N2T can act on inflections131533174 as well as perform suffix passthrough and "content negotiation". To support creation and maintenance of individual identifier records, there is an N2T API requiring login credentials. The API also allows batch operations and unique identifier generation (minting).

...

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.

...

Examples like those in the above table are quite common. The first is for all ARKs and the second is for all ARKs under ark:12345. The third is the shoulder 131533174 concept, described below, which is the next subdivision under the NAAN; note that it has no "/" after it. The fourth, a complete ARK-as-prefix example, shows that an object ARK is itself also a namespace, with potentially an infinite number of "sub-ARKs" that descend from it to name object parts and variants. The practice of creating new namespaces by adding information to an existing namespace is very common and very old, for example, in "snail mail" terms, you might be from Springfield or, more precisely, Flat #3B, 4321 Main Street, Springfield, Ontario, Canada.

...

In fact, in-house ARK administrators always know where the shoulder ends, provided it was chosen using the "first-digit convention". A primordinal shoulder is a sequence of one or more betanumeric 131533174 letters ending in a digit. This means that the shoulder is all letters (often just one) after the NAAN up to and including the first digit encountered after the NAAN. Another advantage of primordinal shoulders is that there is an infinite number of them possible under any NAAN.

...

Having said that, there are a couple of cases where shoulder implementation does involve a kind of "creation" step. A system such as ezid.cdlib.org supports both the purely "decision-based" shoulders above (that emerge from user practice, eg, Smithsonian) as well as the creation of system-recognized shoulders with accompanying minter services and registered API access points. In contrast, implementing a decision-based shoulder requires no explicit shoulder creation step, but does involve the creation of one or more ARKs that start with that shoulder. As a special case that works with the N2T.net resolver, it is possible to create a short ARK, such as ark:/99152/p0, that is an identifier even though it looks and acts like a shoulder due to suffix passthrough.

...

  • To keep costs down (details).
  • To work with exactly the metadata you want.
  • To be able to create identifiers without metadata.
  • To be able to create an identifier even before your object exists.
  • To have an identifier as soon as you create the first draft of your data.
  • To hold that identifier private while the data and metadata evolve, and decide (maybe years) later, to publish or discard it.
  • To retain that identifier upon publication, perhaps then assigning an additional identifier, such as a DOI.
  • Because ARKs, built for generic application and not specifically for published content, fit naturally with physical objects like samples or field stations.
  • Because ARK resolvers can deal with identifiers routinely damaged out in the world by text formatting processes that introduce hyphens.
  • Because most ARKs carry a Noid check digit that can be used to detect all common transcription errors rather than just some of them.
  • To be able to create shorter identifiers, since mixed-case permits denser strings (a larger number of strings of a given length).
  • To be able to change vendor and/or infrastructure without having to coordinate database transfers with a central authority.
  • To be able to deal with the namespace splitting problem without losing control of your identifiers.
  • To link identifiers to different kinds of nuanced persistence commitments.
  • To be able to add queries (eg, ?lang=en) when resolving your identifiers.
  • To use open infrastructure consistent with your organization's values.
  • To link directly to the objects you value instead of to landing pages.
  • To create one identifier that enables millions (suffix passthrough).
  • To access convenient, full-function metadata via inflections 131533174.
  • To integrate easily with IIIF APIs using ARK qualifiers.

...

  • Landing pages: Crossref and DataCite DOIs link to publisher landing pages constructed around but not directly to objects you care about, but ARKs can freely link directly to objects you care about, which is machine- and human-friendly since it does not require an extra human navigation step for common tasks such as
    • opening an article's PDF file for reading,
    • referencing an image file meant to be incorporated automatically inline into a document, and
    • citing a spreadsheet to be used for direct data analysis by software.
  • DOIs, Handles, etc. do not support ARK-style inflections131533174.
  •  that permit access to metadata regardless of whether an identifier points to an object or its landing page.
  • Unlike DOIs and Handles, ARKs don't have metadata requirements. ARKs that haven't been released into the world are easy to delete.
  • All things eventually pass, including hostnames and the web itself and the "https://" protocol. When that first part of the identifier ceases to have meaning, only ARKs and URNs will include the label (eg, "ark:") indicating the type of identifier that remains.
  • For DOIs, Handles, and PURLs, you are required to use their respective resolvers. ARKs and URNs, permit you to use your own resolver.
  • To create DOIs and Handles, you are required to pay a membership fee and, for DOIs, there are per-DOI charges passed on in various ways by allocating agencies. There are no fees for ARKs, PURLs, and URNs.
  • To create Handles, you are required to install and maintain a local Handle server, which gives you another system to monitor, patch, and troubleshoot.
  • Although you can use a local or vendor resolver for your ARKs and URNs, ARKs can be resolved via the global n2t.net resolver.
  • The envisioned URN resolution infrastructure was never built, so URNs are currently resolved as URLs, and there is no designated global URN-as-URL resolver. In order to register to create URNs, you must apply for a URN namespace.
  • ARKs have some unique features that support early object development: ARKs can be deleted, can be born with no metadata, and can exist with any metadata you care to store. 

...

Creating metadata (extra information associated with or describing an object) has several key benefits. First, no matter what the ARK redirects to – whether a landing page or a file – metadata gives users vital information about the object, such as references to newer versions, creation date, provenance, etc. For ARKs typically metadata is accessed via 

inflections131533174.

Metadata really eases the difficulty of working with opaque identifiers, which reveal no clues as to what they identify. In the absence of metadata you are forced to access the object itself to remind yourself what it is, and also to trust that you're accessing the correct object. Moreover, discrepancies between returned metadata and the accessed object help everyone detect identifier changes and errors. 

...