...

How can I give feedback on this document?

By inserting comments into this comment-friendly version, sending an email to the ARK mailing list, https://groups.google.com/forum/#!forum/arks-forum, or contacting us as described on the communications page.

What are ARKs?

ARKs (Archival Resource Keys) are high-functioning identifiers that lead you to things and to descriptions of those things. For example, this ARK,

...

Kinds of things that have ARKs include those listed below. Numbers are approximate, current as of September 2020, and self-reported by the linked organizations.

Categories

Examples

  • genealogical records (8 billion FamilySearch)
  • publisher content (100 million Portico)
  • scientific records (22 million INIST)
  • scanned texts (20 million Internet Archive)
  • bibliographic records (15 million BnF main catalog)
  • museum specimens (11 million going on 100 million Smithsonian)
  • public health documents, many from legal discovery (15 million UCSF IDL)
  • digitized documents and objects (5 million BnF Gallica)
  • historical persons, families, and organizations (4 million SNAC)
  • finding aids and special collections (4 million Merritt)
  • resource maps (1.5 million RMap Hub)
  • educational resources (1.1 million University of Utah)
  • vocabulary terms (9,000 Periodo, YAMZ)
  • datasets, journals, archeological artifacts, living beings, and anything else you can think of!

Figure: Numbers of ARK-assigning organizations since 2001.

Who is using ARKs?

That's a little hard to say because ARKs are very decentralized, but more than 650 registered organizations have, between them, created an estimated 8.2 billion ARKs. You can find ARKs used as permalinks in

  • the Data Citation Index (linked to the Web of Science),
  • Wikipedia articles,
  • Wikidata records,
  • Internet Archive collections,
  • ORCID researcher profiles, etc.

Below is the global distribution of organizations registered to create ARKs as of September 2020. Clicking on the static image should take you to an up-to-date, zoomable map.

Figure: Global distribution of organizations registered to create ARKs.

Getting started

...

is 12148, and it uniquely identifies the French National Library. Each NAAN is associated with the URL of a resolver for its ARKs; for example, to resolve 12148 ARKs, append them to http://ark.bnf.fr/ as shown above. The N2T.net resolver is unusual in that it routes any ARK to the resolver registered under its NAAN.

There is no charge to obtain or use a NAAN, and you can request one by filling out an online form. Over 650 organizations have NAANs (libraries, archives, museums, university departments, government agencies, scholarly and educational publishers, projects, etc.), all listed in the public NAAN registry.

...

You are free to create ARK strings as you wish, provided you use only digits, letters (ASCII, no diacritics), and the following characters:

= ~ * + @ _ $ . /

The last two characters are reserved in the event you wish to disclose ARK relationships.
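To make the character rules concrete, here is a minimal Python sketch that checks a string (the part after "ark:/NAAN/") against the repertoire listed above. The function names are hypothetical, not part of any ARK tool. Hyphens are accepted because they may appear even though they are identity inert, and this is not a full ARK syntax validator.

```python
import string

# Characters allowed in ARK strings: ASCII letters and digits (no diacritics)
# plus = ~ * + @ _ $ . /   Hyphens are accepted too: they may appear,
# although they are identity inert.
ALLOWED = set(string.ascii_letters + string.digits + "=~*+@_$./-")

# '.' and '/' are reserved in case you wish to disclose ARK relationships.
RESERVED = {".", "/"}


def is_valid_ark_string(s: str) -> bool:
    """True if s is non-empty and every character is in the ARK repertoire."""
    return len(s) > 0 and all(ch in ALLOWED for ch in s)


def uses_reserved_chars(s: str) -> bool:
    """True if s contains '.' or '/', which signal ARK relationships."""
    return any(ch in RESERVED for ch in s)
```

For instance, is_valid_ark_string("cd456f9_87") is True, while strings containing "é" or "#" are rejected.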

Another unique feature of ARKs is that hyphens ('-') may appear but are identity inert, meaning that strings that differ only by hyphens are considered identical; for example, these strings

ark:/12345/141e86dc-d396-4e59-bbc2-4c3bf5326152

ark:/12345/141e86dcd3964e59bbc24c3bf5326152

identify the same thing. The reason for this feature is that text formatting processes out in the world routinely introduce extra hyphens into identifiers, and that breaks links to any server that treats hyphens as significant. ARKs are the only identifiers we know of that won't break when that happens.
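A resolver can honor hyphen inertness by stripping hyphens before comparing or looking up an ARK. Here is a minimal sketch, assuming only the portion from "ark:" onward should be normalized (hyphens in a resolver hostname are significant and are left alone); the helper names are hypothetical.

```python
def normalize_ark(ark: str) -> str:
    """Remove identity-inert hyphens from the ARK part of a string.

    Only the text from "ark:" onward is changed, so hyphens in a resolver
    hostname such as "https://my-resolver.example/" are kept. If there is
    no "ark:" label, the whole string is treated as ARK text.
    """
    i = ark.find("ark:")
    if i == -1:
        return ark.replace("-", "")
    return ark[:i] + ark[i:].replace("-", "")


def same_ark(a: str, b: str) -> bool:
    """True if a and b differ only by hyphens in their ARK parts."""
    return normalize_ark(a) == normalize_ark(b)
```

With this, the two example strings above compare as equal, so a lookup table keyed on normalized ARKs would find either form.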

What is the recommended form for ARK strings?

ARKs distinguish between lower- and upper-case letters, which makes shorter identifiers possible (52 vs 26 letters per character position). The "ARK way", however, is to use lower-case only unless you need shorter ARKs. The restriction makes it easier for resolvers to support your ARKs in case they arrive from the world with mixed- or upper-case letters, which happens regrettably often due to the lingering 1960s-era view that identifiers are case-insensitive (one sign of which is the prominence of the Caps Lock key on most computer keyboards).

Alphanumeric characters (letters and digits) are generally adequate, but it is recommended to use the betanumeric subset, consisting only of digits and consonants minus 'l' (letter ell, often mistaken for the digit 1):

0123456789bcdfghjkmnpqrstvwxz

This happens to be the repertoire produced by minters (unique string generators) supported by the Noid tool and N2T.net (used by ezid.cdlib.org and the Internet Archive), which create transcription-safe strings using the strongest mainstream identifier check digit algorithm. When generating unique strings automatically, the absence of vowels helps avoid accidentally creating words that users can misconstrue.
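For illustration, here is a sketch of the Noid check character computation as we understand it: each character's position in the betanumeric repertoire (characters outside it, such as '/', count as zero) is multiplied by its 1-based position in the string, and the sum modulo 29 picks the check character. Treat this as an unofficial sketch and compare its output against the Noid tool before relying on it; the function names are hypothetical.

```python
# Betanumeric repertoire: digits plus consonants minus 'l' (29 characters).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"
R = len(XDIGITS)  # 29


def noid_check_char(s: str) -> str:
    """Compute a check character for s (which may include the NAAN and '/')."""
    total = 0
    for pos, ch in enumerate(s, start=1):
        idx = XDIGITS.find(ch)           # -1 if ch is not betanumeric
        total += pos * (idx if idx >= 0 else 0)
    return XDIGITS[total % R]


def has_valid_check_char(s: str) -> bool:
    """True if the last character of s is the check character for the rest."""
    return len(s) > 1 and noid_check_char(s[:-1]) == s[-1]
```

With these definitions, noid_check_char("13030/xf93gt2") returns "q", and swapping the adjacent "g" and "t" yields a different check character ("c"), so that transposition would be caught.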

Regarding assignment, one common strategy is to leverage legacy identifiers. For example, a museum moth specimen number cd456f9_87 might be advertised as ark:/12345/cd456f9_87. Some legacy identifiers may need to be altered in view of ARK character restrictions. The second common strategy is to make up entirely new strings for your ARKs. In this case it is important to consider whether to make them opaque or non-opaque (or a bit of both).

What are opaque identifiers?

Persistent identifier strings are typically opaque, deliberately revealing little about what they're assigned to, because non-opaque identifiers do not age or travel well. Organization names are notoriously transient, which is why NAANs are opaque numbers. As titles and dates are corrected and word meanings evolve (eg, innocent older acronyms may become offensive or infringing), strings meant to be persistent can become confusing or politically challenging. The generation and assignment of completely opaque strings comes with risk too; for example, numbers assigned sequentially reveal timing information, and strings containing letters can unintentionally spell words (which is why vowels are missing from the recommended character repertoire).

...

ARKs are not required to be opaque, but it is recommended that the base object name be made opaque, since it tends to name the main focus of persistence. If any qualifier strings follow that name, it is less important that they be opaque. To help choose your approach to opacity, you may wish to consider compatibility with legacy identifiers and ease of string generation and transcription (eg, brevity, check digits). New strings can be created (minted) with date/time, UUID, and number generators, as well as Noid (Nice Opaque Identifiers) minters. 
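As a hedged illustration of minting opaque strings, the sketch below draws random betanumeric "blades" under a shoulder and keeps a set of already-issued ARKs to avoid repeats. The NAAN 12345 and shoulder x5 used in the example are the sample values from this FAQ, and the function name is hypothetical. Unlike a real Noid minter, this sketch keeps its state only in memory and adds no check character. Random blades also avoid the timing information revealed by sequential numbers.

```python
import secrets

# Betanumeric repertoire: digits plus consonants minus 'l'.
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def mint(naan: str, shoulder: str, length: int, issued: set) -> str:
    """Return a new, never-before-issued ARK under naan and shoulder.

    The blade is random, so it reveals no assignment order, and it has no
    vowels, which makes accidental words unlikely. The issued set is this
    sketch's only record of past assignments (a real minter persists it).
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    while True:
        blade = "".join(secrets.choice(XDIGITS) for _ in range(length))
        ark = f"ark:/{naan}/{shoulder}{blade}"
        if ark not in issued:
            issued.add(ark)
            return ark
```

A production minter would also persist its state and guard against exhausting a short blade length.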

Opaque strings are "mute" and therefore can be challenging to use and manage, which is why ARKs were designed to be "talking" identifiers. This means that if there's metadata, an ARK that comes into your server with the '?' inflection should be able to talk about itself.

How do I make server content addressable with ARKs?

First, decide what the user experience of accessing your ARKs will be, for example, a spreadsheet file, a PDF, an image, a landing page filled with formatted metadata and a range of choices, etc. Whichever you choose, plan for your server to be able to respond with metadata if your ARK should arrive with a '?' inflection after it.
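Here is a minimal sketch of '?' inflection handling in a local resolver, assuming your server gives you the raw request target. Some frameworks drop an empty query string, so a trailing '?' may not survive unless you read the raw request line. The record layout, the function name, and the metadata field name "what" are hypothetical, and only a single trailing '?' is handled.

```python
def respond(request_target: str, records: dict) -> tuple:
    """Return (status, value) for a raw request target such as
    "/ark:/12345/x54xz321" (redirect) or "/ark:/12345/x54xz321?" (metadata).

    records maps an ARK to {"target": URL, "metadata": {field: value}}.
    """
    wants_metadata = request_target.endswith("?")
    ark = request_target[:-1] if wants_metadata else request_target
    ark = ark.lstrip("/")
    record = records.get(ark)
    if record is None:
        return (404, "unknown ARK")
    if wants_metadata:
        # Plain-text "field: value" lines; a real server might return richer metadata.
        body = "\n".join(f"{k}: {v}" for k, v in record["metadata"].items())
        return (200, body)
    # No inflection: redirect to the current location of the content.
    return (302, record["target"])
```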

Otherwise, serving ARKs is like serving URLs. Normally, incoming URL strings somehow address (get mapped to) content that your web server returns. If your server is ARK-aware, incoming ARKs (expressed as URLs) must be mapped to the same content. The term "map" here refers to a generic web server software process that associates the incoming URL with content such as a particular file or a database entry. The process varies greatly across servers, but can be thought of abstractly as a lookup in a two-column table: column 1 for each incoming URL and column 2 for the corresponding file, database entry, or another URL.

Unfortunately, this mapping table description is abstract because the details depend on your web server software. On the other hand, the idea of mapping is very basic to how the web has worked since the 1990s, so doing your own resolution is quite feasible. For example, most server configuration files can easily accommodate 100,000 mapping table rows with lines that look like "Redirect <incoming ARK> <URL on this or other server>" (columns 1 and 2, after you replace what is in <>'s). A common approach with ARKs is to map each incoming ARK (column 1) to the kind of URL that your web server already knows how to deal with, and you are done. With this approach, to keep the ARKs in column 1 stable you only need to keep the URLs in column 2 updated when they change. In this case your server is acting as a local resolver. If you don't want to implement this yourself, there are ARK software tools and services that can help.
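The two-column table idea can be sketched in a few lines. The example below, with hypothetical ARKs, URLs, and function names, loads a CSV mapping table, performs local lookups, and emits lines in the "Redirect <incoming ARK> <URL>" pattern described above. The exact directive syntax and matching rules depend on your web server, so check its documentation before pasting such lines into a configuration file.

```python
import csv
import io

# Hypothetical mapping table: column 1 = incoming ARK, column 2 = current URL.
TABLE_CSV = """ark:/12345/x54xz321,https://example.org/objects/42
ark:/12345/x5wf6789,https://other.example.net/items/6789
"""


def load_table(text: str) -> dict:
    """Read the two-column CSV into a dict mapping ARK -> URL."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def resolve(table: dict, ark: str):
    """Local resolution: the URL from column 2, or None if the ARK is unknown."""
    return table.get(ark)


def redirect_lines(table: dict) -> list:
    """One 'Redirect <incoming ARK path> <URL>' line per row, sorted by ARK."""
    return [f"Redirect /{ark} {url}" for ark, url in sorted(table.items())]
```

When content moves, only column 2 changes; the ARKs in column 1 stay exactly as published.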

Another approach is to run your web server without change, but instead of updating local tables, you would update ARK-to-URL mapping tables residing at a non-local resolver. Examples of this can be found among vendors and in any organization that updates tables via EZID.cdlib.org (which, due to a special relationship, updates resolver tables at n2t.net).

How do I cite or advertise an ARK?

The URL (https or http) form of the ARK is preferred, for example,

https://n2t.net/ark:/99166/w66d60p2

An ARK meant for external use is generally advertised (released, published, disseminated) in this way in order to be an actionable identifier. If a more compact visual display of an ARK is needed, it should be hyperlinked; for example, a compact display of an HTML hyperlink can be achieved with

<a href="https://n2t.net/ark:/99166/w66d60p2">ark:/99166/w66d60p2</a>

An important decision is whether your URL-based ARKs will use the hostname of your local resolver or the N2T.net resolver. If local control or branding is important enough, you would advertise ARKs based at your local resolver (see about serving content with ARKs). If you're concerned about the stability of your local hostname, you would advertise your ARKs based at n2t.net (see examples of both).

Resolving your ARKs through N2T is always possible for users, regardless of how you advertise them.

...

There are also some vendors, such as ezid.cdlib.org, and some more information on concepts and best practices.

Beyond the basics

Is "ARK" intended to be a Christian metaphor?

No, the ARK identifier is not meant to be a Christian metaphor. "ARK" was chosen primarily as a pronounceable acronym for "Archival Resource Key".

Our logo and acronym may evoke the story of Noah's Ark, which is shared by the Abrahamic faiths of Islam, Judaism, and Christianity, and we would be happy for the ARK identifier to be associated with a trustworthy vessel to help preserve precious things.


What is N2T?

...

N2T.net is a global ARK resolver. N2T, which stands for Name-to-Thing, is actually a generalized resolver for mapping names into things, so it knows where to route over 900 types of identifier (ARK, DOI, PMID, Taxon, PDB, ISSN, etc.). If you're interested, the diagram and the next answer give a bit more detail.

Resolution starts when someone attempts to access a URL (eg, by clicking on a link) consisting of "https://n2t.net/" followed by the identifier (name) to be resolved.


As with any resolver, no one (not the user, not the web browser, not even N2T itself) knows in advance whether resolution will succeed. It depends on what N2T knows about the received identifier.

When resolution is finished, the user is often unaware that it happened, unless they are paying attention (eg, they notice that the link in the location bar is different from what they clicked on). Resolution is designed to take place without the user noticing.

How does N2T do its work?

Figure: Structure of the N2T resolver.

When a resolution request comes in from the general public, N2T looks up the identifier and redirects the original link to a forwarding link. To do this it uses two different resolution "patterns". To begin, N2T tries to resolve according to information found in an individual stored identifier. Failing that, N2T tries to resolve according to any stored class rules, based on the identifier type. 

N2T has a different kind of stored data for each pattern. First, it stores individual records for about 50 million object identifiers (eg, ARKs, DOIs) that it obtains from three sources: EZID.cdlib.org, Internet Archive, and YAMZ.net. When such records include a redirection URL (target) and descriptive metadata, N2T can act on inflections as well as perform suffix passthrough and "content negotiation". To support creation and maintenance of individual identifier records, there is an N2T API requiring login credentials. The API also allows batch operations and unique identifier generation (minting).

Second, even if N2T knows nothing about an individual identifier, resolution may still work because of a stored routing rule record triggered by the type of the identifier. N2T maintains over 3500 rule records regularly updated from several sources, including the NAAN registry, a database of ARK and DOI shoulders, and a formal partnership on compact identifiers with identifiers.org.
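The two resolution patterns can be sketched as follows. This is not N2T's actual implementation: the in-memory tables and function names are hypothetical, and the only rule type shown is the NAAN rule that forwards an ARK to its registered resolver (N2T's real rules also draw on shoulders and other identifier types). The 12148 entry follows the ark.bnf.fr example used in this FAQ.

```python
# Pattern 1 data: individual identifier records (identifier -> target URL).
INDIVIDUAL = {
    "ark:/12345/x54xz321": "https://example.org/objects/42",
}

# Pattern 2 data: class rules keyed on NAAN (NAAN -> registered resolver base URL).
NAAN_RULES = {
    "12148": "http://ark.bnf.fr/",
}


def naan_of(ark: str):
    """Return the NAAN of "ark:/NAAN/..." or "ark:NAAN/...", else None."""
    if not ark.startswith("ark:"):
        return None
    naan, sep, _ = ark[4:].lstrip("/").partition("/")
    return naan if sep and naan else None


def resolve(identifier: str):
    # Pattern 1: an individually stored record takes precedence.
    if identifier in INDIVIDUAL:
        return INDIVIDUAL[identifier]
    # Pattern 2: fall back to a rule based on the identifier type (here, the NAAN),
    # appending the ARK to the resolver registered for that NAAN.
    base = NAAN_RULES.get(naan_of(identifier))
    if base is not None:
        return base + identifier
    return None  # nothing known; a real resolver would return an error page
```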

If most ARKs run on their own resolvers, why is there also a global resolver for ARKs?

Most ARKs are created by organizations that advertise ("publish") them based at their own resolvers. For example, this ARK was published based at the ark.bnf.fr resolver:

          http://ark.bnf.fr/ark:/12148/btv1b8449691v/f29

Having to run and maintain your own resolver is the cost of complete autonomy. Using your own resolver also lets you do branding via the hostname, the downside being that brands are transient and tend to make identifiers fragile. Political and even legal (eg, trademark) pressures may make it difficult to support older branded hostnames, and hence their identifiers.

...

Second, while some organizations and their resolver hostnames are long-lived, most are not. A person trying to use an ARK containing a non-working resolver hostname can replace the non-working part with "n2t.net". If circumstances ever force you to change your resolver, this replacement step gives ARKs that you published prior to the change a better chance of working. In some cases, an organization itself may be long-lived, but for legal or political reasons it may be required to change its hostname.
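The hostname replacement step is easy to automate. Here is a minimal sketch, assuming the ARK part of the URL starts at "/ark:" and that the https form of n2t.net is acceptable; the function name is hypothetical.

```python
def via_n2t(ark_url: str) -> str:
    """Swap whatever precedes "/ark:" in a URL-based ARK for https://n2t.net.

    The ARK itself, including qualifiers and any trailing '?', is kept intact.
    """
    i = ark_url.find("/ark:")
    if i == -1:
        raise ValueError("not a URL-based ARK: " + ark_url)
    return "https://n2t.net" + ark_url[i:]
```

For example, via_n2t("http://ark.bnf.fr/ark:/12148/btv1b8449691v/f29") gives "https://n2t.net/ark:/12148/btv1b8449691v/f29".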

To avoid future inconvenience, some organizations that run their own resolvers may choose from the outset to suppress their resolver names and just advertise ("publish") their ARKs based at n2t.net.

...

When demand for a global ARK resolver arose, basic principles of openness and generality prevented the designers from creating yet another silo in the DOI/Handle/PURL mold. Instead, the ARK resolver was built to be a generic, scheme-agnostic resolver called N2T (Name-to-Thing), which now resolves over 900 types of identifier, including ARKs, DOIs, Handles, PURLs, URNs, ORCIDs, ISSNs, etc. Resolution is essentially looking in a table for an identifier string, regardless of type, and redirecting it to the right place.

...

What are the parts of an ARK?


 ARK ANATOMY
          Core Immutable Identity

       Resolver Service   Base Object Name    Qualifiers
     __________________  _________________  _____________
    /                  \/                 \/             \
    https://example.org/ark:/12345/x54xz321/s3/f8.05v.tiff
            \_________/ \__/ \___/ \______/\____/\_______/
                 |       |     |   |   |     |       |
                 |     Label   |   |   |   Sub-parts  Variants
                 |             |   |   |
 Name Mapping Authority (NMA)  |   |  Assigned Name
                               |   |
                               |   +---------- Shoulder: /x5
                               |
                Name Assigning Authority Number (NAAN)

Can I assign ARKs to things inside something that already has an ARK?

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand containment; for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.
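The two qualifiers can be recognized mechanically. The sketch below assumes, consistent with '/' and '.' being reserved, that the assigned name ends at the first '/' or '.', that sub-parts run up to the first '.' after that, and that everything from that '.' on is the variant (matching the ARK anatomy diagram). The function names are hypothetical.

```python
def parse_ark(ark: str) -> dict:
    """Split "ark:/NAAN/name[/sub/parts][.variants]" into its pieces."""
    if not ark.startswith("ark:"):
        raise ValueError("not an ARK: " + ark)
    naan, _, tail = ark[4:].lstrip("/").partition("/")
    # The assigned name ends at the first reserved character ('/' or '.').
    cut = next((i for i, ch in enumerate(tail) if ch in "/."), len(tail))
    name, qualifiers = tail[:cut], tail[cut:]
    dot = qualifiers.find(".")
    subparts = qualifiers if dot == -1 else qualifiers[:dot]
    variant = "" if dot == -1 else qualifiers[dot:]
    return {"naan": naan, "base": f"ark:/{naan}/{name}",
            "subparts": subparts, "variant": variant}


def contains_part(outer: str, inner: str) -> bool:
    """Containment qualifier: inner extends outer with '/...'."""
    return inner.startswith(outer + "/")


def variants_of_same_thing(a: str, b: str) -> bool:
    """True if a and b differ only in their '.' variant suffix."""
    pa, pb = parse_ark(a), parse_ark(b)
    return (pa["base"], pa["subparts"]) == (pb["base"], pb["subparts"]) \
        and pa["variant"] != pb["variant"]
```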

ARK namespaces and sub-namespaces

What is the purpose of the NAAN?

The main purpose is to prevent assignment conflicts. By obtaining a NAAN, an organization gets the exclusive right to create ARKs "under" that NAAN, which is part of a prefix in front of all your ARKs. The set of ARKs you can create is infinite and is known as your NAAN's namespace, and your NAAN namespace is a sub-namespace (subset) of the ARK namespace (the set of all possible ARKs). For example, the Internet Archive's NAAN namespace is all ARKs starting with "ark:/13960/". NAANs effectively subdivide the ARK namespace into non-overlapping sub-namespaces, each one holding an infinite number of possible ARKs. Since organizations only create ARKs in their own namespaces, ARK assignments between organizations will never "collide".

NAANs also play a key role in resolution. For example, if the N2T.net resolver cannot find an incoming ARK in its database, it looks at the incoming NAAN and redirects the ARK to the local resolver registered with the NAAN. Any local resolver could be configured to return the favor for incoming ARKs containing NAANs that it doesn't know about, simply by redirecting them to N2T.

All NAANs must be registered with N2T and listed in the public NAAN registry, which also lists the official resolver for each NAAN.  

How do ARK namespaces work?

They work much the same way that all namespaces work. Given a prefix associated with a namespace, this prefix can be "extended" (adding characters to the end of it) to create a new sub-namespace (directly under it) associated with the extended prefix. If the extended prefixes don't conflict, nor will the names in the associated namespaces. There can be a namespace associated with any prefix you can think of, each with a potentially infinite number of names (ARKs) that start with it.

Set of all ARKs starting with | Associated namespace | Example ARK in that namespace
ark:/ | All ARKs | ark:/99999/fk4gt2m
ark:/12345/ | ARKs under the NAAN 12345 | ark:/12345/p987654
ark:/12345/x5 | ARKs under the 12345/x5 shoulder | ark:/12345/x5wf6789
ark:/12345/x5wf6789/ | ARKs under the 12345/x5wf6789 object | ark:/12345/x5wf6789/c2/s4.pdf

The above table shows examples of four common namespace/sub-namespace levels. The first is for all ARKs and the second is for all ARKs under ark:/12345/. The third is the shoulder concept, described below, which is the next subdivision under the NAAN; note that it has no "/" after it.

The fourth, a complete ARK-as-prefix example, shows that an object ARK is itself also a namespace, with an infinite number of "sub-ARKs" that could descend from it to name object parts and variants. Creating new namespaces to avoid naming conflicts is an ancient practice. For example, a family may refer to someone as Sam, the community as Sam Smith, the government as Sam Smith, 4321 Main Street, Springfield, and history as Sam Smith, 4321 Main Street, Springfield, 1888-1997.
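Because these namespaces are defined by prefixes, membership and overlap reduce to string prefix tests. Here is a small sketch using the example prefixes from the table; the function names are hypothetical, and hyphen normalization is omitted for brevity.

```python
# Example prefixes from the table: all ARKs, a NAAN, a shoulder, an object.
LEVELS = ["ark:/", "ark:/12345/", "ark:/12345/x5", "ark:/12345/x5wf6789/"]


def in_namespace(ark: str, prefix: str) -> bool:
    """An ARK belongs to a prefix's namespace if it starts with that prefix."""
    return ark.startswith(prefix)


def prefixes_conflict(p: str, q: str) -> bool:
    """Two namespaces overlap exactly when one prefix extends the other."""
    return p.startswith(q) or q.startswith(p)


def levels_containing(ark: str) -> list:
    """Which of the example namespace levels an ARK falls under."""
    return [p for p in LEVELS if in_namespace(ark, p)]
```

Note that prefixes like ark:/12345/x5 and ark:/12345/x52 do overlap, since one extends the other.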

What is a shoulder?


A shoulder is a sub-namespace under a NAAN. It is the set of all ARKs starting with a short, fixed extension to the NAAN. For example, in

ark:/12345/x5wf6789/c2/s4.pdf

the shoulder, /x5, extends the NAAN, 12345. The short designation, /x5, isn't unique in many contexts, so the fully qualified, globally unique designation should be used (for example, ark:/12345/x5). In the classic namespace tradition, the shoulder is the set of all possible ARKs starting with the shoulder name. Our use of this term is borrowed from locksmithing, which understands sets of keys to be defined by fixed, unvarying "shoulders" that precede the varying "blades" (shapes that differ among keys sharing the same shoulder).

Shoulders help organize a NAAN namespace for the long term. Just because a namespace contains an infinite number of possible ARKs does not mean that finding an unassigned ARK is easy, especially when over time there are – or were, or may be – different independent ARK assignment operations under it. Just as the ARK community sets aside organizations' NAAN namespaces, each organization is encouraged to set aside shoulder sub-namespaces. If you don't use shoulders from the beginning, even for one simple stream of assignments, you risk creating mild but permanent chaos in your NAAN namespace, and you may end up requesting an additional NAAN (which is discouraged) for future assignment streams.

What is the purpose of a shoulder?

A shoulder is analogous to a guest room in your house. Imagine a colleague, Sally, who takes in a long term lodger, Larry. Although her home is extremely spacious (in fact it is infinite), Sally complains that Larry leaves things permanently lying around in random spots all over the house: his coat on the kitchen chair, glasses on the dining table, book on Sally's desk, slippers next to the sofa, coffee cup on the bathroom sink, etc. By the terms of his lodging agreement, Larry's things, once placed, cannot be moved. But Sally, who also needs places for her things and might later take on new lodgers, is stuck forever noticing and trying not to disturb Larry's stuff in parts of the house that she uses often.

Understanding Sally's troubles, you might vow to require any guest of yours to agree to place things only in their room (their shoulder). Under such an agreement, not only would Sally's home have been minimally disturbed by Larry's stuff, but also she would be able to take on any number of new lodgers (new assigning operations) under similar agreements.

So shoulders allow ARK assignment under a NAAN to be delegated to autonomous projects or divisions, just as NAANs do under the overall ARK namespace. Even if an organization initially only needs to create ARKs for one project, plans may change. If other needs for ARKs arise later, setting aside a new shoulder for each new project or division makes it easy to ensure that independent assignment streams – present, past, or future – won't conflict with each other, thanks to non-overlapping namespaces. (Shoulders can also ease the namespace splitting problem.) If you would like to learn more about shoulders, please see the brief ARK Shoulders FAQ.

Might I ever want to create ARKs on a NAAN that is not owned by my organization?

Yes, because there are four shared NAANs with special semantics that you might want to take advantage of. Normally, long term ARKs and their NAANs should be opaque, revealing little about what they're assigned to, but the semantics in the table below are considered so immutable as to not risk their longevity. Each shared NAAN has particular connotations that software and people with enough training can recognize and benefit from, and this offers some relief from the challenge of using opaque identifiers. 

Shared NAANs are not owned by any one organization. Creating ARKs without conflict under a shared NAAN requires, as you might imagine, reserving a shoulder, which you do by filling out an online form to request a shoulder under a shared NAAN (please don't use this form for shoulders under your own, non-shared NAAN).

Shared NAAN | Meaning | Purpose, meaning, or connotation of ARKs with this NAAN | Expect to resolve? | OK for long term reference?
12345 | examples | Example ARKs appearing in documentation. They might resolve, but no link checker need be concerned if they don't. They should not be considered viable for long term reference. | maybe | no
99152 | terms | ARKs for controlled vocabulary and ontology terms, such as metadata element names and pick-list values. They should resolve to term definitions and are suitable for long term reference. | yes | yes
99166 | agents | ARKs for people, groups, and institutions as "agents" (actors, such as creators, contributors, publishers, performers, etc). They should resolve to agent definitions and are suitable for long term reference. | yes | yes
99999 | test ids | ARKs for test, development, or experimental purposes, often at scale. They might resolve, but no link checker need be concerned if they don't. They should not be considered viable for long term reference. | maybe | no

(It's ok for these NAANs to be non-opaque since their meanings are immutable.)

The 99999 and 12345 ARKs ("non-real") are especially useful if you are responsible for reviewing broken link reports. Unless you know otherwise, errors for ARKs with these NAANs can be ignored. This can save lots of wasted effort since, despite providers' best efforts, such non-real ARKs frequently "escape into the wild" for all to see. Recipients (eg, people and link checkers) that would normally be concerned with broken links have only to recognize these two special NAANs in order to avoid being distracted by them. (Note that the non-real semantics remain even if the things don't exist.)
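Recognizing the two non-real NAANs in a broken-link report can be automated. Here is a minimal sketch; the regular expression and function names are hypothetical, and only the NAAN immediately after "ark:" or "ark:/" is examined.

```python
import re

# NAANs whose ARKs are not meant to be real: 12345 (examples), 99999 (test ids).
NON_REAL_NAANS = {"12345", "99999"}

# Captures the NAAN after "ark:" or "ark:/".
ARK_NAAN_RE = re.compile(r"ark:/?([^/\s]+)/")


def is_non_real(link: str) -> bool:
    """True if link contains an ARK whose NAAN is 12345 or 99999."""
    m = ARK_NAAN_RE.search(link)
    return m is not None and m.group(1) in NON_REAL_NAANS


def actionable(broken_links: list) -> list:
    """Keep only broken-link entries that are not example or test ARKs."""
    return [link for link in broken_links if not is_non_real(link)]
```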

Can I make changes to a NAAN?

You can request a change to the registry entry for a NAAN related to your organization

Yes, ARKs can be assigned at any level of granularity, such as to a manuscript, to chapters inside it, to chapter sections, subsections, etc. An ARK can also be assigned to a thing that encloses other things. In ARKs the character '/' is reserved to help the recipient understand about containment, for example, the first object below contains the second:

ark:/12148/btv1b8449691v

ark:/12148/btv1b8449691v/f29

That's the containment qualifier. There's only one other ARK qualifier, and it indicates variant forms of a thing by using the reserved character '.' in front of a suffix. For example, if these ARKs identify documents,

ark:/12148/btv1b8449691v/f29.pdf

ark:/12148/btv1b8449691v/f29.html

because they differ only by the suffix .pdf or .html, it can be inferred that they identify two different forms of the same document.

What is the purpose of the NAAN, and can I make changes to it?

NAANs subdivide the set of all possible ARKs (the ARK namespace). The subset of ARKs under a given NAAN can be further subdivided into shoulders (eg, 12345/x2, 98765/b4), which can make it easy to delegate autonomous ARK assignment to departments in a large organization. ARK resolution is loosely based on NAANs, but because organizations split, ARKs accommodate the namespace splitting problem by supporting management of a namespace by more than one organization. If you transition into or out of a vendor relationship, there is no impediment to taking your NAAN with you.

You can change a NAAN by filling out the same online form used for requesting a new NAAN. For security purposes requests are processed manually. Example reasons for a change may include

  • notifying N2T that  of a change in your organization's contact person or resolver URL will change,
  • updating your organization's name assignment policy (sample policy),
  • requesting an additional NAAN for , eg, to support a significant new body of ARKs or new organizational division, and
  • transitioning your NAAN to another organization that will carry on your work and take over your NAAN.

Are there restrictions on the use of NAANs?

Yes, it is important never to invent or use a NAAN that is not listed in the public registry. There are, however, two special NAANs that anyone can use:

  • 99999, for "test", "development", or experimental ARKs, and
  • 12345, for non-functional ARKs appearing in documentation.

For people with enough training, it is easy to recognize ARKs with these NAANs and eliminate them from broken-link reports. Despite providers' best efforts, such ARKs frequently "escape into the wild", where they end up confusing users and link checkers.
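A broken-link report can be pre-filtered along these lines. This Python sketch assumes the report is a plain list of URLs. The function names and sample URLs are hypothetical.

```python
import re

# The two NAANs anyone may use; ARKs under them are not expected to resolve.
TEST_NAANS = {"99999", "12345"}

# Finds the NAAN after "ark:" or "ark:/" anywhere in a URL.
NAAN_RE = re.compile(r"ark:/?([^/]+)/")


def is_test_ark(url: str) -> bool:
    m = NAAN_RE.search(url)
    return m is not None and m.group(1) in TEST_NAANS


def triage(broken_links: list[str]) -> list[str]:
    """Drop test and documentation ARKs so only real breakage remains."""
    return [url for url in broken_links if not is_test_ark(url)]


report = [
    "https://n2t.net/ark:/99999/fk4abc",
    "https://example.org/ark:/13030/xf93gt2q",
    "https://example.org/ark:12345/x6np1wh8k",
]
print(triage(report))  # ['https://example.org/ark:/13030/xf93gt2q']
```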

ARKs and other identifiers

Why would I use ARKs compared to, for example, DOIs?

  • To keep costs down (details).
  • To work with exactly the metadata you want.
  • To be able to create identifiers without metadata.
  • To be able to create an identifier even before your object exists.
  • To have an identifier as soon as you create the first draft of your data.
  • To keep that identifier private while the data and metadata evolve, and decide (maybe years later) whether to publish or discard it.
  • To retain that identifier upon publication, perhaps then assigning an additional identifier, such as a DOI.
  • Because ARKs, built for generic application and not specifically for published content, fit naturally with physical objects like samples or field stations.
  • Because ARK resolvers can deal with identifiers routinely damaged out in the world by text formatting processes that introduce hyphens.
  • Because most ARKs carry a Noid check digit that can be used to detect all common transcription errors rather than just some of them.
  • To be able to create shorter identifiers, since mixed-case permits denser strings (a larger number of strings of a given length).
  • To be able to change vendor and/or infrastructure without having to coordinate database transfers with a central authority.
  • To be able to deal with the namespace splitting problem without losing control of your identifiers.
  • To link identifiers to different kinds of nuanced persistence commitments.
  • To be able to add queries (eg, ?lang=en) when resolving your identifiers.
  • To use open infrastructure consistent with your organization's values.
  • To link directly to the objects you value instead of to landing pages.
  • To create one identifier that enables millions (suffix passthrough).
  • To access convenient, full-function metadata via inflections.
  • To integrate easily with IIIF APIs using ARK qualifiers.
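The check digit mentioned in the list comes from the Noid Check Digit Algorithm (NCDA). Below is a Python sketch of it as commonly described. Each character's position (starting at 1) is multiplied by its index in a 29-character alphabet of digits and consonants. Characters outside the alphabet (such as '/') count as zero. The sum modulo 29 then selects the check character. The function names are illustrative; consult the Noid documentation before relying on this in production.

```python
# 29-character "betanumeric" alphabet: digits plus consonants, minus 'l' and 'y'.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def ncda_check_char(s: str) -> str:
    """Compute the check character for a string such as '13030/xf93gt2'."""
    total = 0
    for position, ch in enumerate(s, start=1):
        ordinal = ALPHABET.find(ch)  # -1 if ch is not in the alphabet
        total += position * max(ordinal, 0)
    return ALPHABET[total % len(ALPHABET)]


def ncda_verify(s: str) -> bool:
    """True if the final character of s is the check character for the rest."""
    return len(s) > 1 and ncda_check_char(s[:-1]) == s[-1]


print(ncda_check_char("13030/xf93gt2"))  # q
print(ncda_verify("13030/xf93gt2q"))     # True
print(ncda_verify("13030/fx93gt2q"))     # False: 'x' and 'f' transposed
```

Because 29 is prime, transposing two adjacent, different alphabet characters always changes the sum modulo 29. Substituting one alphabet character for another also always changes it, provided the string is shorter than 29 characters. These are the common transcription errors the check character catches.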

...

ARKs are the only mainstream, non-siloed, non-paywalled identifiers that you can register to use in about 48 hours. DOIs, Handles, and PURLs require resolution and other services to come from their respective centralized systems (silos).

...

ARKs are unusual in being decentralized. While one can get resolution services from a global ARK resolver called n2t.net, over 90% of the ARKs in the world are published without using n2t.net in the URL hostname. More than 650 registered organizations across the world have, between them, created an estimated 8.2 billion ARKs, and, as with URLs, no one has ever paid an identifier fee to create them. Of course, maintaining them isn't free: keeping content access persistent in the long term always has a cost, regardless of identifier type.
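Because the hostname is not part of an ARK's identity, software can reduce ARK URLs published under different hostnames to the same core identifier. Below is a minimal Python sketch. It assumes the ARK is the first "ark:" string in the URL and accepts both "ark:/NAAN/name" and "ark:NAAN/name". `core_ark` is a hypothetical helper, and the hostnames are examples.

```python
import re
from typing import Optional

# NAAN, then the name and any qualifiers, stopping before a query or fragment.
CORE_RE = re.compile(r"ark:/?([^/?#]+)/([^?#]+)")


def core_ark(url: str) -> Optional[str]:
    """Return 'ark:/NAAN/name' from an ARK URL, ignoring scheme and hostname."""
    m = CORE_RE.search(url)
    return f"ark:/{m.group(1)}/{m.group(2)}" if m else None


urls = [
    "https://n2t.net/ark:/13030/xf93gt2q",
    "https://archive.example.org/ark:13030/xf93gt2q?info",
]
print({core_ark(u) for u in urls})  # {'ark:/13030/xf93gt2q'}
```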

...

  • Landing pages: Crossref and DataCite DOIs link to publisher landing pages built around, but not directly to, the objects you care about. ARKs can link directly to those objects. This is machine- and human-friendly because it removes an extra human navigation step from common tasks such as
    • opening an article's PDF file for reading,
    • referencing an image file meant to be incorporated automatically inline into a document, and
    • citing a spreadsheet to be used for direct data analysis by software.
  • DOIs, Handles, etc. do not support ARK-style inflections that permit access to metadata regardless of whether an identifier points to an object or its landing page.
  • Unlike DOIs and Handles, ARKs don't have metadata requirements. ARKs that haven't been released into the world are easy to delete.
  • All things eventually pass, including hostnames, the "https://" protocol, and the web itself. When that first part of the identifier ceases to have meaning, only ARKs and URNs will include a label (eg, "ark:") indicating the type of identifier that remains.
  • For DOIs, Handles, and PURLs, you are required to use their respective resolvers. ARKs and URNs permit you to use your own resolver.
  • To create DOIs and Handles, you are required to pay a membership fee and, for DOIs, there are per-DOI charges passed on in various ways by allocating agencies. There are no fees for ARKs, PURLs, and URNs.
  • To create Handles, you are required to install and maintain a local Handle server, which gives you another system to monitor, patch, and troubleshoot.
  • You can use a local or vendor resolver for your ARKs and URNs; in addition, ARKs can be resolved via the global n2t.net resolver.
  • The envisioned URN resolution infrastructure was never built, so URNs are currently resolved as URLs, and there is no designated global URN-as-URL resolver. In order to register to create URNs, you must apply for a URN namespace.
  • ARKs have some unique features that support early object development: ARKs can be deleted, can be born with no metadata, and can exist with any metadata you care to store. 
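To make the resolver points concrete, here is a Python sketch of a tiny local resolver. It treats hyphens as insignificant, so identifiers damaged by hyphenation still resolve. It also implements suffix passthrough: if an ARK is not registered, the longest registered prefix is resolved and the unmatched remainder is appended to its target. The `TARGETS` table and target URLs are hypothetical, and real resolvers such as n2t.net may apply more rules than this.

```python
from typing import Optional

# Hypothetical registrations: ARK -> target URL.
TARGETS = {
    "ark:/12345/x6np1wh8k": "https://repo.example.org/objects/x6np1wh8k",
}


def normalize(ark: str) -> str:
    # Simplification: drop every hyphen, including any in qualifiers.
    return ark.replace("-", "")


def resolve(ark: str) -> Optional[str]:
    ark = normalize(ark)
    if ark in TARGETS:
        return TARGETS[ark]
    # Suffix passthrough: try successively shorter prefixes.
    for end in range(len(ark) - 1, 0, -1):
        target = TARGETS.get(ark[:end])
        if target is not None:
            return target + ark[end:]
    return None


print(resolve("ark:/12345/x6np-1wh8k/page2.pdf"))
# https://repo.example.org/objects/x6np1wh8k/page2.pdf
```

One registration thus serves every page or file under the object, which is how a single identifier can enable many.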

...

The object and its metadata develop together, and for the subset of things that you end up wanting to publish in places that require DOIs, you can assign DOIs at publication time. If your ARK is stable and has basic metadata, you're already doing everything you need to support a proper DOI. This is a way in which ARKs support early object development.

To support two identifiers efficiently, it is recommended that you create the DOI such that it redirects to the original ARK. This not only eliminates the need ever to update the DOI redirection, but it also keeps the ARK persistent for anyone who previously recorded or bookmarked it.

...

The only caveat is to be careful about releasing (advertising) ARKs that have uncertain long-term prospects. Some identifier management systems have features to help manage and resolve unreleased identifiers (eg, EZID has a "reserved" status). The more people who know about an ARK, the harder it is to delete.
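The "easy to delete until released" rule can be modeled with a status field. This Python sketch is loosely inspired by the "reserved" status mentioned above. The `ManagedArk` class, its two states, and its behavior are hypothetical and do not describe EZID's actual API. The example ARKs use the test NAAN 99999.

```python
from dataclasses import dataclass


@dataclass
class ManagedArk:
    ark: str
    status: str = "reserved"  # hypothetical states: "reserved" or "public"

    def publish(self) -> None:
        self.status = "public"

    def delete(self, registry: dict) -> None:
        # An advertised ARK may already be cited, so only unreleased ARKs are deletable.
        if self.status != "reserved":
            raise PermissionError(f"{self.ark} has been released; refusing to delete")
        registry.pop(self.ark, None)


registry = {}
draft = ManagedArk("ark:/99999/fk4draft")
registry[draft.ark] = draft
draft.delete(registry)  # allowed: never released
```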

...

Creating metadata (extra information associated with or describing an object) has several key benefits. First, no matter what the ARK redirects to – whether a landing page or a file – metadata gives users vital information about the object, such as references to newer versions, creation date, provenance, etc. For ARKs, metadata is typically accessed via inflections.
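Requesting metadata via an inflection only changes the end of the URL. Below is a Python sketch using the standard library. It assumes the resolver honors the "?info" inflection. Because some resolvers also accept an older bare "?" form, the inflection is a parameter. The helper names are hypothetical. Fetching requires network access and a real, resolvable ARK.

```python
import urllib.request


def inflection_url(ark_url: str, inflection: str = "?info") -> str:
    """Append a metadata inflection to an ARK URL."""
    return ark_url + inflection


def fetch_metadata(ark_url: str, inflection: str = "?info", timeout: float = 10.0) -> str:
    """Fetch the metadata record returned for the inflected ARK URL."""
    url = inflection_url(ark_url, inflection)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `inflection_url("https://n2t.net/ark:/12345/x6np1wh8k")` gives "https://n2t.net/ark:/12345/x6np1wh8k?info". Because 12345 is the documentation NAAN, substitute a real ARK before calling `fetch_metadata`.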

Metadata really eases the difficulty of working with opaque identifiers, which reveal no clues as to what they identify. In the absence of metadata you are forced to access the object itself to remind yourself what it is, and also to trust that you're accessing the correct object. Moreover, discrepancies between returned metadata and the accessed object help everyone detect identifier changes and errors. 

...

Content negotiation is a software technique for requesting alternate formats of an object, such as the PDF or RTF form of an HTML file. Although not designed for it, "content negotiation" was historically kludged (twisted) in certain contexts to request metadata. This rested on the startling assumption that formats often used to hold metadata are in fact metadata and will never be objects in their own right. Unlike inflections, "content negotiation for metadata" doesn't work at all for objects represented in those formats (the list of which is growing and known only by private agreement), nor is it easy enough to be used directly by most human users.
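The difference shows up in how the two requests are built. In this Python sketch, the requests are constructed but not sent. The content-negotiation request names only a format, so a server cannot tell whether the caller wants an object that happens to be a Turtle file or metadata about it. The inflected request names metadata explicitly. The ARK URL uses the documentation NAAN, and the "?info" inflection is an assumption about the resolver.

```python
import urllib.request

ARK_URL = "https://n2t.net/ark:/12345/x6np1wh8k"  # documentation NAAN: non-functional

# Content negotiation: asks for a format; ambiguous if the object is itself Turtle.
conneg = urllib.request.Request(ARK_URL, headers={"Accept": "text/turtle"})

# Inflection: asks for metadata, whatever format the object has.
inflected = urllib.request.Request(ARK_URL + "?info")

print(conneg.get_header("Accept"))  # text/turtle
print(inflected.full_url)           # https://n2t.net/ark:/12345/x6np1wh8k?info
```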

...