Basics

What are ARKs?

ARKs (Archival Resource Keys) are high-functioning identifiers that lead you to things and to descriptions of those things. For example, this ARK,

...

  • digital content, such as genealogical records (FamilySearch)
  • publisher content (Portico)
  • digitized manuscripts (Gallica)
  • texts (Internet Archive)
  • museum holdings (Smithsonian)
  • vocabulary terms (yamz.net, perio.do)
  • historical figures (snaccooperative.org)
  • datasets, journals, living beings, and more.

Getting started

How do I start assigning ARKs?

The only prerequisite is to fill out an online request for a NAAN on behalf of your organization. There is no charge to obtain a NAAN, and all memory organizations are welcome. Within a day or two you should receive an email containing a NAAN for your organization's exclusive use. Meanwhile, consider the following.

  • What things do you want to name with ARKs? Generally you name objects that you own, control, or manage.
  • Will you assign ARKs to things contained in larger things that have ARKs? This (granularity) is not a problem, and the '/' character may help.
  • Where do you want your ARKs to resolve to? Examples: formatted file, surrogate for a physical thing, landing page with choices, etc.
  • Which web server will host your objects? You are asked this when you request a NAAN, even if it's not yet working.
  • Which web server/resolver will you use as hostname in the ARK-based URLs that you advertise/publish?

How do I start serving my ARKs?

It's like serving ordinary URLs, except that you have to convert the incoming ARK strings into a form that your server can handle. In this case serving content and local resolution are the same thing. All you need is a web server.

If starting from scratch, you have choices. On the one hand, you could run all your own custom infrastructure, including content management, web hosting, minting (generating unique identifier strings), and running your own server/resolver. That infrastructure could be very simple, such as a server configured to map incoming ARK-based URLs to server file pathnames. When you request your NAAN, you will be asked to supply the base URL of your local server or resolver.
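The "very simple" local resolver described above can be sketched in a few lines. This is a minimal sketch, not a recommended deployment: it assumes a numeric NAAN, a hypothetical NAAN 12345, and a hypothetical document root /var/www/arks in which each ARK's name maps directly to a file path. The helper name ark_path_to_file is also hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical settings: substitute your own NAAN and document root.
MY_NAAN = "12345"
DOC_ROOT = PurePosixPath("/var/www/arks")

# Path of an ARK-based URL, "/ark:/NAAN/rest"; assumes a numeric NAAN,
# and the slash after "ark:" is treated as optional.
ARK_PATH = re.compile(r"^/ark:/?(?P<naan>\d+)/(?P<rest>.+)$")


def ark_path_to_file(url_path: str) -> Optional[PurePosixPath]:
    """Map an incoming ARK-based URL path to a file pathname under DOC_ROOT.

    Returns None when the path is not an ARK under MY_NAAN or looks unsafe.
    """
    # Ignore any query string or fragment; only the path names a file.
    path = url_path.split("?", 1)[0].split("#", 1)[0]
    m = ARK_PATH.match(path)
    if m is None or m.group("naan") != MY_NAAN:
        return None
    parts = m.group("rest").split("/")
    # Reject empty, "." and ".." segments so a request cannot escape DOC_ROOT.
    if any(p in ("", ".", "..") for p in parts):
        return None
    return DOC_ROOT.joinpath(MY_NAAN, *parts)
```

For example, a request for /ark:/12345/x6np1wh8k/page2.pdf would be served from /var/www/arks/12345/x6np1wh8k/page2.pdf. Most web servers can express the same mapping as a rewrite rule, so custom code is optional.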

At the other end of the spectrum, you could work with a vendor that supplies all the infrastructure so that, for example, you could focus on creating content. Hybrid solutions are also common, such as keeping your current web server arrangement and adding an identifier management piece (eg, the API/UI provided by ezid.cdlib.org, which partners with n2t.net).

If you run a server/resolver, you will also want to think about whether to advertise (release, publish, disseminate) your ARKs based at your resolver or at n2t.net. You might choose the former for branding or the latter for stability. Resolving your ARKs through n2t.net is always possible at no cost, regardless of how you advertise them (this is a side-effect of obtaining a NAAN).
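Because the "ark:..." part of the URL is the same wherever it is hosted, switching between advertising at your own resolver and at n2t.net amounts to swapping the hostname. A minimal sketch, assuming a hypothetical local resolver at ark.example.org; the helpers core_ark and at_n2t are hypothetical names, and any query or fragment is simply dropped.

```python
from urllib.parse import urlsplit

N2T_BASE = "https://n2t.net/"


def core_ark(url: str) -> str:
    """Return the hostname-independent "ark:..." part of an ARK-based URL."""
    path = urlsplit(url).path  # query and fragment are dropped
    start = path.find("ark:")
    if start < 0:
        raise ValueError(f"no ARK found in {url!r}")
    return path[start:]


def at_n2t(url: str) -> str:
    """Re-express an ARK advertised at a local resolver as an n2t.net URL."""
    return N2T_BASE + core_ark(url)
```

So https://ark.example.org/ark:/12345/x6np1wh8k and https://n2t.net/ark:/12345/x6np1wh8k carry the same identifier; only the resolver differs.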

What is a NAAN, and can I make changes to it?

...

You may request a NAAN by filling out an online form. The NAAN you obtain will be listed alongside all other NAANs in the public NAAN registry. Use that same form to update your NAAN registry entry, for example, if you change the URL of your resolver.

, or if you have negotiated with another organization to carry on your work and take over your NAAN. If you transition into or out of a vendor relationship, there is no problem taking your NAAN with you.

NAANs subdivide the set of all possible ARKs (the ARK namespace). The subset of ARKs under a given NAAN can be further subdivided into shoulders (eg, 12345/x2, 98765/b4), which can make it easy to delegate autonomous ARK assignment to departments in a large organization. ARK resolution is loosely based on NAANs, but because organizations sometimes split, ARKs address the namespace splitting problem by allowing a namespace to be managed by more than one organization.
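Delegating by shoulder means that the leading characters of an ARK's name tell you which department assigned it. A minimal sketch, assuming hypothetical shoulders 12345/x2 and 12345/b4 allocated to two hypothetical departments; owner_of is a hypothetical helper that picks the longest matching shoulder.

```python
from typing import Dict, Optional

# Hypothetical shoulder allocations inside NAAN 12345.
SHOULDERS: Dict[str, str] = {
    "12345/x2": "Special Collections",
    "12345/b4": "Data Services",
}


def owner_of(ark: str) -> Optional[str]:
    """Return the department whose shoulder is the longest prefix of the ARK."""
    key = ark.removeprefix("ark:/").removeprefix("ark:")
    best: Optional[str] = None
    for shoulder in SHOULDERS:
        if key.startswith(shoulder) and (best is None or len(shoulder) > len(best)):
            best = shoulder
    return SHOULDERS[best] if best is not None else None
```

Each department can then mint under its own shoulder without coordinating with the others, since their ARKs can never collide.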

ARKs and other identifiers

Why would I use ARKs compared to, for example, DOIs?

  • To keep costs down.
  • To work with exactly the metadata you want.
  • To be able to create identifiers without metadata.
  • To have an identifier as soon as you create the first draft of your data.
  • To hold that identifier private while the data and metadata evolve, and decide (maybe years) later, to publish or discard it.
  • To retain that identifier upon publication, perhaps then assigning an additional identifier, such as a DOI.
  • To use identifiers designed from the outset for generic application, rather than, say, shoehorning a DOI into identifying a field station.
  • To be able to change identifier vendors and infrastructure without having to coordinate a database move with a central authority.
  • To be able to deal with the namespace splitting problem without losing control of your identifiers.
  • To link identifiers to different kinds of nuanced persistence commitments.
  • To be able to add queries (eg, ?lang=en) when resolving your identifiers.
  • To use open infrastructure consistent with your organization's values.
  • To link directly to the objects you value instead of to landing pages.
  • To create one identifier that enables millions (suffix passthrough).
  • To access convenient, full-function metadata via inflections.
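Suffix passthrough and query passthrough, from the list above, can both be sketched as one resolver rule: find the longest registered ARK that begins the request, then append whatever follows to its target. This is a minimal sketch with a hypothetical one-entry table and target host; requiring the suffix to start with "/", "." or "?" is our own design choice, not a documented rule of any particular resolver.

```python
from typing import Dict, Optional

# Hypothetical resolver table: one registered ARK and its target URL.
TARGETS: Dict[str, str] = {
    "ark:/12345/x6np1wh8k": "https://repo.example.org/objects/x6np1wh8k",
}


def resolve(ark: str) -> Optional[str]:
    """Resolve an ARK, passing any unregistered suffix through to the target."""
    for end in range(len(ark), 0, -1):
        base, suffix = ark[:end], ark[end:]
        # Only split at a qualifier or query boundary.
        if base in TARGETS and (suffix == "" or suffix[0] in "/.?"):
            return TARGETS[base] + suffix
    return None
```

One registered ARK then covers every component beneath it, for example ark:/12345/x6np1wh8k/c3/s5.v7.xml, as well as queries such as ?lang=en.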

What does ARK have in common with DOI, Handle, PURL, and URN?

These are the major persistent identifier types (or schemes). They have all been around since 2001 and they have much in common, starting with structure.

 https://n2t.net/ark:/99999/12345

   https://doi.org/10.99999/12345

https://handle.net/10.99999/12345

     https://purl.org/99999/12345

https://<various>/urn:99999:12345

ARKs, DOIs, and Handles all appear in places like the Data Citation Index ℠ and ORCID.org profiles. As seen in the examples above, these identifiers share the following parts:

  1. the protocol (https://) plus a hostname,
  2. just for ARK and URN, there's also a label ("ark:" or "urn:"),
  3. the name assigning authority (99999, 10.99999, or purl.org/99999), which is the organization that created a particular identifier,
  4. and finally, the name, or local identifier, that it assigned (12345). 
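The shared structure can be made concrete by splitting the five example URLs into the parts listed above. A minimal sketch covering only the example shapes shown here; real DOIs, Handles, and URNs have more varied syntax, and split_pid and PidParts are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PidParts(NamedTuple):
    host: str             # protocol is assumed to be https://
    label: Optional[str]  # "ark:" or "urn:"; None for DOI, Handle, PURL
    authority: str        # name assigning authority
    name: str             # local identifier


def split_pid(url: str) -> PidParts:
    m = re.match(r"^https://([^/]+)/(.+)$", url)
    if m is None:
        raise ValueError(f"not an https URL: {url!r}")
    host, rest = m.groups()
    if rest.startswith("ark:/"):
        authority, name = rest[len("ark:/"):].split("/", 1)
        return PidParts(host, "ark:", authority, name)
    if rest.startswith("urn:"):
        authority, name = rest[len("urn:"):].split(":", 1)
        return PidParts(host, "urn:", authority, name)
    authority, name = rest.split("/", 1)
    if host == "purl.org":
        # This FAQ writes the PURL authority together with its host.
        authority = f"{host}/{authority}"
    return PidParts(host, None, authority, name)
```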

And they all have little effect on persistence. See 10 persistent myths about persistent identifiers.

Wait, are you saying ARK, DOI, Handle, PURL, and URN are useless?

No, that's too strong a statement. But let's keep these identifier schemes (types) in perspective.

  • They all fail to stop the major causes of broken links: loss of funding, natural disaster, war, deliberate removal, human error, and provider neglect.
  • They all require you, the end provider, to update forwarding tables as URLs change.
  • They all identify content that is subject to change or removal on future visits.
  • They all have identifiers that break regularly and in large numbers – many thousands and more.
  • They all give access to almost any kind of thing, whether digital, physical, abstract, person, group, etc.
  • A non-trivial fraction of each scheme's identifiers did, and will, fail permanently, requiring forwarding to "tombstone" pages.
  • They all rely on ordinary redirection built into web servers since 1994 and provided for free by hundreds of URL shortening services.
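To see how little is scheme-specific, here is the core of any such resolver: a forwarding table the provider must keep current, plus tombstone handling for permanent failures. A minimal sketch; the table contents, hostnames, and the redirect_for helper are hypothetical, and the same code would serve DOIs or PURLs equally well.

```python
from typing import Dict, Optional, Set, Tuple
from urllib.parse import quote

# Hypothetical forwarding table: identifier -> current location.
FORWARD: Dict[str, str] = {
    "ark:/12345/x6np1wh8k": "https://new-host.example.org/items/42",
}
# Hypothetical identifiers that failed permanently.
TOMBSTONED: Set[str] = {"ark:/12345/x9gone"}
TOMBSTONE_URL = "https://new-host.example.org/tombstone"


def redirect_for(identifier: str) -> Tuple[int, Optional[str]]:
    """Return an HTTP status code and the Location header value, if any."""
    if identifier in FORWARD:
        # Ordinary redirect; the provider must update FORWARD whenever URLs change.
        return 302, FORWARD[identifier]
    if identifier in TOMBSTONED:
        # Permanent failure: send visitors to a tombstone page naming the identifier.
        return 302, f"{TOMBSTONE_URL}?id={quote(identifier, safe='')}"
    return 404, None
```

Nothing in this logic keeps links alive by itself; persistence comes from someone continuing to maintain FORWARD.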

Given how little the schemes do for you, when choosing one you'll likely want to consider factors such as cost, risk, and openness.

How do ARKs differ from identifiers like DOIs, Handles, PURLs, and URNs?

The short answer is that ARKs are the only mainstream, non-siloed, non-paywalled identifiers that you can register to use in about 48 hours. DOIs, Handles, and PURLs require resolution and other services to come from their respective centralized systems (silos). 

That's not to say that persistence is free. Making any identifier persistent burdens you, the provider, with the costs of content management, hosting, monitoring, and forwarding. You can do those things yourself or with help from a vendor. But with ARKs, just as with URLs, you will not be charged separately for your identifiers and you will not be locked in to a special-purpose resolution silo that also locks out other identifiers.

ARKs are unusual in being decentralized. While one can get resolution services from a global ARK resolver called n2t.net, over 90% of the ARKs in the world are published without reference to it. More than 500 registered organizations across the world have created an estimated 3.2 billion ARKs, and, as with URLs, no one has ever paid an identifier fee to create them. Of course, maintaining them isn't free. Keeping content access persistent over the long term always has a cost, regardless of identifier type.

...

           https://n2t.net/ark:/65665/3c2e39526-e0c3-41ae-be4f-07558a9458eb

As an ARK, for example, that UUID should return metadata (if available) and be insensitive to the hyphens, making this form equally viable:

     https://n2t.net/ark:/65665/3c2e39526e0c341aebe4f07558a9458eb
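Hyphen insensitivity can be sketched as a normalization step that a resolver applies to both its stored ARKs and every incoming request, so that the two forms above compare equal. A minimal sketch, assuming the "ark:/" form shown above and that only the part after the NAAN needs normalizing; normalize_ark is a hypothetical helper.

```python
def normalize_ark(url: str) -> str:
    """Drop hyphens from the part of an ARK-based URL that follows the NAAN."""
    before, marker, after = url.partition("ark:/")
    if not marker:
        raise ValueError(f"no 'ark:/' in {url!r}")
    naan, slash, name = after.partition("/")
    # Hyphens carry no identity in the name, so removing them yields one canonical key.
    return before + marker + naan + slash + name.replace("-", "")
```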

From cradle to grave

...


When in my workflow should I create ARKs?

At object birth, or even before. We name our babies before they're born, and we name and refer to objects in the conception stages, sometimes long before they bear fruit. Depending on how elaborate the planning may be, your unborn objects could have full-function ARKs that resolve to an appropriate surrogate and return rich metadata, including persistence statements.

The only caveat is to be careful releasing (advertising) ARKs that have uncertain long-term prospects. Some identifier management systems have features to help manage and resolve unreleased identifiers (eg, EZID has a "reserved" status). The more people who know about an ARK, the harder it is to delete.

How is it that ARKs can be easy to delete?

If no one knows about an identifier but you, there's no harm in deleting or withdrawing it. Stepping back, an identifier is actually an assertion that a given string of characters is associated with a specific thing. The fewer people you tell, the easier it is to scrap that assertion. A URL shared only with your closest colleagues is much easier to withdraw than one that appeared for a month on a public website, where internet search engines harvested it. In contrast, it is hard to delete DOIs and Handles because once registered and made resolvable, they are effectively released to the world.
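The idea that deletability depends on release can be modeled with a simple status field. A minimal sketch: the "reserved" status is borrowed loosely from the EZID feature mentioned above, while the "public" status, the ManagedArk class, and its methods are hypothetical and are not EZID's API.

```python
from dataclasses import dataclass


@dataclass
class ManagedArk:
    ark: str
    status: str = "reserved"  # hypothetical statuses: "reserved" or "public"

    def publish(self) -> None:
        """Release the identifier; from now on others may depend on it."""
        self.status = "public"

    def can_delete(self) -> bool:
        # Only an identifier nobody else has seen is safe to delete outright.
        return self.status == "reserved"
```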

...

Metadata also eases some persistence pain. By themselves, persistent identifier strings are often opaque, revealing little about what they identify (because non-opaque identifiers do not age or travel well). But opaque identifiers are difficult because they give you no clues as to what the identifiers were meant to identify. In the absence of metadata you are forced to access the object itself to remind yourself what it is, and to trust that it's the correct object. Metadata really helps. Moreover, discrepancies between returned metadata and the accessed object help everyone detect identifier changes and errors. 
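One way to catch such discrepancies automatically is a fixity check: record a checksum in the metadata and compare it with the object actually retrieved. This swaps in a technique the FAQ does not prescribe. It is a minimal sketch that assumes a hypothetical "sha256" metadata element; matches_metadata is a hypothetical helper.

```python
import hashlib
from typing import Dict


def matches_metadata(content: bytes, metadata: Dict[str, str]) -> bool:
    """Compare a fetched object with a checksum recorded in its metadata.

    Assumes a hypothetical "sha256" element; returns False if it is absent.
    """
    expected = metadata.get("sha256")
    if expected is None:
        return False  # nothing to compare against
    return hashlib.sha256(content).hexdigest() == expected.lower()
```

A mismatch does not say which side is wrong, only that the identifier, the metadata, or the object changed and deserves a look.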

...

Metadata is messy business for all identifiers, not just ARKs. Across domains and object types there are thousands of standards, many of them overlapping yet conflicting, and each is applied according to local organizational customs and with varying levels of compliance. Choosing or creating a specification for your metadata depends on factors such as

  • what metadata, if any, you are currently managing (hint: stay with it unless you have a good reason to switch),
  • whether you want to officially publish objects (hint: prepare to be able to supply author, title, date, publisher/archive, and object type),
  • the requirements and capabilities of your resolver (hint: your IT staff or vendor might have its own requirements), and
  • whether you want to store non-standard elements (hint: N2T allows this, but most standards and vendors don't).

Reliable cross-domain interoperation may remain out of reach, but Dublin Core, DataCite, Schema.org JSON-LD, and Dublin Kernel are common metadata specifications to consider for use with ARKs.
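Dublin Kernel describes an object with four elements: who, what, when, and where. A minimal sketch of rendering them as plain "label: value" lines, assuming an "erc:" header line as used in ERC-style records; the kernel_record helper and the sample values are hypothetical.

```python
def kernel_record(who: str, what: str, when: str, where: str) -> str:
    """Render a Dublin Kernel (who, what, when, where) record as label: value lines."""
    lines = ["erc:"]
    for label, value in (("who", who), ("what", what), ("when", when), ("where", where)):
        # Collapse internal whitespace and newlines so each element stays on one line.
        lines.append(f"{label}: {' '.join(value.split())}")
    return "\n".join(lines) + "\n"
```

For example, kernel_record("Doe, Jane", "Field notes", "2021", "ark:/12345/x6np1wh8k") yields five lines, starting with "erc:" and ending with "where: ark:/12345/x6np1wh8k".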

What is Dublin Kernel metadata for (who, what, when, where)?

...

Although inflections are commonly associated with ARKs, they are not "owned" by ARKs. Contrary to popular belief, identifiers don't do anything – it's their resolvers that do or don't support such features. So, for example, inflections and suffix passthrough are supported by n2t.net for all identifier types, but not by doi.org or handle.net for any identifier types.
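Since inflections live in the resolver, supporting them is a matter of recognizing a trailing marker before normal resolution. A minimal sketch; the marker set ("?", "??", "?info") and the descriptions attached to each are assumptions to check against your resolver's documentation, and split_inflection is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Assumed inflection markers and what a resolver might return for each.
INFLECTIONS: Dict[str, str] = {
    "?": "brief metadata",
    "??": "full metadata",
    "?info": "metadata",
}


def split_inflection(request: str) -> Tuple[str, Optional[str]]:
    """Separate an identifier from a trailing metadata inflection, if any."""
    # Try longer markers first so "??" is not mistaken for "?".
    for marker in sorted(INFLECTIONS, key=len, reverse=True):
        if request.endswith(marker):
            return request[: -len(marker)], INFLECTIONS[marker]
    return request, None
```

Nothing here depends on the identifier being an ARK, which is the point made above: the same check works for any identifier type the resolver handles.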

Resolvers

If most ARKs run on their own resolvers, why is there also a global resolver for ARKs?

...