What are ARKs?

ARKs (Archival Resource Keys) are high-functioning identifiers that lead you to things and to descriptions of those things. For example, this ARK,

     https://n2t.net/ark:/67531/metadc107835/

gets you to a dissertation, and adding a '?' on the end of the ARK should get you to its description:

     https://n2t.net/ark:/67531/metadc107835/?

What's an identifier?

On the internet, an identifier is a URL, or part of a URL. For example, this basic ARK identifier,

                            ark:/12148/btv1b8449691v/f29

appears inside two different URLs (Uniform Resource Locators, also known as web links or web addresses):

     https://gallica.bnf.fr/ark:/12148/btv1b8449691v/f29

            https://n2t.net/ark:/12148/btv1b8449691v/f29

ARKs are especially good at being persistent identifiers.

What's a persistent identifier?

The average lifetime of a URL was once said to be 44 days. At the end of its life, a URL link breaks, meaning it gives you the dreaded "404 Not Found" error that most of us have seen. Irritating as that may be, it's politically awkward when looking for publicly funded research, and it's a cultural disaster for libraries, archives, museums, and other memory organizations.

A persistent identifier (sometimes abbreviated PID) is a link that in principle keeps working far into the future, even as things move between websites. Normally when things move, everyone who ever recorded the old links would need to be told what the new links are, which is next to impossible. That's where identifier resolvers come in.

What's a resolver?

A resolver is a website that specializes in forwarding incoming identifiers (the ones originally advertised to users) to whichever websites are currently best able to deal with them. Overall, forwarding is called resolution; one step in a resolution process is called redirection.

For a resolver to work, its hostname must be carefully chosen so that it won't ever need to be changed. Memory organizations, some of them centuries old, tend to have hostnames well-suited to be resolvers. Some well-known, younger resolvers are n2t.net (the ARK resolver), identifiers.org, doi.org, handle.net, and purl.org.
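
As a rough illustration of resolution, the sketch below models a resolver as a forwarding table that maps an advertised identifier to its current location. The table contents, the example.org and example.net target URLs, and the function names are all hypothetical; a real resolver would answer HTTP requests and keep its table in a database.

```python
from typing import Optional, Tuple

# Hypothetical forwarding table: advertised identifier -> current location.
# The target URLs are made up for illustration.
FORWARDING_TABLE = {
    "ark:/99999/12345": "https://repository.example.org/objects/12345",
}


def resolve(identifier: str) -> Tuple[int, Optional[str]]:
    """Return an (HTTP status, Location) pair for one redirection step."""
    target = FORWARDING_TABLE.get(identifier)
    if target is None:
        return 404, None  # the dreaded "Not Found"
    return 302, target


def move(identifier: str, new_url: str) -> None:
    """Record that the identified thing now lives at a new URL."""
    FORWARDING_TABLE[identifier] = new_url


print(resolve("ark:/99999/12345"))
move("ark:/99999/12345", "https://archive.example.net/items/12345")
print(resolve("ark:/99999/12345"))
```

The advertised identifier never changes; only the table entry does, which is why the provider has to keep that table up to date.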

What are ARKs used for?

For anything and everything. Uses of ARKs include

  • digital content, such as genealogical records (FamilySearch)
  • publisher content (Portico)
  • digitized manuscripts (Gallica)
  • texts (Internet Archive)
  • museum holdings (Smithsonian)
  • vocabulary terms (yamz.net, perio.do)
  • historical figures (snaccooperative.org)
  • datasets, journals, living beings, and more.

Why would I use ARKs?

  • To keep costs down.
  • To work with exactly the metadata I want.
  • To be able to create identifiers without metadata.
  • To create an identifier as soon as I create the first draft of my data.
  • To keep that identifier private while the data evolves, and decide later (maybe years later) whether to publish or discard it.
  • To keep that identifier upon publication, and to assign an additional identifier, such as a DOI.
  • To integrate with the Data Citation Index ℠ and ORCID.org researcher profiles.
  • To link identifiers to different kinds of nuanced persistence commitments.
  • To use open infrastructure consistent with my organization's values.
  • To create one identifier that enables millions (suffix passthrough).

What does ARK have in common with DOI, Handle, PURL, and URN?

These are all major so-called persistent identifier schemes (or identifier types). They have much in common, starting with structure.

 https://n2t.net/ark:/99999/12345

   https://doi.org/10.99999/12345

https://handle.net/10.99999/12345

           https://purl.org/12345

https://<various>/urn:99999:12345

As seen in these examples, they all have up to four parts:

  1. the protocol (https://) plus a hostname,
  2. just for ARK and URN, there's also a label ("ark:" or "urn:"),
  3. the name assigning authority (99999, 10.99999, or purl.org), which is the organization that created a particular identifier,
  4. and finally, the name, or local identifier, that it assigned (12345). 
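
The parts listed above can be pulled apart mechanically. The following is a minimal sketch that handles only the five example shapes shown in this FAQ; the function name, the `IdentifierParts` type, and the resolver.example.org hostname (standing in for `<various>`) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class IdentifierParts(NamedTuple):
    resolver: str          # protocol plus hostname
    label: Optional[str]   # "ark:" or "urn:", otherwise None
    authority: str         # name assigning authority
    name: str              # local identifier


def split_identifier(url: str) -> IdentifierParts:
    parts = urlsplit(url)
    resolver = f"{parts.scheme}://{parts.netloc}"
    path = parts.path.lstrip("/")
    if path.startswith("ark:"):
        # Drop the label and the slash after it, then split authority/name.
        rest = path[len("ark:"):].lstrip("/")
        authority, _, name = rest.partition("/")
        return IdentifierParts(resolver, "ark:", authority, name)
    if path.startswith("urn:"):
        _, authority, name = path.split(":", 2)
        return IdentifierParts(resolver, "urn:", authority, name)
    if "/" in path:
        # DOI and Handle examples: authority/name
        authority, _, name = path.partition("/")
        return IdentifierParts(resolver, None, authority, name)
    # PURL example: the hostname doubles as the authority.
    return IdentifierParts(resolver, None, parts.netloc, path)


for example in (
    "https://n2t.net/ark:/99999/12345",
    "https://doi.org/10.99999/12345",
    "https://handle.net/10.99999/12345",
    "https://purl.org/12345",
    "https://resolver.example.org/urn:99999:12345",
):
    print(split_identifier(example))
```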

And they all have little effect on persistence.

Do you mean that ARK, DOI, Handle, PURL, and URN are useless?

That's too strong a statement. However, it's good to keep these identifier schemes (types) in perspective.

  • They all fail to stop the major causes of broken links: loss of funding, natural disaster, war, deliberate removal, human error, and provider neglect.
  • They all burden the end provider with the responsibility to update forwarding tables as URLs change.
  • They all give access to any kind of thing, whether digital, physical, abstract, person, group, etc.
  • They all identify content that is subject to change on future visits.
  • They all break regularly and in large numbers (many thousands and more).
  • A non-trivial fraction of each scheme's identifiers will fail permanently, requiring forwarding to "tombstone" pages.
  • They all use ordinary redirection built in to web servers since 1994 and provided for free by hundreds of URL shortening services.

Given how little each type gives you, it is wise to consider factors such as cost, risk, and openness when choosing one.

How do ARKs differ from identifiers like DOIs, Handles, PURLs, and URNs?

The short answer is that ARKs are the only mainstream, non-siloed, non-paywalled identifiers that you can register to use in about 24 hours. DOIs, Handles, and PURLs require resolution and other services to come from their respective centralized systems (silos). 

That's not to say that persistence is free. Making any identifier persistent burdens you, the provider, with the costs of content management, hosting, monitoring, and forwarding. You can do those things yourself or with help from a vendor. But with ARKs, just as with URLs, you will not be charged separately for your identifiers and you will not be locked in to a special-purpose resolution silo that also locks out other identifiers.

ARKs are very unusual in being decentralized. While one can get resolution services from a global ARK resolver called n2t.net, over 90% of the ARKs in the world do not use it.

More than 500 registered organizations across the world have created an estimated 3.2 billion ARKs, and, as with URLs, no one has ever paid for the right to create them.

How else do the identifier types differ?

Here are some more differences between ARKs, DOIs, Handles, PURLs, and URNs.

  • All things eventually pass, including hostnames and the web itself and the "https://" protocol; when that first part of the identifier ceases to have meaning, only ARKs and URNs will include the label indicating the type of identifier that remains.
  • For DOIs, Handles, and PURLs, you are required to use their respective resolvers. ARKs and URNs permit you to use your own resolver.
  • To create DOIs and Handles, you are required to pay a membership fee and, for DOIs, per-DOI charges. There are no fees for ARKs, PURLs, and URNs.
  • Although you can use your own or a vendor resolver for your ARKs and URNs, all ARKs can be resolved via n2t.net, making it the closest thing to a "global ARK resolver".
  • The envisioned URN resolver was never built, so URNs are currently resolved as URLs, and there is no designated global URN-as-URL resolver. In order to register to create URNs, you must apply for a URN namespace.
  • Unlike DOIs and Handles, (a) ARKs don't have metadata requirements and (b) ARKs that haven't been released into the world can be deleted.

When should I use ARKs compared to DOIs, Handles, PURLs, and URNs?

There are no simple answers. Identifiers (not things, but their names) are tricky to talk about, so if you hear simple answers elsewhere, beware of common fallacies.

Nothing inherent in ARKs, DOIs, Handles, PURLs, or URNs makes them more or less fit for any particular field, domain, or sector. With an identifier resolver and administrative management service, they all provide the core service of resolution (and so do properly managed URLs). 

The concrete differences that we experience, such as metadata (object descriptions), landing pages, and tool integration (eg, publishing tools), are not properties of identifier schemes per se, but properties of resolution, management, and citation services that various providers extend to or withhold from different identifier types. Those services are shaped in turn by communities of practice and by markets. Basic services are founded on a reliable database storing each identifier along with metadata elements (creator, title, date, redirection URL, etc) that describe the identified object. Extra services include link checking, duplicate detection, report generation, and searching.

Typically, scheme-based services are designed as silos ("walled gardens") to serve a particular identifier type (eg, Handle, DOI, or PURL). Each silo performs the same main functions – mapping names (identifier strings) to things (objects or metadata). Excluding all but one type of identifier string may help to capture markets, but it's wasteful and non-inclusive. It requires building the same set of services over and over for each type and violates basic principles of openness, so the N2T (Name-to-Thing) resolver and EZID (identifiers made easy) management interface were designed to work with all identifiers. Work put into any new feature can be efficiently leveraged across all types, which sometimes creates surprising flexibility; for example, ARKs are often stored in EZID with "DOI metadata", and every DOI stored in N2T can benefit from "ARK resolution features" such as inflections and suffix passthrough, which are not available via the main DOI resolver (doi.org).

Generalizations about identifier types sometimes apply when resolution and management for that type is locked into one particular vendor or provider. For example, many PURL and Handle features and restrictions are well-defined by their respective administration silos. DOIs, which are built on top of Handles, have the same resolution features and restrictions as Handles, but metadata practices are diverse and evolving across registration agencies. DOIs used to be known primarily as identifiers for scientific and scholarly publications, with a mature community and service offering around "Crossref DOIs", but newer kinds of DOIs, such as those from DataCite and EIDR, are changing the nature of the DOI.

Don't identifier types differ in metadata flexibility, content negotiation, inflections, and suffix passthrough?

Only one resolver, n2t.net, supports all of these features, and it does so for any identifier stored with appropriate metadata. Contrary to popular belief, identifiers don't do anything – it's their resolvers that do or don't support these features. For example, suffix passthrough is a feature supported by n2t.net (and purl.org has something similar called "partial redirect"), but not by doi.org or handle.net.
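
To give a feel for suffix passthrough, here is a minimal sketch under simplifying assumptions: only a base identifier is registered, and any extension of it is forwarded by appending the leftover suffix to the registered target. The table, the example.org target, and the function name are hypothetical, and this sketch only shortens the identifier at "/" boundaries, which is not claimed to match what n2t.net does in detail.

```python
from typing import Optional

# Hypothetical registrations: only the base identifier is stored.
REGISTERED = {
    "ark:/99999/12345": "https://repository.example.org/objects/12345",
}


def resolve_with_passthrough(identifier: str) -> Optional[str]:
    """Find the longest registered prefix and pass the leftover suffix through."""
    candidate = identifier
    while candidate:
        target = REGISTERED.get(candidate)
        if target is not None:
            suffix = identifier[len(candidate):]
            return target + suffix
        # Drop the last "/"-delimited component and try again.
        candidate = candidate.rpartition("/")[0]
    return None


print(resolve_with_passthrough("ark:/99999/12345/chapter3/fig2.png"))
print(resolve_with_passthrough("ark:/99999/67890"))
```

One registered identifier thus stands in for every identifier that extends it, which is the sense in which one identifier can enable millions.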

Metadata flexibility means the ability to store any metadata you want, including repeated elements, such as multiple authors and forwarding URLs, or no metadata at all. N2T has full metadata flexibility, while Crossref and DataCite have specific requirements (eg, the DataCite schema) to create their DOIs.

Content negotiation lets software request descriptions of things, but human beings can't do it themselves, and it only works for things that are not already in formats that might contain descriptions. Fortunately, without restriction, both humans and software can use inflections, exemplified by the '?' at the top of this FAQ. N2T is one of the few resolvers that does both.
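
The difference between the two approaches shows up in how a request is built. In the sketch below, content negotiation puts the wish for a description in an HTTP header, where a person typing a link cannot reach it, while an inflection puts it in the URL itself. The function names are hypothetical and the `application/json` media type is only an assumed example; the requests are built and inspected, not sent, so nothing here shows what a resolver would return.

```python
from urllib.request import Request

ARK_URL = "https://n2t.net/ark:/67531/metadc107835/"


def negotiated_request(url: str, media_type: str = "application/json") -> Request:
    """Content negotiation: same URL, with the request for a description in a header."""
    return Request(url, headers={"Accept": media_type})


def inflected_request(url: str) -> Request:
    """Inflection: the request for a description is visible in the URL itself."""
    return Request(url + "?")


# Neither request is sent; the objects are only constructed and printed.
negotiated = negotiated_request(ARK_URL)
inflected = inflected_request(ARK_URL)
print(negotiated.full_url, negotiated.get_header("Accept"))
print(inflected.full_url)
```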

Why doesn't the global ARK resolver (n2t.net) have the word "ARK" in it?

Although N2T (Name-to-Thing) is a resolver originally built for ARKs, principles of openness prevented it from becoming just another DOI/Handle/PURL-type silo, which all perform the same main functions. Thus the "global ARK resolver" also resolves DOIs, Handles, PURLs, URNs, and 600 other types of identifier.

This counter-silo principle can also be found in micro-service tools such as noid, which was built for ARKs and is widely used by organizations that mint ARKs and those that mint Handles.

I've heard of ORCIDs and UUIDs – where do they fit in?

Those are special kinds of persistent identifiers. ORCIDs only identify researchers, and they link to research works using ARKs, DOIs, etc. ORCIDs look like

     https://orcid.org/0000-0001-7604-8041

UUIDs are globally unique, 36-character strings that are easy for software to generate but only become usable as web addresses when made part of a URL, for example, in this ARK:

           https://n2t.net/ark:/65665/3c2e39526-e0c3-41ae-be4f-07558a9458eb
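
A minimal sketch of how software might generate a UUID and make it part of an ARK-style URL follows. The function name is hypothetical, 99999 is the placeholder authority used in the examples earlier in this FAQ, and an ARK built this way leads nowhere until an organization actually registers it with a resolver.

```python
import uuid


def uuid_to_ark_url(naan: str, resolver: str = "https://n2t.net") -> str:
    """Generate a random UUID and embed it as the name part of an ARK URL."""
    name = str(uuid.uuid4())  # random (version 4) UUID in hyphenated text form
    return f"{resolver}/ark:/{naan}/{name}"


print(uuid_to_ark_url("99999"))
```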

