...

For the purpose of supporting early object development, some distinguishing features of ARKs are that they can be deleted, they can exist with no metadata, and they can exist with any metadata you care to store.

Why are ARKs that haven't been "released into the world" easy to delete?

If no one knows about an identifier but you, there's absolutely no harm in deleting or withdrawing it.

This is actually true for many types of identifier.

What do you mean by ARKs supporting early object development?

We need identifiers long before we know exactly what they refer to, or even if they refer to anything useful. An identifier that requires mature metadata cannot be used during early object development since little is known about the object. So object creators almost always initially assign identifiers that have no metadata requirements, such as URLs or ARKs.

If you start with an ARK, you benefit from being able to keep the original identifier through to public release as the metadata matures. Many objects go through intensive development and revision phases, sometimes lasting years, in which they are too immature to meet most metadata requirements. Nonetheless, every object needs some sort of identifier from conception to maturity, where maturity could look like public release and further enhancement, or abandonment. Like the object itself, metadata elements need a flexible place to grow and mature over time:

  • starting from the first plans, when it just needs an identifier,
  • at the moment of birth, when its first digital representation needs a redirection target URL,
  • after the first analysis, when its significance and a tentative title emerge,
  • when creating dozens of discipline-specific metadata elements that violate most metadata standards except your own,
  • during post-processing by a colleague whose name will be added as a creator,
  • when early feedback based on the tweeted identifier turns up a key insight and a new contributor,
  • and so forth, through public release, correction, revision, enhancement, etc.
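
The stages listed above can be pictured as a single record that starts empty and accumulates elements with no schema check. What follows is a minimal sketch under stated assumptions, not N2T or EZID code: the class name FlexibleRecord, the element names, the example ARK, and the URL are all hypothetical.

```python
from collections import defaultdict


class FlexibleRecord:
    """Hypothetical record: any element name, any number of values, or none."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.elements = defaultdict(list)  # element name -> list of values

    def add(self, name, value):
        # No required elements and no schema validation: values just accumulate.
        self.elements[name].append(value)

    def describe(self):
        # Plain-dict snapshot of whatever metadata exists so far.
        return {name: list(values) for name, values in self.elements.items()}


record = FlexibleRecord("ark:/12345/fk1234")        # planning: identifier only
empty_snapshot = record.describe()                  # still valid with no metadata
record.add("target", "https://example.org/draft")   # birth: redirection target
record.add("title", "Tentative title")              # after the first analysis
record.add("creator", "First Creator")
record.add("creator", "Colleague")                  # repeated element, added later
```

Keeping every element as a list is one simple way to allow repeated elements such as multiple creators; a real service may store metadata quite differently.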

Can an object have both an ARK and a DOI?

Yes. As mentioned, if you start with an ARK early in object development, you benefit from being able to keep that original identifier through to public release as the metadata matures. If in addition you think you need a DOI, that will become a second object identifier to maintain.

Assuming you wish to maintain both the new and old identifiers (to avoid breaking links that your collaborators had stored and bookmarked), an easy way forward is to set up the DOI to redirect to the ARK. In this way you ensure correct resolution of both identifiers while only having to maintain the ARK.
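
To illustrate why redirecting the DOI to the ARK leaves only one target to maintain, here is a small sketch. It is a simplification: in practice a DOI's target would be a resolver URL for the ARK rather than the bare ARK string. The identifiers, URLs, and the function name final_target are hypothetical placeholders.

```python
# Hypothetical redirect table: identifier -> where it points next.
# The DOI points at the ARK; only the ARK points at the object's location.
REDIRECTS = {
    "doi:10.5555/example": "ark:/12345/fk1234",
    "ark:/12345/fk1234": "https://example.org/objects/v1",
}


def final_target(identifier, table, max_hops=5):
    """Follow redirects until reaching something that is not in the table."""
    current = identifier
    for _ in range(max_hops + 1):
        if current not in table:
            return current
        current = table[current]
    raise RuntimeError("too many redirects")


# Moving the object means updating a single entry, the ARK's.
REDIRECTS["ark:/12345/fk1234"] = "https://example.org/objects/v2"
```

After the single update, both the DOI and the ARK lead to the new location, because the DOI's own entry never mentioned the object's location.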

But isn't presence of metadata a sign of quality?

Identifiers assigned early in an object's development probably won't meet metadata requirements. If assigning a certain kind of identifier (e.g., a DataCite DOI) means meeting metadata requirements, that almost ensures that the object will get a new identifier, different from the one it has been known by for what might be years of development.

Flexible metadata (object descriptions) is critical for keeping identifiers stable throughout object life cycles. In the digital age, objects often mature in public, where they may be referenced for years in tweets and whitepapers via whatever identifier is easy or possible. Unfortunately, when they are finally ready to enter the scholarly publication system, that is often the first time that metadata requirements can be met to obtain a DOI. In other words, these well-loved objects, at the peak of celebrity, are expected to become known by a new name. The old name might continue to work.

The metadata required to create a DOI for an object, for example, may not be known until the object has developed from "embryo" to maturity, by which time another identifier has been used for years to reference it (e.g., in tweets) with colleagues.

Many endeavors in research and curation, large and small, require the creation and nurturing of objects over periods of weeks, months, and years. Those objects all need to be referenced by identifiers, starting from their earliest embryonic form, and even before that, when they were conceived in planning documents. As objects develop to maturity, perhaps years later, those identifiers will appear in tweets, emails, and whitepapers.


If ARKs can be deleted, how can they be trusted?

Only one resolver, n2t.net, supports all of these features, and it does so for any identifier stored with appropriate metadata. Contrary to popular belief, identifiers don't do anything – it's their resolvers that do or don't support these features. For example, suffix passthrough is a feature supported by n2t.net (and purl.org has something similar called "partial redirect"), but not by doi.org or handle.net.

Metadata flexibility means the ability to store any metadata you want, including repeated elements, such as multiple authors and forwarding URLs, or no metadata at all. N2T has full metadata flexibility, while Crossref and DataCite impose specific requirements (e.g., the DataCite schema) for creating their DOIs.

Stepping back, an identifier is actually an assertion that a given string of characters is associated with a specific thing. The fewer people you tell, the easier it is to scrap that assertion. If you create a URL and share it only with your closest colleagues on an internal network, that is much easier to withdraw than if the URL appeared for a month on a public website where it was harvested by internet search engines. In contrast, it is hard to delete DOIs and Handles; once registered and made resolvable, they are essentially released to the world.

ARKs behave like URLs in this respect. Providers are free to create and share ARKs narrowly, in which case they're easy to delete. Even if shared more broadly, ARKs can come with persistence statements that tell you how much or how little commitment is made to them. ARKs were designed to articulate a variety of persistence statements. Persistent identifiers of other types exhibit a similar variety of commitment "flavors".

Finally, people make mistakes. ARKs, DOIs, Handles, PURLs, and URNs get broadcast in error and sometimes need to be withdrawn. When that happens, provider best practice is to make a withdrawn identifier resolve to a page that explains and perhaps apologizes for the inconvenience. Despite what you may have heard, persistent identifiers are not guaranteed.

If ARKs don't require it, why would I bother to create metadata?

There are several key benefits when a mature ARK released into the world has basic metadata that is retrievable by resolving the ARK with a '?' appended to it.

First, no matter what the ARK redirects to, whether landing page or PDF file, it permits the user to obtain further information about the object, such as a description, other versions, etc. This functionality is unavailable to DOIs, which are required to redirect to landing pages (consistent with common publisher practice), and prohibited from linking directly to object content.

Second, it shores up the persistence of the binding between the ARK string and the identified object. Object access is the primary assertion of that binding, metadata confirms that assertion, and any discrepancy between the two helps to detect unauthorized tampering with the binding.

Third, providing access to metadata demonstrates basic provider commitment, adding credibility to the seriousness of its intentions. Not every object provider can return object metadata. Finally, adding basic metadata, especially for objects that don't have textual representations, makes your objects more findable.


What is an ARK "inflection" and how does it differ from "content negotiation"?

An inflection is a change to the ending of a word to express a shift in meaning, and it permits us to define a word such as "go" without also defining "goes" and "going". For an ARK that gets to an object, simply adding a '?' to the end (an example of an ARK inflection) permits us to request metadata without having to define a separate identifier for the object's metadata. This technique is simple enough to be used by humans using a web browser. The N2T resolver supports both inflections and content negotiation.

Content negotiation is a software technique for requesting alternate formats of an object, such as the PDF or RTF form of an HTML file. Although not designed for it, content negotiation has been twisted in certain contexts to request metadata under the kludgy assumption that formats often used to hold metadata are in fact metadata. Unlike inflections, "content negotiation for metadata" doesn't work at all for objects that have legitimate representations in those formats (the list of which is growing and known only by private agreement), nor can it be used directly by humans.
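
The difference between the two techniques shows up in how the request is formed. The sketch below only builds the two requests and sends nothing; the ARK is a hypothetical placeholder, and the function names are illustrative rather than part of any N2T client library.

```python
from urllib.request import Request

RESOLVER = "https://n2t.net/"  # the resolver named in the text


def inflection_request(ark):
    """Ask for metadata by appending the '?' inflection to the ARK itself."""
    return Request(RESOLVER + ark + "?")


def negotiated_request(identifier, media_type):
    """Ask for an alternate format of the same URL via the HTTP Accept header."""
    return Request(RESOLVER + identifier, headers={"Accept": media_type})


ark = "ark:/12345/fk1234"  # hypothetical identifier
by_inflection = inflection_request(ark)
by_negotiation = negotiated_request(ark, "application/pdf")
```

The inflected request differs only in its URL, which a person can type into a browser's address bar. The negotiated request has the same URL as a plain request and differs only in a header, which normally takes software to set. Either object could be passed to urllib.request.urlopen to actually send it.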

Although inflections are commonly associated with ARKs, they are not "owned" by ARKs. In fact, contrary to popular belief, identifiers don't do anything – it's their resolvers that do or don't support such features. So for example, inflections and suffix passthrough are supported by n2t.net for all identifier types, but not by doi.org or handle.net. Both humans and software can use inflections without restriction, and N2T is one of the few resolvers that supports both inflections and content negotiation.

When should I use ARKs compared to DOIs, Handles, PURLs, or URNs?

There are no simple answers. Identifiers (not things, but their names) are tricky to talk about, so if you hear simple answers elsewhere, beware of common fallacies.

...

When demand for a global ARK resolver arose, basic principles of openness and generality prevented the designers from creating yet another silo in the DOI/Handle/PURL mold. Instead, the ARK resolver was built to be a generic, scheme-agnostic resolver called N2T (Name-to-Thing), which now resolves over 600 types of identifier, including ARKs, DOIs, Handles, PURLs, URNs, ORCIDs, ISSNs, etc. Resolution is essentially looking in a table for an identifier string, regardless of type, and redirecting it to the right place.
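
The table-lookup view of resolution described above can be sketched in a few lines. This is an illustration only, not N2T's implementation: the table entries are invented, and the choice of HTTP status codes 302 and 404 is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical table: identifier string (any scheme) -> target URL.
TABLE = {
    "ark:/12345/fk1234": "https://example.org/objects/fk1234",
    "doi:10.5555/example": "https://example.org/articles/example",
    "urn:example:abc": "https://example.org/urn/abc",
}


def resolve(identifier: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location) for an identifier of any scheme."""
    target = TABLE.get(identifier)
    if target is None:
        return 404, None   # unknown identifier
    return 302, target     # redirect to the stored target
```

Nothing in the lookup depends on the identifier's scheme, which is the sense in which such a resolver is scheme-agnostic.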

The same basic principles guided the design of an earlier tool called noid, which was built for ARKs but is also regularly used by organizations that mint Handles.

What do you mean by silos?

Typically, scheme-based services are designed as silos, or closed platforms, serving a particular identifier type such as Handle, DOI, or PURL. Each silo performs the same main functions – mapping names (identifier strings) to things (objects or metadata). Excluding all but one type of identifier string may help to capture markets, but it's wasteful and non-inclusive. It requires building the same set of services over and over for each type and violates basic principles of openness.

In contrast, the N2T (Name-to-Thing) resolver and EZID (identifiers made easy) management interface were designed to work with all identifiers. Effort put into any new feature can be efficiently leveraged across all types, which sometimes creates surprising flexibility. For example, ARKs are often stored in EZID with "DOI metadata", and every DOI stored in N2T can benefit from "ARK resolution features" such as inflections and suffix passthrough, which are not available via the main DOI resolver (doi.org).
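
Suffix passthrough is named here without being defined. Assuming it means that a request extending a registered identifier is redirected to that identifier's target with the extra suffix appended, a rough sketch might look like this. The table, the function name, and the character-by-character longest-prefix search are all illustrative assumptions; a real resolver may only split at certain boundaries.

```python
from typing import Optional

# Hypothetical table with a single registered identifier.
REGISTERED = {"ark:/12345/fk1234": "https://example.org/objects/fk1234"}


def resolve_with_passthrough(identifier: str) -> Optional[str]:
    """Find the longest registered prefix and append the leftover suffix to its target."""
    for cut in range(len(identifier), 0, -1):
        prefix, suffix = identifier[:cut], identifier[cut:]
        if prefix in REGISTERED:
            return REGISTERED[prefix] + suffix
    return None  # no registered prefix at all
```

Under this assumption, one registered identifier can stand in for every component beneath it, so a request for ark:/12345/fk1234/chapter3.pdf would be sent to the stored target followed by /chapter3.pdf.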


I've heard of ORCIDs, RORs, and UUIDs – where do they fit in?

...