The only prerequisite is to fill out an online request for a NAAN on behalf of your organization. There is no charge to obtain a NAAN and all memory organizations are welcome. Within a day or two you should receive an email containing a NAAN for your organization's exclusive use. Meanwhile consider the following.

...

  • What things do you want to name with ARKs? Generally you name objects that you own, control, or manage.
  • Where do you want your ARKs to resolve to? Examples: formatted file, surrogate for a physical thing, landing page with choices, etc.
  • Which web server will host your objects? You are asked this when you request a NAAN, even if the server is not yet running.
  • Which web server/resolver will you use as hostname in the ARK-based URLs that you advertise/publish?
  • Providing access to ARK-identified objects is much like providing access to ordinary URL-identified objects. For example, you could run all your own custom infrastructure – including content management, web hosting, minting (generating unique identifier strings), and running your own server/resolver. That infrastructure could be very simple, such as a server configured to convert incoming ARK-based URLs to server file pathnames. When you request your NAAN you will be asked to supply the base URL of your local server or resolver.

    At the other end of the spectrum, you could work with a vendor that supplies all the infrastructure so that, for example, you could focus on creating content. Hybrid solutions are also common, such as keeping your current web server arrangement and adding an identifier management piece (eg, the API/UI provided by ezid.cdlib.org, which partners with n2t.net).

    If you run a resolver, you will also want to think about whether to advertise (publish) your ARKs based at your resolver or at n2t.net. Resolving through n2t.net is always possible as a cost-free side-effect of obtaining a NAAN.
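
    The "very simple" infrastructure mentioned above, a server that converts incoming ARK-based URLs to file pathnames, might look roughly like the following sketch. The NAAN 12345, the content root /srv/objects, and the function name are placeholders invented for illustration; a real server would also need to decide how to treat hyphens, inflections, and missing files.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values: substitute your own NAAN and storage location.
NAAN = "12345"
CONTENT_ROOT = PurePosixPath("/srv/objects")


def ark_url_to_path(url: str) -> Optional[PurePosixPath]:
    """Map an incoming ARK-based URL to a file pathname, or None if it is not ours."""
    path = urlsplit(url).path.lstrip("/")
    # Accept both the "ark:/" form and the form without the slash.
    for prefix in ("ark:/", "ark:"):
        if path.startswith(prefix):
            rest = path[len(prefix):]
            break
    else:
        return None
    naan, _, name = rest.partition("/")
    if naan != NAAN or not name:
        return None
    # Refuse names that could escape the content root.
    parts = name.split("/")
    if any(part in ("", ".", "..") for part in parts):
        return None
    return CONTENT_ROOT.joinpath(*parts)
```

    Returning None for anything unrecognized leaves it to the surrounding web server to answer with its usual "not found" response.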

    ...

    • To keep costs down.
    • To work with exactly the metadata I want.
    • To be able to create identifiers without metadata.
    • To create an identifier as soon as I create the first draft of my data.
    • To keep that identifier private while the data and metadata evolve, and decide (maybe years) later, to publish or discard it.
    • To retain that identifier upon publication, perhaps then assigning an additional identifier, such as a DOI.
    • To be compatible with the Data Citation Index ℠ and ORCID.org researcher profiles.
    • To link identifiers to different kinds of nuanced persistence commitments.
    • To be able to add queries (eg, ?lang=en) when resolving my identifiers.
    • To use open infrastructure consistent with my organization's values.
    • To create one identifier that enables millions (suffix passthrough).
    • To link directly to the objects I value instead of to landing pages.
    • To access convenient, full-function metadata via inflections.

    What does ARK have in common with DOI, Handle, PURL, and URN?

    ...

    ARKs are unusual in being decentralized. While one can get resolution services from a global ARK resolver called n2t.net, over 90% of the ARKs in the world are published without reference to it. More than 500 registered organizations across the world have created an estimated 3.2 billion ARKs, and, as with URLs, no one has ever paid an identifier fee to create them. Of course maintaining them isn't free. It is never without cost to keep content access persistent in the long term, regardless of identifier type.

    ...

    Here are some more differences between ARKs, DOIs, Handles, PURLs, and URNs.

    • All things eventually pass, including hostnames and the web itself and the "https://" protocol. When that first part of the identifier ceases to have meaning, only ARKs and URNs will include the label (eg, "ark:") indicating the type of identifier that remains.
    • For DOIs, Handles, and PURLs, you are required to use their respective resolvers. ARKs and URNs permit you to use your own resolver.
    • To create DOIs and Handles, you are required to pay a membership fee and, for DOIs, per-DOI charges. There are no fees for ARKs, PURLs, and URNs.
    • To create Handles, you are required to install and maintain a local Handle server, which gives you another system to monitor, patch, and troubleshoot.
    • You can use a local or vendor resolver for both ARKs and URNs, but ARKs can also be resolved via the global n2t.net resolver.
    • The envisioned URN resolution infrastructure was never built, so URNs are currently resolved as URLs, and there is no designated global URN-as-URL resolver. In order to register to create URNs, you must apply for a URN namespace.
    • Unlike DOIs and Handles, ARKs don't have metadata requirements. ARKs that haven't been released into the world are easy to delete.
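
    The first point in the list above, that the "ark:" label survives even when the hostname loses its meaning, can be illustrated with a small sketch that recovers the ARK from a URL without looking at the host at all. The function and type names are invented for this example, and the pattern is deliberately loose rather than a full implementation of ARK syntax.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    naan: str  # the Name Assigning Authority Number
    name: str  # everything after the NAAN, up to any query or fragment


# Find the "ark:" label (with or without a following slash) anywhere in the text.
_ARK_RE = re.compile(r"(?<![a-z])ark:/?([^/?#]+)/([^?#]+)", re.IGNORECASE)


def extract_ark(text: str) -> Optional[Ark]:
    """Return the ARK embedded in text, ignoring protocol and hostname."""
    match = _ARK_RE.search(text)
    if match is None:
        return None
    return Ark(naan=match.group(1), name=match.group(2))
```

    Two URLs with different hostnames but the same "ark:" portion yield the same result, which is the property the list item describes.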

    ARKs have some unique features that support early object development: ARKs can be deleted, can be born with no metadata, and can exist with any metadata you care to store. 

    But if ARKs can be deleted, how can they be trusted?

    Being able to delete identifiers actually makes ARKs more trustworthy. The ability to delete is a vital part of healthy collection management that is denied to those non-ARK identifier types prohibiting deletion under the presumption that people, once they are asked to commit, won't make mistakes.

    People operating with software regularly turn simple human error into big tangles of systematic mistakes, even at the threshold of commitment. Making it difficult to clean them up requires dragging those messes forward in perpetuity.

    While not immune from such mistakes, ARKs have the big advantage that they can be created and deleted in the shadows, independent of publication or of archival commitment.

    Can an object have both an ARK and a DOI?

    Yes. Sometimes having two identifiers is useful, although it can become confusing when it happens often. Many people start by assigning ARKs to each thing they create in order to have a stable reference right from the beginning, even before they know whether they want to publish it, let alone keep it. Starting with an ARK, you benefit from being able to keep the original identifier from birth through to public release as the object and its metadata mature. For the subset of things that you end up wanting to publish in places that require DOIs, you can assign DOIs at publication time. This is a way in which ARKs support early object development.

    In such a scenario, to reduce the burden of maintaining both identifiers you could register the DOI to redirect to the ARK. At the cost of maintaining just one identifier (the ARK), this would keep newly published links and links previously stored and bookmarked by your collaborators from breaking.

    When should I use ARKs compared to DOIs, Handles, PURLs, or URNs?

    There are no simple answers. Identifiers (not things, but their names) are tricky to talk about, so if you hear simple answers elsewhere, beware of common fallacies.

    Nothing inherent in ARKs, DOIs, Handles, PURLs, or URNs makes them more or less fit for any particular field, domain, or sector. With an identifier resolver and administrative management service, they all provide the core service of resolution (and so do properly managed URLs). 

    Generalizations about identifier types sometimes apply when resolution and management for that type are locked into one particular vendor or provider. For example, many PURL and Handle features and restrictions are well-defined by their respective administration silos. DOIs, which are built on top of Handles, have the same resolution features and restrictions as Handles, but metadata practices are diverse and evolving across registration agencies. 

    The concrete differences that we experience, such as metadata, landing pages, and tool integration (eg, publishing tools), are not properties of identifier schemes per se, but properties of resolution, management, and citation services that various providers extend to or withhold from different identifier types. Those services are shaped in turn by communities of practice and by markets. Basic services are founded on a reliable database storing each identifier along with metadata elements (creator, title, date, redirection URL, etc) that describe the identified object. Extra services include link checking, duplicate detection, report generation, and searching.
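
    To make the "basic services" above concrete, here is a minimal in-memory sketch of a store that keeps each identifier with a redirection URL and free-form metadata elements. The class and method names are invented for illustration and do not describe any particular provider's system; a real service would sit on a durable database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Record:
    """One identifier's entry: a redirection target plus free-form metadata."""
    target: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


class IdentifierStore:
    """A toy stand-in for the database behind a resolver."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def create(self, identifier: str) -> Record:
        # An identifier may be created with no target and no metadata at all.
        if identifier in self._records:
            raise ValueError(f"{identifier} already exists")
        record = Record()
        self._records[identifier] = record
        return record

    def resolve(self, identifier: str) -> Optional[str]:
        # Resolution returns the stored redirection URL, if there is one.
        record = self._records.get(identifier)
        return record.target if record is not None else None

    def delete(self, identifier: str) -> None:
        self._records.pop(identifier, None)
```

    Extra services such as link checking or duplicate detection would be layered on top of a store like this one.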

    What are usage trends for ARKs, DOIs, Handles, PURLs, and URNs?

    As of 2019, purely on an incomplete and anecdotal level, here are a few trends that have been observed.

    • ARKs have seen broad adoption in cultural memory institutions – museums, archives, and libraries. There is strong adoption in France and francophone regions.
    • DOIs until recently have mostly been known as reliable identifiers for scientific and scholarly literature, when in fact this applies to a subset of DOIs assigned via Crossref. What it means to be a DOI is becoming harder to pin down because DOIs are being assigned to datasets, data management plans, field stations, etc. via DataCite, as well as to movies (eg, "Kung Fu Panda") via EIDR. Having said that, Crossref and DataCite DOIs have been successful in creating tools and services for scholarly publishers.
    • PURLs have seen lots of use in identifying metadata vocabulary and ontology terms.

    I've heard of ORCIDs, RORs, and UUIDs – where do they fit in?

    Those are special kinds of persistent identifiers. ORCIDs (Open Researcher and Contributor Identifiers) only identify researchers, and they link to research works using ARKs, DOIs, etc. ORCIDs look like

         https://orcid.org/0000-0001-7604-8041

    ROR (Research Organization Registry) identifiers designate organizations. For example, here's the California Digital Library:

         https://ror.org/03yrm5c26

    UUIDs are globally unique, 36-character strings that are easy for software to generate but only become usable as web addresses when made part of a URL, for example, in this URL:

         https://somehost.example.com/3c2e39526-e0c3-41ae-be4f-07558a9458eb

    While embedding a UUID in an ordinary URL makes it actionable ("clickable"), you could expect more if it were embedded in an ARK such as

         https://n2t.net/ark:/65665/3c2e39526-e0c3-41ae-be4f-07558a9458eb

    As an ARK, for example, that UUID should return metadata (if available) and be insensitive to the hyphens, making this form equally viable:

         https://n2t.net/ark:/65665/3c2e39526e0c341aebe4f07558a9458eb
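
    One way to picture the hyphen insensitivity is as a comparison key that drops hyphens from the ARK portion of a URL before two identifiers are compared. The function name below is invented for this sketch, and it handles only hyphens, not any other normalization a resolver might apply.

```python
def ark_key(url: str) -> str:
    """Return a comparison key for the ARK inside url, ignoring hyphens."""
    _, label, rest = url.partition("ark:")
    if not label:
        raise ValueError(f"no ARK found in {url!r}")
    # Only the part after "ark:" is touched, so hyphens in hostnames are left alone.
    return "ark:" + rest.lstrip("/").replace("-", "")
```

    Applied to the two n2t.net URLs above, the function yields the same key for both.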

    From cradle to grave

    When are ARKs easy to delete?

    If no one knows about an identifier but you, there's no harm in deleting or withdrawing it. Stepping back, an identifier is actually an assertion that a given string of characters is associated with a specific thing. The fewer people you tell, the easier it is to scrap that assertion. If you create a URL and share it only with your closest colleagues, that is much easier to withdraw than if the URL appeared for a month on a public website, from which it was harvested by internet search engines. In contrast, it is hard to delete DOIs and Handles because once registered and made resolvable, they are effectively released to the world.

    ARKs behave like URLs in this respect. Providers are free to create and share ARKs narrowly, in which case they're easy to delete.

    Perhaps surprisingly, even if shared more broadly, ARKs can come with persistence statements that tell you how much or how little commitment is made to them. ARKs were designed to articulate a variety of persistence statements, but they are certainly not alone among identifiers and objects that exhibit a variety of commitment "flavors". This is why ARKs are more accurately known as high-functioning rather than persistent identifiers.

    Finally, people make mistakes. ARKs, DOIs, Handles, PURLs, and URNs are sometimes broadcast in error and need to be withdrawn. When that happens, provider best practice is to make the withdrawn identifier resolve to a page that explains and perhaps apologizes for the inconvenience. Despite the rumors, persistent identifiers are never guaranteed.

    What is meant by ARKs supporting early object development?

    People need identifiers before they know exactly what object they refer to, or if they refer to anything worth keeping. An identifier that requires mature metadata cannot be created during early development since little is known about the object. So object creators almost always initially assign identifiers that have no metadata requirements, such as URLs or ARKs.

    If you start with an ARK, you benefit from being able to keep the original identifier through to public release as the metadata matures. Many objects go through intensive development and revision phases, sometimes lasting years, during which they are too immature to meet most metadata requirements. Nonetheless every object needs some sort of identifier from conception to maturity, where maturity could look like public release and further enhancement, or abandonment. It is easy to abandon ARKs that have not been released into the world.

    Like the object itself, metadata elements need a flexible place to grow and mature over time:

    • starting in the planning phase, when it just needs an identifier,
    • at the moment of birth, when its first digital representation needs a redirection target URL,
    • after the first analysis, when its significance and a tentative title emerge,
    • when creating dozens of discipline-specific metadata elements that violate most metadata standards except your own,
    • during post-processing by a colleague whose name you will add as an additional creator,
    • when early feedback based on the tweeted identifier turns up a key insight and a new contributor,
    • and so forth, through to archiving, abandonment, public release, correction, revision, enhancement, etc.

    Unlike Crossref and DataCite DOIs, which require specific metadata (eg, see the DataCite schema), ARKs do not constrain any of these activities. Moreover the N2T.net resolver actually supports all of them.

    If ARKs don't require it, why would I bother to create metadata?

    There are several key benefits to having metadata, which is strongly recommended for all ARKs that are mature (no longer under development) and released into the world. The standard way to retrieve the metadata is to resolve the ARK with a '?' or '??' appended to it.

    First, no matter what the ARK redirects to, whether landing page or PDF file, metadata gives the user further information about the object, such as a description, other versions, etc. In contrast, so as to make sure metadata is available, DOIs are required to redirect to landing pages (consistent with common publisher practice) and prohibited from linking directly to object content.

    • Crossref and DataCite DOIs link to publisher landing pages constructed around but not directly to objects you care about, but ARKs can freely link directly to objects you care about, which is machine- and human-friendly since it does not require an extra human navigation step for common tasks such as
      • opening an article's PDF file for reading,
      • referencing an image file meant to be incorporated automatically inline into a document, and
      • citing a spreadsheet to be used directly by data analysis software.
    • DOIs do not support ARK-style inflections that permit access to metadata regardless of whether an identifier points to an object or its landing page.

    Creating metadata (extra information associated with or describing an object) has several key benefits. First, no matter what the ARK redirects to, whether to a landing page or a file, metadata can give people more information about the object, such as details about its origins, references to newer versions, etc. Typically for ARKs metadata is accessed via inflections.

    Metadata is also critical to your users making selection decisions when direct object access is expensive or inconvenient. They may want to read an abstract before buying, or to limit search results to a particular author.

    By themselves, persistent identifier strings are often opaque, revealing little about what they identify (generally non-opaque identifiers do not age or travel well). On the other hand, opaque identifiers are difficult because both creators and receivers have no idea as to what the identifier was meant to identify, so in the absence of metadata everyone is forced to trust the accessed object itself.

    In this way, metadata also shores up identifier persistence (what object the ARK string is associated with). Any discrepancy between the metadata and the accessed object helps to detect changes in that association.

    Second, it shores up the persistence of the binding between the ARK string and the identified object. The primary assertion of the binding is the experience of resolving the identifier to the object. A secondary, confirming assertion of the binding is the experience of resolving to its metadata. Any discrepancy between the metadata and the object helps to detect changes in that binding. 

    Third, providing access to metadata demonstrates basic provider commitment. This adds credibility to the seriousness of its intentions since not every object provider can return object metadata. Finally, adding basic metadata, especially for objects that don't have textual representations, makes your objects more findable. 

    ...

    xxx Dublin Core Kernel metadata.

    What is an ARK "inflection" and how does it differ from "content negotiation"?

    An inflection is a change to the ending of a word to express a shift in meaning. It permits us to define a word such as "go" without also defining "goes" and "going". To an ARK that leads to an object, simply adding a '?' to the end (an example of an ARK inflection) permits us to request metadata without having to define a separate identifier for the object's metadata. This simple technique can be used by a human with a web browser. The N2T resolver supports both inflections and content negotiation.
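
    The contrast can be sketched in a few lines. An inflection changes the identifier string itself, so it works from a browser address bar, while content negotiation leaves the URL alone and asks for a different representation through an HTTP header. The function names and the application/json media type are assumptions made for this example; which media types a given resolver honors is not stated here. No request is actually sent.

```python
from urllib.request import Request


def with_inflection(ark_url: str, inflection: str = "?") -> str:
    """Build the metadata-request form of an ARK URL by appending an inflection."""
    if inflection not in ("?", "??"):
        raise ValueError("expected '?' or '??'")
    return ark_url + inflection


def negotiated_request(ark_url: str, media_type: str = "application/json") -> Request:
    """Build (but do not send) a request that uses content negotiation instead."""
    # The media type is a placeholder; check what your resolver supports.
    return Request(ark_url, headers={"Accept": media_type})
```

    The inflected form is a plain string that can be pasted anywhere a URL is accepted, whereas the negotiated form needs a client that can set request headers.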

    ...