...

ARK spec and NAAN schema transition

Discussion items

Item | Who | Notes
Announcements

Any news items we should blog about? Any calls for papers, submission deadlines, or upcoming meetings we should note? Please add them to the Calendar of events.


ARK spec transition plan

  • Tom Creighton's analysis of event-date ordering and dependencies
  • Date of interest to the NAAN group: when do we advise new NAAN holders to resolve both forms? Do we point them to one of the newer specs?


Topic: saving periodic dumps of ARK metadata

Any thoughts on this exchange from around April 20 on the arks-forum?

AK: "My question was primarily focused on the long-term sustainability of what you are naming secondary content, that is, metadata.

It is promising for the future of the ARK system that there could be enhancements in the latest N2T software that may extend its capabilities in a way that opens the system to more ARK organizations, perhaps enabling them to deposit metadata in external storage."

DW: "One position on ensuring long-term metadata availability is the 'Available data' bullet of https://openscholarlyinfrastructure.org/#insurance, i.e., 'Underlying data should be made easily available via periodic data dumps.'

Crossref has adopted this position: for its allocation of DOIs and stewardship of the associated metadata, it has so far provided three annual dumps via torrent (see its most recent blog post, search for 'Crossref' on academictorrents.com, or the landing page for the April 2022 dump). The April 2022 dump, its most recent, contained 134M records and is 160 GB.

DataCite has not yet provided a comparably clear dump of its DOI holdings, but someone has taken it upon themselves to do this, posting the dumps to archive.org; the latest there is https://archive.org/details/datacite_dump_20221118.

Of course, this still leaves DOI holdings fragmented: one needs to gather such dumps from each DOI provider separately. That may be a practically sustainable situation for the DOI system, because its providers are known and relatively few in number compared with the ARK system. For the ARK community, I see clear value in voluntarily consolidating, e.g., CC0-licensable metadata across NAAs into a shared store on a periodic (e.g., quarterly or annual) basis. So yes, I am also interested in continuing the discussion on this topic.

P.S. Another potential 'leg' of redundancy is Amazon's current Open Data program (https://aws.amazon.com/opendata/), which the OpenAlex effort, for example, uses for its dumps (https://docs.openalex.org/download-all-data/download-to-your-machine). I stress 'leg' here because I am by no means suggesting any sole dependence on this large corporation's current offer of free hosting."
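One way to picture the periodic, voluntary consolidation DW describes is the minimal Python sketch that follows. It is not N2T or ARK Alliance code: the record shape (`ark` and `license` fields), the `CC0-1.0` license string, the `consolidate` function, and the `ark-metadata-YYYY-MM-DD.jsonl` file name are all hypothetical. It keeps only records explicitly marked CC0, resolves duplicate ARKs by keeping the first one seen (NAANs visited in sorted order), and writes one dated JSON Lines snapshot.

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical record shape: {"ark": "ark:12345/x6np1wh8k", "license": "CC0-1.0", ...}
# Neither the field names nor the license string come from N2T; both are assumptions.
CC0 = "CC0-1.0"


def consolidate(per_naa_records, out_dir, snapshot_date=None):
    """Merge metadata from several NAAs into one dated JSON Lines snapshot.

    per_naa_records maps a NAAN (string) to an iterable of record dicts.
    Only CC0 records are kept; for duplicate ARKs the first record seen wins.
    Returns (path_of_snapshot, number_of_records_written).
    """
    snapshot_date = snapshot_date or date.today()
    merged = {}
    for naan in sorted(per_naa_records):  # deterministic order across runs
        for record in per_naa_records[naan]:
            if record.get("license") != CC0:
                continue  # skip anything not explicitly CC0-licensed
            # Tag each record with the NAAN it was collected from.
            merged.setdefault(record["ark"], {**record, "naan": naan})

    out_path = Path(out_dir) / f"ark-metadata-{snapshot_date.isoformat()}.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for ark in sorted(merged):  # stable ordering keeps snapshots diff-friendly
            fh.write(json.dumps(merged[ark], sort_keys=True) + "\n")
    return out_path, len(merged)
```

Writing one record per line, with sorted ARKs and sorted keys, keeps successive snapshots easy to compare, which would matter if they were later kept in a version-controlled repository.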

NT: "It's been interesting following this discussion. I'm glad that Donny is pointing to POSI. Might a lightweight approach to storing additional copies of the N2T metadata be public GitHub and GitLab repositories, updated monthly or on some other periodic basis through a simple git commit and push? If additional preservation is desired, the Internet Archive or OSF might be a good choice."
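NT's commit-and-push idea could look roughly like the following Python sketch. It assumes a local clone of a public GitHub or GitLab repository whose remote and push credentials are already configured. The function name, the `origin`/`main` defaults, and the commit message are placeholders, not an agreed workflow. Passing `dry_run=True` returns the git commands without running them.

```python
import os
import subprocess


def commit_and_push(repo_dir, dump_file, message,
                    remote="origin", branch="main", dry_run=False):
    """Stage, commit, and push one metadata dump file in an existing git clone.

    dump_file is a path relative to repo_dir. remote and branch are
    placeholder defaults. Returns the git commands; they are executed
    only when dry_run is False.
    """
    repo = os.fspath(repo_dir)
    commands = [
        ["git", "-C", repo, "add", "--", os.fspath(dump_file)],
        ["git", "-C", repo, "commit", "-m", message],
        ["git", "-C", repo, "push", remote, branch],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if any git step fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths; prints the commands without touching any repository.
    for cmd in commit_and_push("ark-dumps", "ark-metadata-2023-05-01.jsonl",
                               "Monthly ARK metadata dump", dry_run=True):
        print(" ".join(cmd))
```

A scheduler such as cron could call this monthly after each dump is written. Note that `git commit` exits with an error when there is nothing new to commit, so an unchanged dump would raise `subprocess.CalledProcessError` here.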

JK: "I'll make sure to add it to the agenda in the ongoing discussions we are having in the ARK Alliance Advisory Group."





Action items

  •