Date
Attendees
- Karen Hanson, Dave Vieglais, Greg Janee, Donny Winston, Tom Creighton, John Kunze
Goals
ARK spec and NAAN schema transition
Discussion items
Item | Who | Notes |
---|---|---|
Announcements: Any news items we should blog about? Any calls for papers, submission deadlines, upcoming meetings we should note? Please add to Calendar of events. | | dw: I'll be at IDW (International Data Week). gj: https://datacurationnetwork.org/events/annual-meeting/. dw, kh: blog post looks fine. |
ARK spec transition plan | | We didn't get to this. Will set up a separate meeting. |
Topic: saving periodic dumps of ARK metadata. Any thoughts on this exchange around April 20 on the arks-forum? AK: "My question was primarily focused on the long-term sustainability of what you are naming secondary content, that is, metadata." DW: "One position on ensuring long-term metadata availability is the "Available data" bullet of <https://openscholarlyinfrastructure.org/#insurance>, i.e. "Underlying data should be made easily available via periodic data dumps." Crossref has adopted this position, and for its allocation of DOIs and stewardship of associated metadata, has so far provided three annual dumps available via torrent (last blog post, search for "Crossref" on academictorrents.com, landing page for their last (April 2022) dump). Their last dump, in April 2022, contained 134M records and is 160GB. DataCite has not yet provided a similarly clear dump of their DOI holdings, but someone has taken an interest in doing this for them, posting the dumps to archive.org, e.g. https://archive.org/details/datacite_dump_20221118 is the latest there. This, of course, is still fragmented for DOI holdings, i.e. one needs to gather such dumps from each DOI provider. This is perhaps a practically sustainable situation for the DOI system because the various providers are known and relatively (vs the ARK system) few in number. For the ARK community, I can see clear value in voluntary consolidation of e.g. CC0-licensable metadata across NAAs to a shared store on a periodic (e.g. quarterly or annual) basis. So yes, I also am interested in ongoing discussion on this topic. P.S. Another potential "leg" of redundancy is to use Amazon's current Open Data program (https://aws.amazon.com/opendata/) as e.g. the OpenAlex effort does for dumps (https://docs.openalex.org/download-all-data/download-to-your-machine). I stress "leg" here because by no means am I suggesting any singular dependence whatsoever on this large corporation's current offering of free hosting." NT: "It's been interesting following this discussion. I'm glad that Donny is pointing to POSI. Might a lightweight approach to storing additional copies of the N2T metadata be public GitHub and GitLab repositories that could be updated monthly or on some other periodic basis through a simple git commit and push? If additional preservation is desired, Internet Archive or OSF might be a good choice." JK: "I'll make sure to add it to the agenda in the ongoing discussions we are having in the ARK Alliance Advisory Group." | | dw: I think it'd be useful to have a dump of the fact that, for example, ark:12090/ is intended to pass through to https://lib.cam.ac.uk/ark:$id. This is currently only known if https://n2t.net/ark:12090/ resolves. dv: ideally, the NAAN registry will allow a preferred forwarding form; then new orgs could do it. gj, tc: dumps may not be as useful as OAI-PMH. dw: particularly for an opaque identifier, some descriptive information is helpful; for DOIs for publications, this metadata is typically citation metadata (author, title, journal, date, etc.). dw: it can be hard to put up an API compared to a file. |
| | dv: https://github.com/CDLUC3/naan_reg_public; possibly part of Maria's upcoming meeting on proposed NAAN schema changes |
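NT's suggestion of publishing periodic metadata dumps to a public git repository is simple enough to sketch. The following is an illustration only, not an agreed design: the repository layout, file naming, and `dry_run` flag are assumptions.

```python
# Sketch of NT's suggestion: write a dated metadata dump file into a
# public git repository and commit/push it on a periodic schedule.
# File naming and the dry_run flag are assumptions for illustration.
import datetime
import pathlib
import subprocess


def publish_dump(metadata: str, repo_dir: str, dry_run: bool = False):
    """Write a dated dump file into repo_dir and git add/commit/push it.

    Returns the dump path and the git commands (which are executed
    unless dry_run is set).
    """
    repo = pathlib.Path(repo_dir)
    stamp = datetime.date.today().isoformat()
    dump = repo / f"ark-metadata-{stamp}.jsonl"
    dump.write_text(metadata)
    cmds = [
        ["git", "-C", str(repo), "add", dump.name],
        ["git", "-C", str(repo), "commit", "-m", f"metadata dump {stamp}"],
        ["git", "-C", str(repo), "push"],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return dump, cmds
```

Run from cron (or a CI schedule) this gives the "updated monthly or on some other periodic basis" behavior NT describes, with the git history itself providing the sequence of dated snapshots.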
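The pass-through behavior dw describes (ark:12090/ forwarding to https://lib.cam.ac.uk/ark:$id) can be sketched as a simple template substitution. This is not the actual N2T implementation; the registry structure and the meaning of the `$id` placeholder (here, the full NAAN/suffix) are assumptions.

```python
# Sketch (not the actual N2T resolver): rewriting an incoming ARK
# using a per-NAAN redirect template from a registry dump.
# The $id placeholder is assumed to mean the full NAAN/suffix.

# Hypothetical registry excerpt: NAAN -> preferred forwarding form
NAAN_TARGETS = {
    "12090": "https://lib.cam.ac.uk/ark:$id",
}


def resolve(ark: str):
    """Rewrite 'ark:NAAN/suffix' using the NAAN's target template,
    or return None if the string is not an ARK or the NAAN is unknown."""
    if not ark.startswith("ark:"):
        return None
    body = ark[len("ark:"):].lstrip("/")  # tolerate the older 'ark:/NAAN/...' form
    naan, _, suffix = body.partition("/")
    template = NAAN_TARGETS.get(naan)
    if template is None:
        return None
    return template.replace("$id", f"{naan}/{suffix}")


print(resolve("ark:12090/abc123"))
# -> https://lib.cam.ac.uk/ark:12090/abc123
```

If the NAAN registry dump carried these templates directly, as dv suggests, any mirror could perform the same rewrite without first asking n2t.net to resolve the identifier.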