SHARE Hackathon and Community Meeting
July 11-14, 2016
Charlottesville, VA
Monday, July 11 – Hackathon Day 1
Jeff Spies
SHARE version 2. More specificity about the contents of the database
Need interfaces for SHARE. SHARE does not want to be an interface to the scholarly work
Data needs discovery and refinement
Rick Johnson
Exciting time to be involved with SHARE
Erin Braswell
OSF work space. Code at GitHub.
Provider -> Harvester -> raw_data -> Normalizer -> normalized_data -> changes -> change_set -> versions -> entities
The Harvester gets the data from the provider. Uses date restrictions to get "new" data. The normalizer creates the values that can go into the SHARE data models.
Title issues: Unicode, LaTeX, MS Word, foreign languages. Attempt to store the language provided by the provider. Joined fields for titles with multiple titles. Can be stored as a list in the extra class.
Normalizers can guess title, identifier, or DOI. Normalizers are usually conservative.
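As I understood the pipeline described above, it looks roughly like the following. This is my own minimal sketch, not the actual SHARE codebase; all function and field names here are hypothetical:

```python
from datetime import date

def harvest(provider_records, since):
    """Pull only records the provider has added or updated since the last run."""
    return [r for r in provider_records if r["updated"] >= since]

def normalize(raw):
    """Map provider-specific fields onto a common model; keep leftovers in 'extra'."""
    known = {"title", "identifier", "updated"}
    normalized = {
        "title": raw.get("title", "").strip(),
        "identifier": raw.get("identifier"),
    }
    # Fields the model doesn't cover are preserved rather than dropped
    normalized["extra"] = {k: v for k, v in raw.items() if k not in known}
    return normalized

records = [
    {"title": " Fish Ecology ", "identifier": "doi:10.1/x",
     "updated": date(2016, 7, 1), "language": "en"},
    {"title": "Old Record", "identifier": "doi:10.1/y",
     "updated": date(2016, 1, 1)},
]
# Only the July record is "new" relative to the last harvest date
new = [normalize(r) for r in harvest(records, since=date(2016, 6, 1))]
```

This mirrors the idea that the harvester uses date restrictions to fetch only new data, and that the normalizer maps provider fields into the SHARE data models while stashing extras (like the provider's language value) in an extra class.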
Idea: data inspectors: write Elasticsearch queries to get percentages of populated/vacant fields, by provider, by date range. Would show the density of field values in the normalized data. Could be used to draw control charts of field value density. Mirror the values.
Idea: data inspectors: identifiers are a problem; they often come in "random" forms.
Idea: data inspectors: feed the results back to the providers. The providers may be able to suggest enhancements to the harvesters and normalizers.
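The field-density idea above could be done as an Elasticsearch aggregation; here is a simpler local sketch of the same computation over already-fetched normalized documents. This is my own illustration (names and sample data are hypothetical), not SHARE code:

```python
def field_density(docs, fields):
    """Fraction of documents with a non-empty value for each field."""
    total = len(docs)
    return {
        f: sum(1 for d in docs if d.get(f) not in (None, "", [])) / total
        for f in fields
    }

# Hypothetical normalized records from one provider
docs = [
    {"provider": "asu", "title": "A", "doi": "10.1/a"},
    {"provider": "asu", "title": "B", "doi": ""},
    {"provider": "asu", "title": "", "doi": None},
    {"provider": "asu", "title": "D"},
]
density = field_density(docs, ["title", "doi"])
# density["title"] → 0.75, density["doi"] → 0.25
```

Computed per provider and per date range, these densities are exactly the numbers you would plot on a control chart, or hand back to providers as feedback.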
Documents can be updated – provider's id. If the metadata comes in for a record that exists, COS versions the record and provides the most current unless the query asks for versions.
See https://staging-share.osf.io/api/
Tuesday – Hackathon Day 2
Wrote the share-data-inspector. Upload to GitHub and provide link here
Wednesday – Community Meeting Day 1
Keynote Siva Vaidhyanathan, UVa – The Operating System of Our Lives: How Google, Facebook and Apple plan to manage everything
Relationships with technology and information and communication changing rapidly. Mapping a game onto reality, engaging millions of people immediately into a game – Pokemon Go. Facebook Live – mapping reality into the virtual world, immediately, effortlessly, in real-time. Facebook took the video down for an hour, did not anticipate the incident of violence. 1.6B users, leading source of news for many millions of people. Facebook matches content to people. Facebook denying its level of power in the world. Google has the same position – constantly underplaying its role in pointing people at information.
We are collectively dependent on Google.
"The web is dead" – flows of data are not open docs loosely joined. Most data is moving through proprietary devices and formats. Our concept of the Internet is flawed/primitive. We have never been comfortable with the concepts of radical openness. Internet described in terms of place based metaphors "cyberspace" "Internet superhighway." Mobile devices changed that.
Apple sells boxes. Microsoft sells software. Amazon is a retailer; its largest source of revenue is AWS. Facebook sells connectivity to people. Google sells connectivity to information. They compete for labor, political power, advertising revenue, attention. Each has a plan to "win the game" – to become the operating system of our lives. Put things on our bodies, drive our cars, fully embedded in our bodies. Data flows must be proprietary and controlled. Cannot be open/standard.
Internet of Things – forget it. Seems helpful. The important thing is the monitoring and managing of people. Us. Companies must have a lot of knowledge about us. Difficult to enter the market – these companies have 18 years of data on us.
Edward Snowden showed us the data the government is collecting, and the purposes they have for the data. State actors are not benign; their surveillance often results in violence. The Chinese government works in full association with its social media companies. All states are excited by Modi, Putin, Erdogan, and the work they are doing on surveillance. Surveillance will increase.
We have voices as citizens. The Googlization of Everything.
Breakout – SHARE Notify Atom Feed
https://osf.io/share 117 providers, 7 million records. ClinicalTrials.gov, Zenodo, PLoS, Arxiv.org, Figshare, and 50 institutional providers.
How might we use the data:
- The VIVO Use Case – showcase the work of the people at an institution. All the work.
- Track work over time – increase/decrease of various kinds of work.
- Check work over time – what do we know, what does SHARE know?
- Understand the social network of scholarship – who works with whom, across institutions and across the world
- Understand the trajectory of scholarship – what areas are emerging, what areas are receding?
Atom Query String
http://osf.io/share/atom/?q=(shareProperties.source:asu)AND(title:"fish")
http://osf.io/share/atom/?q="maryann martone"OR"maryann e martone"
http://osf.io/share/atom/?q="m conlon"OR"Michael Conlon"
Blogtrottr (http://blogtrottr.com) for sending a feed digest to a mail address on a regular schedule.
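Building these Atom query URLs by hand gets fiddly once queries contain quotes and parentheses; a small helper can percent-encode the query string. A sketch, assuming the feed endpoint noted above (the helper name is my own):

```python
from urllib.parse import urlencode

BASE = "http://osf.io/share/atom/"

def share_atom_url(query):
    """Build a SHARE Notify Atom feed URL for a Lucene-style query string."""
    return BASE + "?" + urlencode({"q": query})

# The first example query from the notes, properly encoded
url = share_atom_url('(shareProperties.source:asu)AND(title:"fish")')
```

The resulting URL can then be dropped into any feed reader, or into Blogtrottr for email digests.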
Breakout session – related projects
Gary Price, Infodocket. Find more users for SHARE – high school students. Include press references to research. Semantic Scholar.
Karen Hanson, Portico, Ithaka, RMap. DiSCO – distributed scholarly compound object. Linked Open Data. Very cool. DiSCOs can relate to each other. Each DiSCO has an immutable identifier (URI) that points at the DiSCO. Assertions about the resources. No ontology restrictions. DiSCOs have a status. Using known ontological elements for connections. OSF Person to OSF Project to DataCite URI, linked. Plug in a DOI and see a graph of what RMap knows of that resource. IEEE was a sponsor; used IEEE data on publications to help validate RMap. Has RDF representation of each DiSCO. End of grant, all tools will be open source. http://rmap-project.info
Lisa Johnson, University of Minnesota. Data Curation Network. Rise of Data Sharing Culture. Role of librarians – discipline specific expertise, technology expertise.
Data curation network: Minn, Cornell, PennState, Illinois, Michigan, WUSTL. Collecting and reporting data curation experiences, metrics for results. http://sites.google.com/DataCurationNetwork
Anita de Waard, Elsevier
Hackathon Report back
Institutional Dashboard
Data Inspector
Metadata documentation
Research Data Discovery in Share
Data is coming from DataCite. Is there a data type for datasets? Yes, but perhaps not in the API yet?
Quality of data? Depends on the provider. Level of curation varies.
Sharing and discovering artifacts of the research process? Some artifacts can not be shared – proposals before funded. Data management plans before funding.
Does DataCite totally duplicate Dryad for data set consumption? Metadata might be different. Similar questions apply to other overlapping services – Dataverse and DataCite.
VIVO and SHARE
Alexander Garcia Castro
SHARE is chaotic and promiscuous. VIVO is chaste, great precision.
Research Hub
SHARE Scopus Mendeley GitHub
Match and claim
Search -> Claim -> Add -> Connect Research Objects -> Social Connections -> Done
VIVO needs an engagement strategy. Beautiful, clear models, open, reusable semantic data.
SHARE is big, but messy. Also needs an engagement strategy.
Mendeley, ResearchGate. Giving researchers something. OpenVIVO has a bit more, but still very little.
Thursday, Community Meeting Day 2
Jeff Spies, Scholarly Workflow
OSF as a platform for scholarly workflow. Slides available here: http://osf.io/9kcd3
MC Needs:
- Identity
- Extensible/local workflow
- Github issues
Prue Adler, Brandon Butler, Metadata Copyright Legal Guidance
Copyright protects the original expression of the authors. Modicum of creativity, independent creation. Copyright does not protect facts, ideas, discoveries, systems. Effort, time, expertise are irrelevant in the US, not in the UK and EU.
Merger doctrine – if the idea can be expressed in only a limited number of ways, the expression merges w/ fact and is unprotected.
Selection and arrangement of facts can be protected if creative and original.
No copyright in words, titles, and short phrases.
No copyright in blank forms (psychometrics – perhaps this is a patentable method)
MC: VIVO Project was able to work with Web of Science and SCOPUS to clarify which facts in their databases were public domain and which were not. Public Domain facts can be harvested from these systems and used in VIVO systems, effectively making the facts open and reusable by others.
Contracts can restrict reuse regardless of copyright.
Copyright applies for 70 years after the author's death.
Brian Nosek – Research Integrity
Signals – open data, open materials, preregistered. Badges are stupid, but signals helpful.
3% of articles had recognition of open data; two years later, 40% have open data. PSCI journal.
http://cos.io/top Top guidelines. 713 journals. 62 organizations in the process of review and adoption of the guidelines.
Two modes of research: exploratory, confirmation
Preregistration challenge: http://cos.io/prereg
Registered reports: