
Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

Risk of not knowing what to search for

may be addressed by
  • discovery endpoints and what they hold
    • ‘hardened’ SPARQL endpoints may be less prone to down time – e.g., Fuseki documentation states that "authentication and control of the number of concurrent requests can be added using an Apache server"
  • standard extracts and starting points with examples may help (see the query sketch after this list)
    • emulate Social Explorer (http://socialexplorer.com) as a way to query the contents of a larger data source (in that case, census data)
    • the linked data fragments technology (http://linkeddatafragments.org) may facilitate hosting linked data without the server-side overhead and risk of a public SPARQL endpoint
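As an illustration of the "starting points with examples" idea, here is a minimal sketch of a canned query against a public SPARQL endpoint, using the SPARQLWrapper library and DBpedia purely as a stand-in; an LD4L service would substitute its own endpoint and vocabulary.

    # Sketch of a published "starting point" query against a public SPARQL endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?work WHERE { ?work dbo:author dbr:Toni_Morrison } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["work"]["value"])
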
risk of not knowing what to search for
publish starting points & examples of queries and/or canned responses
reconciliation services — not necessarily monopolies or centralized
iterative, with curation and provenance
common API for reconciliation building on the OpenRefine API — specify as much metadata as you have, get ranked results back
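A hedged sketch of what a call to such a service could look like, following the OpenRefine reconciliation protocol: you send whatever metadata you have and get ranked candidates back. The Wikidata endpoint URL and the Q/P identifiers below are only examples of one existing implementation, not part of any LD4L design.

    import json
    import requests

    # One existing endpoint that speaks the OpenRefine reconciliation protocol
    # (used here only as an example).
    ENDPOINT = "https://wikidata.reconci.link/en/api"

    queries = {
        "q0": {
            "query": "Toni Morrison",
            "type": "Q5",                     # "Q5" (human) is a Wikidata-specific type id
            "properties": [
                {"pid": "P569", "v": "1931"}  # supply as much metadata as you have
            ],
        }
    }

    resp = requests.get(ENDPOINT, params={"queries": json.dumps(queries)})
    # Ranked candidates come back with a score and a match flag.
    for candidate in resp.json()["q0"]["result"]:
        print(candidate["score"], candidate["match"], candidate["id"], candidate["name"])
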
mashup tools that test connections
the sameAs.org website
validation
RDF data shapes
DCMI RDF validation
extension mechanisms - Schema.org
query on different axes — query OCLC by VIAF id to get works
ability to push bookmarks but as small graphs of data, consumable by others
semantic web crawling
bookmark
a service where I can push the results of my search, organized by topic
a sort of Mendeley, but for everything
add it to a collection I have 
similar to an annotation service
you search, you refine it, you step back — right now you can only save bookmarks at one level
nobody can use your bookmarks
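A minimal sketch of "bookmarks as small graphs of data, consumable by others", using rdflib; the collection URI, class name, and member URIs are all invented placeholders.

    # Sketch: save a search result set as a small, shareable RDF graph.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/bookmarks/")

    g = Graph()
    collection = EX["my-topic-collection"]
    g.add((collection, RDF.type, EX.BookmarkCollection))
    g.add((collection, DCTERMS.title, Literal("Sources on linked data services")))

    # Placeholder URIs standing in for identifiers drawn from many different
    # sources (LC, VIAF, a local repository, ...).
    for uri in [
        "http://example.org/authority/person-1",
        "http://example.com/repository/item-42",
    ]:
        g.add((collection, DCTERMS.hasPart, URIRef(uri)))

    print(g.serialize(format="turtle"))
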
2
a tool that would facilitate entity reconciliation
to put together UN and LC
a first pass, then improve that manually, then a second iteration
then publish — surface
manage difference of opinion
provenance
exclude some
centralized entity mapping
feedback by users on the mapping
need protocols
want to discover annotation — known servers with protocols 
collections have been done by many different places
if we do linked data, my list is a list of URIs from many sources
in the UI, the user won’t see that
assuming accessible SPARQL endpoints
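A rough sketch of the iterative, curated mapping workflow described above: an automated first pass proposes matches, curators accept or exclude them, and only accepted links are published as owl:sameAs with provenance. The data structure, URIs, and the use of dcterms:provenance are illustrative assumptions, not a specification.

    # Sketch: first-pass matches, manual review, then publish accepted links.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS, OWL

    # Hypothetical output of an automated matching pass between two authority files.
    candidate_matches = [
        {"left": "http://example.org/lc/n1", "right": "http://example.org/un/p7",
         "score": 0.93, "status": "accepted"},
        {"left": "http://example.org/lc/n2", "right": "http://example.org/un/p9",
         "score": 0.51, "status": "excluded"},   # curator disagreed: exclude some
    ]

    published = Graph()
    for m in candidate_matches:
        if m["status"] != "accepted":
            continue                              # manage difference of opinion
        left, right = URIRef(m["left"]), URIRef(m["right"])
        published.add((left, OWL.sameAs, right))
        # Minimal provenance: what produced the link and with what confidence.
        published.add((left, DCTERMS.provenance,
                       Literal(f"auto-match pass 1, score {m['score']}, curator-reviewed")))

    print(published.serialize(format="turtle"))
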
3
other cleanup tasks — validation? consistency of ontology use?
entity recognition — text mining or analytics tools — autotaggers
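One concrete form the "RDF data shapes" validation could take: a minimal SHACL check with the pyshacl library. The shape and the data are toy examples.

    # Sketch of shape-based validation with pyshacl (toy data and shape).
    from pyshacl import validate
    from rdflib import Graph

    data = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:work1 a ex:Work .            # missing the required ex:title
    """, format="turtle")

    shapes = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .
        ex:WorkShape a sh:NodeShape ;
            sh:targetClass ex:Work ;
            sh:property [ sh:path ex:title ; sh:minCount 1 ] .
    """, format="turtle")

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)   # False: the work has no title
    print(report)
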
4
constant crawling graphs of linked data
semantically aware web crawling — is it worth going down this path, what’s attached, what has changed
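A very small sketch of semantically aware crawling: dereference a URI, parse whatever RDF comes back, and queue the resources it links to. rdflib's parse() does the content negotiation; politeness, depth limits, and change detection are left out, and the starting URI is only an example.

    # Sketch: crawl a graph of linked data by following sameAs / seeAlso links.
    from collections import deque
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL, RDFS

    def crawl(start_uri, max_resources=10):
        seen, queue = set(), deque([URIRef(start_uri)])
        while queue and len(seen) < max_resources:
            uri = queue.popleft()
            if uri in seen:
                continue
            seen.add(uri)
            g = Graph()
            try:
                g.parse(uri)            # dereference with content negotiation
            except Exception:
                continue                # not RDF, or unreachable
            print(uri, len(g), "triples")
            for _, p, o in g:
                if p in (OWL.sameAs, RDFS.seeAlso) and isinstance(o, URIRef):
                    queue.append(o)     # what's attached: follow outgoing links

    crawl("http://dbpedia.org/resource/Toni_Morrison")
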
5
provenance space — who’s made a particular assertion for that
in the library domain, could imagine a layer about who’s responsible for an assertion; currently unspecified
crowdsourcing — as you move up toward the general public, you typically track less about who did it
variable credibility
acknowledge that
nanopublications
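A sketch of the nanopublication-style pattern mentioned above: the assertion goes in one named graph and the statement of who made it (and with what confidence) in another. Uses rdflib's Dataset; all URIs and the confidence property are invented.

    # Sketch: an assertion graph plus a provenance graph, nanopublication-style.
    from rdflib import Dataset, Literal, Namespace

    EX = Namespace("http://example.org/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    ds = Dataset()

    assertion = ds.graph(EX["assertion-1"])
    assertion.add((EX.work42, EX.author, EX.person7))   # the claim itself

    provenance = ds.graph(EX["provenance-1"])
    provenance.add((EX["assertion-1"], PROV.wasAttributedTo, EX.cataloger_jane))
    provenance.add((EX["assertion-1"], EX.confidence, Literal(0.9)))

    print(ds.serialize(format="trig"))
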
===== group 4 =====
reconciliation services — contain no data, query a distributed set of resources
individual libraries will become the authorities for special collections — items, people, events
queries to a central area would find a match
cache the sameAs links so consumers don’t have to re-query
everybody who consumes has the cross-links
the sort of thing that OCLC might end up doing — 
could be any type of object — logical to start with works 
brings up the question of degrees of sameAs-ness
when a new match is known, publish that — a notification mechanism
you would add provenance to those links to indicate where they came from
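A sketch of the caching idea above: store each sameAs link with its source so consumers do not have to re-query, and notify subscribers when a new match appears. The cache structure and notification hook are hypothetical.

    # Sketch: cache sameAs links with provenance and notify on new matches.
    same_as_cache = {}          # uri -> list of {"match": uri, "source": str}
    subscribers = []            # callables to notify when a new match is found

    def record_match(uri, match, source):
        entry = {"match": match, "source": source}        # provenance the link
        if entry not in same_as_cache.setdefault(uri, []):
            same_as_cache[uri].append(entry)
            for notify in subscribers:                     # publish the new match
                notify(uri, entry)

    subscribers.append(
        lambda uri, e: print("new match:", uri, "->", e["match"], "from", e["source"]))
    record_match("http://example.org/lc/n1", "http://example.org/viaf/123", "OCLC matcher")
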
there used to be a plug-in for Netscape with a side-wiki where you could annotate — anybody could see what everyone else had done
now, in the world of unique identifiers — a 'linkerator' for people to rank what they see
build up ant trails over time, around an object
how to make it in any way central — get it to the browser
how about the annotation example?
regular expressions against EAD for an object to suggest what it links to (see the sketch below)
feed into a system to validate
then give pointers to the link
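A toy version of the regular-expressions-against-EAD idea: pull candidate entity names out of <persname> and <corpname> elements so they can be fed into a validation step. The EAD fragment is invented.

    # Sketch: extract candidate entities from EAD markup with regular expressions.
    import re

    ead = """
    <bioghist><p>Papers of <persname>Morrison, Toni</persname>, donated by
    <corpname>Random House</corpname>.</p></bioghist>
    """

    candidates = re.findall(r"<(persname|corpname)[^>]*>(.*?)</\1>", ead, re.DOTALL)
    for element, name in candidates:
        print(element, "->", name.strip())   # feed these into validation / linking
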
other levels of relationship than sameAs
over time it would aggregate and 
a clustering algorithm — the more a link is traversed, the more the space reduces
emergence sorting
software crawling the graph - how do you figure out what to trust? the world according to professor X or Y
trust is very tricky
a page rank algorithm for linked data — more for asserters
strengthen the nodes to reflect repeated confidence
repeating assertions in multiple repositories — I agree with them, the +1 or thumbs up
Reddit gets a lot of traction
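One way to read the "page rank algorithm for asserters" idea above: build a graph in which an edge means one repository repeats (agrees with) another's assertion, then rank asserters with PageRank. networkx is assumed and the data is made up.

    # Sketch: rank asserters by how often others repeat their assertions.
    # Edges point from the repeater to the original asserter (the "+1" / thumbs up).
    import networkx as nx

    agreements = [
        ("repo_b", "repo_a"),   # repo_b repeats an assertion first made by repo_a
        ("repo_c", "repo_a"),
        ("repo_c", "repo_b"),
    ]

    g = nx.DiGraph(agreements)
    for asserter, score in sorted(nx.pagerank(g).items(), key=lambda kv: -kv[1]):
        print(asserter, round(score, 3))
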
nanopublications
if you reify assertions — to add confidence where you have more knowledge or curation
confidence levels
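A small sketch of the reification route: wrap the assertion in an rdf:Statement so a confidence level can hang off it. The confidence property and URIs are invented.

    # Sketch: reify an assertion so confidence / curation metadata can be attached.
    from rdflib import BNode, Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")
    g = Graph()

    stmt = BNode()
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, EX.work42))
    g.add((stmt, RDF.predicate, EX.author))
    g.add((stmt, RDF.object, EX.person7))
    g.add((stmt, EX.confidence, Literal(0.7)))   # invented confidence property

    print(g.serialize(format="turtle"))
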
wikipedia has a way to accept 
no confidence in semantic search engines
too siloed
visualizations have to be crafted  