
Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

Risk of not knowing what to search for

may be addressed by
  • discovery endpoints and what they hold
    • ‘hardened’ SPARQL endpoints may be less prone to down time – e.g., Fuseki documentation states that "authentication and control of the number of concurrent requests can be added using an Apache server"
  • standard extracts and starting points with examples may help (see the query sketch after this list)
    • emulate Social Explorer (http://socialexplorer.com) as a way to query the contents of a larger data source (in that case, census data)
    • the linked data fragments technology (http://linkeddatafragments.org) may facilitate hosting linked data without the server-side overhead and risk of a public SPARQL endpoint
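As an illustration of the "starting points with examples" idea, here is a minimal sketch of a canned query against a public SPARQL endpoint, using the SPARQLWrapper library and DBpedia purely as a stand-in; an LD4L service would substitute its own endpoint and vocabulary.

    # Sketch of a published "starting point" query against a public SPARQL endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?work WHERE { ?work dbo:author dbr:Toni_Morrison } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["work"]["value"])
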
risk of not knowing what to search for
publish starting points & examples of queries and/or canned responses
reconciliation services — not necessarily monopolies or centralized
iterative, with curation and provenance
common API for reconciliation building on the OpenRefine API — specify as much metadata as you have, get ranked results back
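A hedged sketch of what a call to such a service could look like, following the OpenRefine reconciliation protocol: you send whatever metadata you have and get ranked candidates back. The Wikidata endpoint URL and the Q/P identifiers below are only examples of one existing implementation, not part of any LD4L design.

    import json
    import requests

    # One existing endpoint that speaks the OpenRefine reconciliation protocol
    # (used here only as an example).
    ENDPOINT = "https://wikidata.reconci.link/en/api"

    queries = {
        "q0": {
            "query": "Toni Morrison",
            "type": "Q5",                     # "Q5" (human) is a Wikidata-specific type id
            "properties": [
                {"pid": "P569", "v": "1931"}  # supply as much metadata as you have
            ],
        }
    }

    resp = requests.get(ENDPOINT, params={"queries": json.dumps(queries)})
    # Ranked candidates come back with a score and a match flag.
    for candidate in resp.json()["q0"]["result"]:
        print(candidate["score"], candidate["match"], candidate["id"], candidate["name"])
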
mashup tools that test connections
the sameAs.org website
validation
RDF data shapes
DCMI RDF validation
extension mechanisms - Schema.org
query on different axes — query OCLC by VIAF id to get works
ability to push bookmarks but as small graphs of data, consumable by others
semantic web crawling
bookmark
a service where I can push the results of my search, organized by topic
a sort of Mendeley, but for everything
add it to a collection I have 
similar to an annotation service
you search, you refine it, you step back — right now you can only save bookmarks at one level
nobody can use your bookmarks
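A minimal sketch of "bookmarks as small graphs of data, consumable by others", using rdflib; the collection URI, class name, and member URIs are all invented placeholders.

    # Sketch: save a search result set as a small, shareable RDF graph.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/bookmarks/")

    g = Graph()
    collection = EX["my-topic-collection"]
    g.add((collection, RDF.type, EX.BookmarkCollection))
    g.add((collection, DCTERMS.title, Literal("Sources on linked data services")))

    # Placeholder URIs standing in for identifiers drawn from many different
    # sources (LC, VIAF, a local repository, ...).
    for uri in [
        "http://example.org/authority/person-1",
        "http://example.com/repository/item-42",
    ]:
        g.add((collection, DCTERMS.hasPart, URIRef(uri)))

    print(g.serialize(format="turtle"))
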
2
a tool that would facilitate entity reconciliation
to put together UN and LC
a first pass, then improve that manually, then a second iteration
then publish — surface
manage difference of opinion
provenance
exclude some
centralized entity mapping
feedback by users on the mapping
need protocols
want to discover annotation — known servers with protocols 
collections have been done by many different places
if we do linked data, my list is a list of URIs from many sources
in the UI, the user won’t see that
assuming accessible SPARQL endpoints
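A rough sketch of the iterative, curated mapping workflow described above: an automated first pass proposes matches, curators accept or exclude them, and only accepted links are published as owl:sameAs with provenance. The data structure, URIs, and the use of dcterms:provenance are illustrative assumptions, not a specification.

    # Sketch: first-pass matches, manual review, then publish accepted links.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS, OWL

    # Hypothetical output of an automated matching pass between two authority files.
    candidate_matches = [
        {"left": "http://example.org/lc/n1", "right": "http://example.org/un/p7",
         "score": 0.93, "status": "accepted"},
        {"left": "http://example.org/lc/n2", "right": "http://example.org/un/p9",
         "score": 0.51, "status": "excluded"},   # curator disagreed: exclude some
    ]

    published = Graph()
    for m in candidate_matches:
        if m["status"] != "accepted":
            continue                              # manage difference of opinion
        left, right = URIRef(m["left"]), URIRef(m["right"])
        published.add((left, OWL.sameAs, right))
        # Minimal provenance: what produced the link and with what confidence.
        published.add((left, DCTERMS.provenance,
                       Literal(f"auto-match pass 1, score {m['score']}, curator-reviewed")))

    print(published.serialize(format="turtle"))
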
3
other cleanup tasks — validation? consistency of ontology use?
entity recognition — text mining or analytics tools — autotaggers
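One concrete form the "RDF data shapes" validation could take: a minimal SHACL check with the pyshacl library. The shape and the data are toy examples.

    # Sketch of shape-based validation with pyshacl (toy data and shape).
    from pyshacl import validate
    from rdflib import Graph

    data = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:work1 a ex:Work .            # missing the required ex:title
    """, format="turtle")

    shapes = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .
        ex:WorkShape a sh:NodeShape ;
            sh:targetClass ex:Work ;
            sh:property [ sh:path ex:title ; sh:minCount 1 ] .
    """, format="turtle")

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)   # False: the work has no title
    print(report)
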
4
constant crawling graphs of linked data
semantically aware web crawling — is it worth going down this path, what’s attached, what has changed
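A very small sketch of semantically aware crawling: dereference a URI, parse whatever RDF comes back, and queue the resources it links to. rdflib's parse() does the content negotiation; politeness, depth limits, and change detection are left out, and the starting URI is only an example.

    # Sketch: crawl a graph of linked data by following sameAs / seeAlso links.
    from collections import deque
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL, RDFS

    def crawl(start_uri, max_resources=10):
        seen, queue = set(), deque([URIRef(start_uri)])
        while queue and len(seen) < max_resources:
            uri = queue.popleft()
            if uri in seen:
                continue
            seen.add(uri)
            g = Graph()
            try:
                g.parse(uri)            # dereference with content negotiation
            except Exception:
                continue                # not RDF, or unreachable
            print(uri, len(g), "triples")
            for _, p, o in g:
                if p in (OWL.sameAs, RDFS.seeAlso) and isinstance(o, URIRef):
                    queue.append(o)     # what's attached: follow outgoing links

    crawl("http://dbpedia.org/resource/Toni_Morrison")
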
5
provenance space — who’s made a particular assertion for that
in the library domain, could imagine a layer about who’s responsible for an assertion; currently unspecified
crowdsourcing — as you move up toward the general public, you typically track less about who did it
variable credibility
acknowledge that
nanopublications
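A sketch of the nanopublication-style pattern mentioned above: the assertion goes in one named graph and the statement of who made it (and with what confidence) in another. Uses rdflib's Dataset; all URIs and the confidence property are invented.

    # Sketch: an assertion graph plus a provenance graph, nanopublication-style.
    from rdflib import Dataset, Literal, Namespace

    EX = Namespace("http://example.org/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    ds = Dataset()

    assertion = ds.graph(EX["assertion-1"])
    assertion.add((EX.work42, EX.author, EX.person7))   # the claim itself

    provenance = ds.graph(EX["provenance-1"])
    provenance.add((EX["assertion-1"], PROV.wasAttributedTo, EX.cataloger_jane))
    provenance.add((EX["assertion-1"], EX.confidence, Literal(0.9)))

    print(ds.serialize(format="trig"))
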
===== group 4 =====
reconciliation services — contain no data, query a distributed set of resources
individual libraries will become the authorities for special collections — items, people, events
queries to a central area would find a match
cache the sameAs links so consumers don’t have to re-query
everybody who consumes has the cross-links
the sort of thing that OCLC might end up doing — 
could be any type of object — logical to start with works 
brings up the question of degrees of sameAs-ness
when a new match is known, publish that — a notification mechanism
you would add provenance to those links to indicate where they came from
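A sketch of the caching idea above: store each sameAs link with its source so consumers do not have to re-query, and notify subscribers when a new match appears. The cache structure and notification hook are hypothetical.

    # Sketch: cache sameAs links with provenance and notify on new matches.
    same_as_cache = {}          # uri -> list of {"match": uri, "source": str}
    subscribers = []            # callables to notify when a new match is found

    def record_match(uri, match, source):
        entry = {"match": match, "source": source}        # provenance the link
        if entry not in same_as_cache.setdefault(uri, []):
            same_as_cache[uri].append(entry)
            for notify in subscribers:                     # publish the new match
                notify(uri, entry)

    subscribers.append(
        lambda uri, e: print("new match:", uri, "->", e["match"], "from", e["source"]))
    record_match("http://example.org/lc/n1", "http://example.org/viaf/123", "OCLC matcher")
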
there used to be a plug-in for Netscape with a side-wiki where you could annotate — anybody could see what everyone else had done
now, in the world of unique identifiers — a 'linkerator' for people to rank what they see
build up ant trails over time, around an object
how to make it in any way central — get it to the browser
how about the annotation example?
regular expressions against EAD for an object to suggest what it links to (see the sketch below)
feed into a system to validate
then give pointers to the link
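A toy version of the regular-expressions-against-EAD idea: pull candidate entity names out of <persname> and <corpname> elements so they can be fed into a validation step. The EAD fragment is invented.

    # Sketch: extract candidate entities from EAD markup with regular expressions.
    import re

    ead = """
    <bioghist><p>Papers of <persname>Morrison, Toni</persname>, donated by
    <corpname>Random House</corpname>.</p></bioghist>
    """

    candidates = re.findall(r"<(persname|corpname)[^>]*>(.*?)</\1>", ead, re.DOTALL)
    for element, name in candidates:
        print(element, "->", name.strip())   # feed these into validation / linking
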
other levels of relationship than sameAs
over time it would aggregate and 
a clustering algorithm — the more a link is traversed, the more the space reduces
emergence sorting
software crawling the graph - how do you figure out what to trust? the world according to professor X or Y
trust is very tricky
a page rank algorithm for linked data — more for asserters
strengthen the nodes to reflect repeated confidence
repeating assertions in multiple repositories — I agree with them, the +1 or thumbs up
Reddit gets a lot of traction
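One way to read the "page rank algorithm for asserters" idea above: build a graph in which an edge means one repository repeats (agrees with) another's assertion, then rank asserters with PageRank. networkx is assumed and the data is made up.

    # Sketch: rank asserters by how often others repeat their assertions.
    # Edges point from the repeater to the original asserter (the "+1" / thumbs up).
    import networkx as nx

    agreements = [
        ("repo_b", "repo_a"),   # repo_b repeats an assertion first made by repo_a
        ("repo_c", "repo_a"),
        ("repo_c", "repo_b"),
    ]

    g = nx.DiGraph(agreements)
    for asserter, score in sorted(nx.pagerank(g).items(), key=lambda kv: -kv[1]):
        print(asserter, round(score, 3))
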
nanopublications
if you reify assertions — to add confidence where you have more knowledge or curation
confidence levels
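A small sketch of the reification route: wrap the assertion in an rdf:Statement so a confidence level can hang off it. The confidence property and URIs are invented.

    # Sketch: reify an assertion so confidence / curation metadata can be attached.
    from rdflib import BNode, Graph, Literal, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")
    g = Graph()

    stmt = BNode()
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, EX.work42))
    g.add((stmt, RDF.predicate, EX.author))
    g.add((stmt, RDF.object, EX.person7))
    g.add((stmt, EX.confidence, Literal(0.7)))   # invented confidence property

    print(g.serialize(format="turtle"))
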
wikipedia has a way to accept 
no confidence in semantic search engines
too siloed
visualizations have to be crafted  