Comment: Migrated to Confluence 4.0

...

This is an attempt to capture this thread

Presentation given at the London 2010 Committer's meeting is here

Fedora in the context of the Semantic web and Linked Data

...

To be Semantic Web and Linked Data friendly involves

  • publishing dereferenceable http URIs for resources
  • publishing of relationships between resources using these identifiers

The new REST API is a move forward in supporting these requirements as we now have dereferenceable http URIs for Fedora resources.

...

  • Identifiers
    • Fedora resources have identifiers such as namespace:pid and namespace:pid/datastream, and their info:fedora/ URI forms (and similarly for disseminations)
      • These identifiers are effectively scoped to a repository installation
    • The new REST API provides globally dereferenceable http URIs for resources, but these are not "defined" as (canonical) identifiers for resources.
    • The existing "LITE" APIs also provide resolvable URI resource identifiers
  • Relationships
    • The resource index is a single "graph" containing relationships for all objects
    • Relationships must have either a Fedora object or datastream as the subject
      • Limits metadata expression to "flat" schemes such as DC
    • No support for "arbitrary" RDF datastreams in the resource index (eg for implementing additional RDF metadata schemes)
    • Resource identifiers used in relationships are of the info:fedora/ form
      • Difficult to "interpret" relationships outside of the scope of the repository
    • The "specification" of what relationships exist for an object is defined in imperative code

...

Therefore this would support indexing of arbitrary RDF metadata datastreams in the resource index - for instance supporting metadata schemes that are not "flat".

There are some Dublin Core examples [DEV:1] where Fedora would currently be unable to index the RDF in the resource index, including

  • using foaf to describe a person who is a dc:creator of a resource
  • identifying the taxonomy used to populate dc:subject

[DEV:1] http://dublincore.org/documents/dc-rdf/#app-a
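
The first case can be sketched as follows. This is only an illustration of why such a description falls outside the current index: the PID demo:1, the blank node label, the person's name and the helper is_indexable are all assumptions made for the example, and the check simply encodes the rule stated earlier that a relationship must have a Fedora object or datastream as its subject.

```python
# Sketch only: triples as (subject, predicate, object) tuples of plain strings.
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical object demo:1 whose dc:creator is described using foaf.
triples = [
    ("info:fedora/demo:1", DC + "creator", "_:person1"),
    ("_:person1", RDF_TYPE, FOAF + "Person"),
    ("_:person1", FOAF + "name", "Jane Example"),
]


def is_indexable(triple):
    """Assumed rule: the subject must be an info:fedora/ object or datastream URI."""
    subject = triple[0]
    return subject.startswith("info:fedora/")


indexable = [t for t in triples if is_indexable(t)]
rejected = [t for t in triples if not is_indexable(t)]
```

Only the dc:creator triple passes the check; the two triples that describe the person have a blank node as subject, which is what makes the scheme not "flat".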

Questions and issues

  • The graph hierarchy to use - how granular? Start with something simple?
  • Mapping between resource identifiers and graph names
  • Separation of "core" relationships from "user-defined" relationships into different overall views? If the intention of the resource index is to store relationships between objects, we may not want to pollute that with other relationships, eg from arbitrary RDF datastreams
    • Relationships about the object and its datastreams - in <#ri>
    • Relationships from RELS-EXT, RELS-INT, DC - in <#ri>
    • Relationships from arbitrary RDF datastreams/disseminations - in <#riUser>
    • <#riFull> as a union of <#ri> and <#riUser>
  • Performance. Need to evaluate query performance over a network of named graphs vs storing all relationships in one single graph
  • Triple store support: Mulgara supports named graphs and views, what about other triple stores? MPTStore?
  • Impact on Mulgara's free-text index: do we create a parallel structure of free-text graphs? Does Mulgara even support this?
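
The separation into <#ri>, <#riUser> and <#riFull> could be modelled roughly as below. This is an in-memory sketch rather than a triple store configuration: the source labels, the datastream ID FOAF-META and the function names are hypothetical, and a real implementation would rely on the named graph and view support of the triple store.

```python
# Sketch only: each named graph is a set of (subject, predicate, object) tuples.
graphs = {
    "#ri": set(),      # core: object/datastream properties, RELS-EXT, RELS-INT, DC
    "#riUser": set(),  # user-defined: arbitrary RDF datastreams and disseminations
}

# Assumed labels for the sources that feed the core graph.
CORE_SOURCES = {"properties", "RELS-EXT", "RELS-INT", "DC"}


def graph_for(source):
    """Map the source of a triple to the name of the graph that stores it."""
    return "#ri" if source in CORE_SOURCES else "#riUser"


def add_triple(source, triple):
    graphs[graph_for(source)].add(triple)


def ri_full():
    """<#riFull> as a view: the union of <#ri> and <#riUser>, computed on demand."""
    return graphs["#ri"] | graphs["#riUser"]


add_triple("RELS-EXT", (
    "info:fedora/demo:1",
    "info:fedora/fedora-system:def/relations-external#isMemberOf",
    "info:fedora/demo:collection",
))
# FOAF-META is a hypothetical user-defined RDF datastream.
add_triple("FOAF-META", ("_:person1", "http://xmlns.com/foaf/0.1/name", "Jane Example"))
```

Queries that only care about relationships between objects would then target <#ri>, while <#riFull> gives the complete picture at the cost of querying across more than one graph.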

...

  • object properties
  • datastream properties
  • reserved datastreams that contain RDF (RELS-*)
  • reserved datastreams that are translated to RDF (DC)
  • relationships between objects and their datastreams and disseminations
  • relationships between objects and their content models

...

  • arbitrary RDF datastreams
  • arbitrary XML datastreams to be "lifted" to triples
  • disseminations serving RDF

To support a flexible and extensible approach, we could define the generation of triples using content models (system and user) and a declarative approach for specifying triples (XSLT, GRDDL [DEV:1]).

  • System content model disseminators for generating RDF for
    • Object and datastream properties triples (from the object's serialisation/FOXML)
    • Relationships between objects, datastreams and disseminations (from the object's serialisation/FOXML)
    • XML datastreams (DC)
  • User content models specifying
    • Additional arbitrary RDF datastreams to index
    • RDF disseminations to index
    • Conversion patterns for other XML datastreams and disseminations

Updating of the resource index could then take place by querying the disseminations and datastreams specified by the system and user content models when an object is created, updated or deleted.
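
As a rough illustration of "lifting" an XML datastream to triples from a declarative specification, the sketch below uses a plain mapping table where the proposal would use XSLT or GRDDL declared through a content model. The function lift, the MAPPING table and the sample record are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Declarative part: which XML elements become which predicates.
# In the proposal this role would be played by XSLT or GRDDL.
MAPPING = {
    "{%s}title" % DC_NS: DC_NS + "title",
    "{%s}creator" % DC_NS: DC_NS + "creator",
}


def lift(pid, xml_text, mapping):
    """Return (subject, predicate, literal) triples for the mapped elements."""
    subject = "info:fedora/" + pid
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter():
        predicate = mapping.get(element.tag)
        if predicate is not None and element.text:
            triples.append((subject, predicate, element.text.strip()))
    return triples


# Hypothetical DC datastream content for a hypothetical object demo:1.
dc_xml = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:creator>Jane Example</dc:creator>
  <dc:date>2010</dc:date>
</oai_dc:dc>"""

triples = lift("demo:1", dc_xml, MAPPING)
```

Elements without an entry in the mapping (dc:date here) produce no triples, so what gets indexed is controlled by the declaration rather than by code.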

The resource index is currently updated by code in DOManager. An alternative to this could be to reimplement the update mechanism using a management decorator pattern (declared in fedora.fcfg).
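
The decorator idea can be sketched in a few lines. The classes below are heavily reduced stand-ins written for this example, not Fedora's actual management interface, and the index is a plain dictionary rather than a triple store.

```python
class Management:
    """Stand-in for the object management interface (hypothetical, reduced)."""

    def __init__(self):
        self.objects = {}

    def ingest(self, pid, triples):
        self.objects[pid] = set(triples)

    def purge(self, pid):
        del self.objects[pid]


class ResourceIndexDecorator:
    """Wraps a Management instance and keeps an index in step with it."""

    def __init__(self, delegate, index):
        self._delegate = delegate
        self._index = index

    def ingest(self, pid, triples):
        # Perform the real operation first, then update the index.
        self._delegate.ingest(pid, triples)
        self._index[pid] = set(triples)

    def purge(self, pid):
        self._delegate.purge(pid)
        self._index.pop(pid, None)


index = {}
management = ResourceIndexDecorator(Management(), index)
management.ingest("demo:1", [(
    "info:fedora/demo:1",
    "info:fedora/fedora-system:def/relations-external#isMemberOf",
    "info:fedora/demo:collection",
)])
```

Callers use the decorated object exactly as they would the undecorated one, so index maintenance is separated from the code that manages objects.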

[DEV:1] GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages: GRDDL Primer http://www.w3.org/TR/grddl-primer/

Questions and Issues

  • How to define the above using system and user content models
  • How to specify the mapping between XML and RDF

...

  • POST a set of triples to create new ones
  • DELETE a set of triples to be deleted
  • PUT a set of modifications to perform, eg using (a subset of) SPARQL Update [DEV:1]

Additionally, or alternatively, "writeable methods" could be provided as a generic mechanism to implement this, eg PUT a SPARQL Update to /objects/{pid}/methods/{sDefPid}/relationships?datastream=RELS-EXT

All of the relationship API methods should operate directly on Fedora objects to remove dependency on the resource index - relationship GET methods should query the object directly rather than issuing RI queries.
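
The requests described above might look like the following. The sketch only builds the requests and does not send them. The base URL, the /objects/{pid}/relationships endpoint, the PIDs (including demo:relsSDef) and the content types are assumptions, since the choice of endpoints and update format is still open; only the methods URI pattern is taken from the text.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:8080/fedora"  # assumed installation URL


def relationships_url(pid):
    # Hypothetical endpoint; the choice of REST endpoint is an open question.
    return "%s/objects/%s/relationships" % (BASE, quote(pid, safe=":"))


def method_url(pid, sdef_pid, datastream):
    # URI pattern quoted in the text for the "writeable methods" variant.
    return "%s/objects/%s/methods/%s/relationships?datastream=%s" % (
        BASE, quote(pid, safe=":"), quote(sdef_pid, safe=":"), quote(datastream))


def build_request(method, url, body, content_type):
    """Build (but do not send) a request carrying triples or an update."""
    return Request(url, data=body.encode("utf-8"), method=method,
                   headers={"Content-Type": content_type})


triple = ("<info:fedora/demo:1> "
          "<info:fedora/fedora-system:def/relations-external#isMemberOf> "
          "<info:fedora/demo:collection> .")

# POST creates triples, DELETE removes them (content type is an assumption).
post = build_request("POST", relationships_url("demo:1"), triple,
                     "application/n-triples")
delete = build_request("DELETE", relationships_url("demo:1"), triple,
                       "application/n-triples")

# PUT carries a set of modifications expressed as SPARQL Update.
put = build_request("PUT", method_url("demo:1", "demo:relsSDef", "RELS-EXT"),
                    "INSERT DATA { %s }" % triple, "application/sparql-update")
```

Passing a request to urllib.request.urlopen would send it; that step is left out because no such endpoint is assumed to exist.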

[DEV:1] SPARQL Update - A language for updating RDF graphs: http://www.w3.org/Submission/SPARQL-Update/

Questions and issues

  • REST endpoints to use - explicit relationships URIs vs content negotiation vs URL query string
  • Relationships update specification (SPARQL Update, or ...)
  • Supporting "generic" updates, eg repository-wide relationships methods and methods operating on an object as a whole
    • Subject and predicate can be used to determine what to update for object properties, datastream properties, Dublin Core
    • RELS-EXT, RELS-INT and arbitrary datastreams present a challenge. A triple with a Fedora object as a subject could be stored in RELS-EXT or in an arbitrary RDF datastream. Do we restrict fedora-model and fedora-system predicates to RELS-EXT and RELS-INT?
  • Supporting updates to XML datastreams that get converted to RDF
    • eg updating DC through relationship API methods
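
One possible policy for deciding where a "generic" update should be stored, based on subject and predicate, is sketched below. It is an assumption made for illustration and does not answer the open question about arbitrary RDF datastreams: every triple about an object that is not Dublin Core is simply sent to RELS-EXT, and every triple about a datastream to RELS-INT.

```python
DC_NS = "http://purl.org/dc/elements/1.1/"
INFO = "info:fedora/"


def is_datastream_uri(subject):
    # info:fedora/{pid}/{dsID} has a slash after the PID; info:fedora/{pid} does not.
    return "/" in subject[len(INFO):]


def target_datastream(subject, predicate):
    """Pick the datastream in which a triple would be stored (assumed policy)."""
    if not subject.startswith(INFO):
        raise ValueError("subject is not a Fedora resource: " + subject)
    if is_datastream_uri(subject):
        return "RELS-INT"
    if predicate.startswith(DC_NS):
        return "DC"
    return "RELS-EXT"


MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOf"

dc_target = target_datastream("info:fedora/demo:1", DC_NS + "title")
ext_target = target_datastream("info:fedora/demo:1", MEMBER_OF)
int_target = target_datastream("info:fedora/demo:1/DS1", MEMBER_OF)
```

A policy like this works only while each triple has exactly one possible home, which is why restricting certain predicates to RELS-EXT and RELS-INT is raised as a question.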

6 Support for dereferenceable http URI resource identifiers in relationships

Fedora resources are currently identified using the info:fedora namespace. If resource identifiers are exposed as dereferenceable http URIs using the REST API URIs, it would be useful to support these identifiers in relationships, ie the ability to query and manipulate relationships using both the info:fedora namespace for Fedora resources and the http REST URIs.
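
Supporting both forms implies a mapping between them. A minimal sketch of such a mapping follows, assuming a repository at http://localhost:8080/fedora and REST paths of the form /objects/{pid} and /objects/{pid}/datastreams/{dsID}; the function names are hypothetical and disseminations are not handled.

```python
BASE = "http://localhost:8080/fedora"  # assumed repository base URL
INFO = "info:fedora/"


def to_http(info_uri, base=BASE):
    """Map info:fedora/{pid}[/{dsID}] to the assumed REST URI form."""
    if not info_uri.startswith(INFO):
        raise ValueError("not an info:fedora URI: " + info_uri)
    pid, _, ds_id = info_uri[len(INFO):].partition("/")
    url = "%s/objects/%s" % (base, pid)
    if ds_id:
        url += "/datastreams/" + ds_id
    return url


def to_info(http_uri, base=BASE):
    """Map an assumed REST URI back to its info:fedora/ form."""
    prefix = base + "/objects/"
    if not http_uri.startswith(prefix):
        raise ValueError("not a REST URI of this repository: " + http_uri)
    pid, _, rest = http_uri[len(prefix):].partition("/")
    if not rest:
        return INFO + pid
    kind, _, ds_id = rest.partition("/")
    if kind != "datastreams" or not ds_id:
        raise ValueError("unsupported REST URI: " + http_uri)
    return INFO + pid + "/" + ds_id


object_url = to_http("info:fedora/demo:1")
datastream_url = to_http("info:fedora/demo:1/DC")
round_trip = to_info(datastream_url)
```

Because the http form depends on the base URL of the installation, the same info:fedora/ identifier maps to different http URIs in different repositories, which is the scoping issue noted for identifiers earlier on this page.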

...