...

Exchanging repository contents

Most sites on the Internet are oriented towards human consumption. While HTML may be a good format to create websites, it is not a good format to export data in a way a computer can easily work with. Like most software for building repositories, DSpace supports OAI-PMH as an interface to export the stored data. While OAI-PMH is well known in the field of repositories, it is rarely known elsewhere (e.g. Google retired its support for OAI-PMH in 2008). The Semantic Web is a generic approach to publish data on the Internet together with information about its semantics. Its application is not limited to repositories or libraries, and it has a growing user group. RDF and SPARQL are W3C-released standards for publishing structured data on the web in a machine-readable way. The data stored in repositories is particularly suited for use in the Semantic Web, as the metadata is already available: it doesn't have to be generated or entered manually for publication as Linked Data. For most repositories, at least for Open Access repositories, it is quite important to share their stored content. Linked Data is a big chance for repositories to present their content in a way that can easily be accessed, interlinked and (re)used.

...

We don't want to give a full introduction to the Semantic Web and its technologies here, as this can easily be found in many places on the web. Nevertheless, we want to give a short glossary of the terms used most often in this context to make the following documentation more readable.

Semantic Web

The term "Semantic Web" refers to the part of the Internet containing Linked Data. Just like the World Wide Web, the Semantic Web is also woven together by links among the data.

Linked Data

Linked Open Data

Data in RDF, following the Linked Data Principles, is called Linked Data. The Linked Data Principles describe the expected behavior of data publishers, who shall ensure that the published data is easy to find, easy to retrieve, can be linked easily and links to other data as well.

Linked Open Data is Linked Data published under an open license. There is no technical difference between Linked Data and Linked Open Data (often abbreviated as LOD); it is only a question of the license used to publish it.

RDF
RDF/XML
Turtle
N-Triples
N3-Notation
RDF is an acronym for Resource Description Framework, a metadata model. Don't think of RDF as a format; it is a model. Nevertheless, there are different formats to serialize data following RDF. RDF/XML, Turtle, N-Triples and N3-Notation are probably the most well-known formats to serialize data in RDF. While RDF/XML uses XML, Turtle, N-Triples and N3-Notation don't, and they are easier for humans to read and write. When we use RDF in the configuration files of DSpace, we currently prefer Turtle (but the code should be able to deal with any serialization).
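To illustrate the difference between serializations, here are the same two statements about a hypothetical item, first in Turtle and then in N-Triples (the resource URI and values are made up for this example):

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://demo.dspace.org/resource/123456789/1>
    dc:title   "An example item" ;
    dc:creator "Doe, Jane" .
```

```
<http://demo.dspace.org/resource/123456789/1> <http://purl.org/dc/elements/1.1/title> "An example item" .
<http://demo.dspace.org/resource/123456789/1> <http://purl.org/dc/elements/1.1/creator> "Doe, Jane" .
```

Both describe exactly the same triples; Turtle just adds prefixes and shorthand to make the data easier for humans to read and write.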
Triple Store

A triple store is a database to natively store data following the RDF model. Just like you have to provide a relational database for DSpace, you have to provide a triple store for DSpace if you want to use the LOD support.
SPARQL

The SPARQL Protocol and RDF Query Language is a family of protocols to query triple stores. Since version 1.1, SPARQL can be used to manipulate triple stores as well: to store, delete or update data in them. DSpace uses the SPARQL 1.1 Graph Store HTTP Protocol and the SPARQL 1.1 Query Language to communicate with the triple store. The SPARQL 1.1 Query Language is often referred to simply as SPARQL, so expect the SPARQL 1.1 Query Language if no other protocol out of the SPARQL family is specified explicitly.
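As a short illustration, a minimal SPARQL 1.1 query that lists up to ten triples from whatever data the store holds:

```sparql
# Return up to ten arbitrary triples from the store
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
```

Queries like this can be sent to any SPARQL endpoint, e.g. to verify that the triple store is up and contains converted data.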
SPARQL endpoint

A SPARQL endpoint is a SPARQL interface of a triple store. Since SPARQL 1.1, a SPARQL endpoint can be either read-only, allowing only to query the stored data; or readable and writable, allowing to modify the stored data as well. When talking about a SPARQL endpoint without specifying which SPARQL protocol is used, an endpoint supporting the SPARQL 1.1 Query Language is meant.
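A writable endpoint additionally accepts SPARQL 1.1 Update requests. A minimal example (the graph name and triple are made up for illustration):

```sparql
# SPARQL 1.1 Update: insert one triple into a named graph
INSERT DATA {
  GRAPH <http://example.org/graph/item-1> {
    <http://example.org/item/1> <http://purl.org/dc/elements/1.1/title> "A new title" .
  }
}
```

This is why the public SPARQL endpoint of a DSpace triple store should be read-only; writes should go through a separate, protected interface.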

Linked (Open) Data Support within DSpace

Starting with DSpace 5.0, DSpace provides support for publishing stored contents in the form of Linked (Open) Data.

Architecture / Concept

To publish content stored in DSpace as Linked (Open) Data, the data has to be converted into RDF. The conversion into RDF has to be configurable, as different DSpace instances may use different metadata schemata, different persistent identifiers (DOI, Handle, ...) and so on. Depending on the content to convert, the configuration and other parameters, conversion may be time-intensive and affect performance. The contents of repositories are much more often read than created, deleted or changed, because the main goal of repositories is to safely store their contents. For this reason, the content stored within DSpace is converted and stored in a triple store immediately after it is created or updated. The triple store serves as a cache and provides a SPARQL endpoint to make the converted data accessible using SPARQL. The conversion is triggered automatically by the DSpace event system and can be started manually using the command line interface (both cases are documented below). There is no need to back up the triple store, as all data stored in it can be recreated from the contents stored elsewhere in DSpace (in the assetstore(s) and the database). Besides the SPARQL endpoint, the data should be published as RDF serializations as well. With dspace-rdf, DSpace offers a module that loads converted data from the triple store and provides it as an RDF serialization (it currently supports RDF/XML, Turtle and N-Triples).

Repositories use persistent identifiers to make content citable and to address contents. Following the Linked Data Principles, DSpace uses persistent identifiers in the form of HTTP(S) URIs, converting a Handle to http://hdl.handle.net/&lt;handle&gt; and a DOI to http://dx.doi.org/&lt;doi&gt;. Altogether, the Linked Data support of DSpace spans all three layers: the storage layer with a triple store, the business logic with classes to convert stored contents into RDF, and the application layer with a module to publish RDF serializations. Just like DSpace allows you to choose Oracle or PostgreSQL as the relational database, you may choose between different triple stores. The only requirements are that the triple store must support the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol, as DSpace uses them to store, update, delete and load converted data in/out of the triple store, and uses the triple store to provide the data over a SPARQL endpoint.
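The identifier-to-URI mapping described above can be sketched as follows. The class and method names here are illustrative, not part of the DSpace API:

```java
// Sketch of mapping persistent identifiers to HTTP URIs as described
// above (Handle -> hdl.handle.net, DOI -> dx.doi.org).
// Class and method names are hypothetical, not DSpace API.
public class PersistentIdentifierUris {

    static String handleToUri(String handle) {
        return "http://hdl.handle.net/" + handle;
    }

    static String doiToUri(String doi) {
        // Accept DOIs with or without the "doi:" prefix.
        return "http://dx.doi.org/" + doi.replaceFirst("^doi:", "");
    }

    public static void main(String[] args) {
        System.out.println(handleToUri("10673/6"));
        System.out.println(doiToUri("doi:10.5072/example"));
    }
}
```

The point of using HTTP(S) URIs rather than raw Handles or DOIs is that the resulting identifiers are directly resolvable, which the Linked Data Principles require.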

Warning
Store public data only in the triple store!

The triple store should contain only data that is public, because the DSpace access restrictions won't affect the SPARQL endpoint. For this reason, DSpace converts only archived, discoverable (non-private) Items, Collections and Communities that are readable for anonymous users. Please consider this while configuring and/or extending DSpace's Linked Data support.

The org.dspace.rdf.conversion package contains the classes used to convert the repository's content to RDF. The conversion itself is done by plugins. The org.dspace.rdf.conversion.ConverterPlugin interface is really simple, so take a look at it if you can program in Java and want to extend the conversion. The only important thing is that plugins must only create RDF that can be made publicly available, as the triple store provides it using a SPARQL endpoint to which the DSpace access restrictions do not apply. Plugins converting metadata should check whether a specific metadata field needs to be protected or not (see org.dspace.app.util.MetadataExposure on how to check that). The MetadataConverterPlugin is heavily configurable (see below) and is used to convert the metadata of Items. The StaticDSOConverterPlugin can be used to add static RDF triples (see below). The SimpleDSORelationsConverterPlugin creates links between items and collections, collections and communities, subcommunities and their parents, and between top-level communities and the information representing the repository itself.
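The plugin idea can be sketched in a simplified, self-contained form. The real ConverterPlugin interface works on DSpace objects and Jena models; the types below are stand-ins invented for this example:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the converter-plugin idea: each plugin turns a
// repository object into RDF triples, and a driver aggregates the output
// of all registered plugins. All types here are hypothetical stand-ins,
// not the actual DSpace classes.
public class ConversionSketch {

    // Stand-in for a DSpace object: just a URI and a title.
    record Item(String uri, String title) {}

    interface ConverterPlugin {
        // Return RDF triples (as N-Triples lines) for the given item,
        // containing only data that may be published.
        List<String> convert(Item item);
    }

    // Analogous to a metadata-converting plugin: emits a title triple.
    static class TitlePlugin implements ConverterPlugin {
        public List<String> convert(Item item) {
            List<String> triples = new ArrayList<>();
            triples.add("<" + item.uri() + "> "
                    + "<http://purl.org/dc/elements/1.1/title> "
                    + "\"" + item.title() + "\" .");
            return triples;
        }
    }

    public static void main(String[] args) {
        Item item = new Item("http://hdl.handle.net/10673/6", "An example item");
        List<ConverterPlugin> plugins = List.of(new TitlePlugin());
        for (ConverterPlugin p : plugins) {
            p.convert(item).forEach(System.out::println);
        }
    }
}
```

A real plugin would additionally consult the metadata-exposure configuration before emitting a triple, so that protected fields never reach the public triple store.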

As different repositories use different persistent identifiers to address their content, different algorithms to create the URIs used within the converted data can be implemented. Currently, HTTP(S) URIs of the repository (called local URIs), Handles and DOIs can be used. See the configuration part of this document for further information. If you want to add another algorithm, take a look at the org.dspace.rdf.storage.URIGenerator interface.

Install a Triple Store

In addition to a normal DSpace installation, you have to install a triple store. You can use any triple store that supports the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol. If you do not have one yet, you can use Apache Fuseki. Download Fuseki from its official download page and unpack the downloaded archive. The archive contains several scripts to start Fuseki. Use the start script appropriate for your OS with the options '--localhost --config=&lt;dspace-install&gt;/config/modules/rdf/fuseki-assembler.ttl'. Instead of changing into the directory you unpacked Fuseki to, you may set the variable FUSEKI_HOME. If you're using Linux and bash, unpacked Fuseki to /usr/local/jena-fuseki-1.0.1 and installed DSpace to [dspace-install], this would look like this:

...