Introduction

DSpace users have expressed the need for DSpace to provide more support for different types of digital objects related to open access publications, such as authors/author profiles, data sets, etc.
The objectives of the DSpace working group were to:

  • Create awareness on the topic of additional entities
  • Propose a solution
  • Specify the global architecture and requirements
  • Propose a roadmap that was not tied to a particular version of DSpace

The working group has decided the following design principles should be applied:

  • Avoid hard-coding a particular object model. For example, the code shouldn’t contain specific Java classes for specific objects such as Person, Project, Dataset, etc.
  • Make the implementation of the data model configurable, so that any institution can define which additional objects require more detailed/elaborate support in DSpace and which objects can be represented by the generic default DSpace Item object.
  • Start the design from the DSpace Item object and extend it.
  • To provide more specific support for particular entities, allow Items to be typed and relations between typed Items to be defined through configuration.
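
The principles above can be illustrated with a minimal sketch. It is not DSpace code: the `MODEL_CONFIG` layout, the `Item` and `RelationshipType` classes, and the `load_model` and `relate` helpers are all hypothetical, and the left/right direction assigned to each relationship type is an assumption made only for this example. The point is that entity types and relationship types come from configuration data, so no class such as `Person` or `Project` appears in the code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration; in practice this would be loaded from a file.
# The left/right direction of each relationship type is an assumption.
MODEL_CONFIG = {
    "relationship_types": [
        {"label": "isPublicationOfAuthor", "left": "Publication", "right": "Person"},
        {"label": "isPersonOfProject", "left": "Project", "right": "Person"},
    ],
}


@dataclass
class Item:
    """Generic item; an entity is simply an item whose type is set."""
    name: str
    entity_type: Optional[str] = None  # None means a plain, untyped item


@dataclass(frozen=True)
class RelationshipType:
    label: str
    left_type: str
    right_type: str


def load_model(config):
    """Build a lookup of relationship types from configuration data."""
    return {
        rt["label"]: RelationshipType(rt["label"], rt["left"], rt["right"])
        for rt in config["relationship_types"]
    }


def relate(model, label, left, right):
    """Return a (left, label, right) triple if the configuration allows it."""
    rel_type = model.get(label)
    if rel_type is None:
        raise ValueError(f"unknown relationship type: {label}")
    if (left.entity_type, right.entity_type) != (rel_type.left_type, rel_type.right_type):
        raise ValueError(f"{label} does not allow {left.entity_type} -> {right.entity_type}")
    return (left.name, label, right.name)
```

Supporting a new entity type in this sketch means adding entries to `MODEL_CONFIG` only; the `Item` class stays untouched, and an untyped `Item` remains valid but cannot take part in typed relations.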

Configurable Entities are designed to meet that need.

In DSpace, an Entity is a special type of Item which often has Relationships to other Entities. In more detail:

  • Entity: Every Entity is an Item.
    • This means each Entity must belong to a Collection, just like a normal Item. (Community & Collection objects are unchanged and unaffected by Entities.)
    • Normal Items are still the "default" Item, and they are unchanged. So, not every Item is an Entity.
    • Because Entities are all Items, they are immediately usable in the submission/workflow process, batch import/export, OAI-PMH, etc.
  • Entity (or Item) Type: Entities all have a "dspace.entity.type" metadata field which defines their Entity/Item "type".  For example, this type may be "Person", "Project", "Publication", "Journal", etc.  It's highly visible within the User Interface as a label.
  • Relationships: Based on that "type", an Entity may be related to other Entities via a Relationship.  One Entity type may support several relationship types at once.  Examples of relationship types include "isPersonOfProject" or "isPublicationOfAuthor".  These relationship types are named based on the Entity "type" (as you can likely tell).  Relationships also appear on Entities as metadata using the "relation" schema.
  • Virtual Metadata: Entities of different types may also have customized visualizations in the User Interface.  These visualizations may also dynamically pull in metadata from related Entities.  For example, a Publication entity may be displayed in the User Interface with an author name dynamically pulled in from a related Person entity.  The metadata "appears" as though it is part of the Entity you are viewing, but it is dynamically pulled via the Relationship.
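
As a rough illustration of the Virtual Metadata idea described above, the sketch below builds the metadata shown for an entity from its own stored values plus values pulled in through its relationships. It is not the DSpace implementation: the `Item` class, the `VIRTUAL_FIELDS` mapping, the `displayed_metadata` function, the choice of `dc.contributor.author` and `dc.title` as target and source fields, and the direction of the relationship are all assumptions for the example. Only the "dspace.entity.type" field and the "isPublicationOfAuthor" name come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uuid: str
    metadata: dict = field(default_factory=dict)  # field name -> list of values

    @property
    def entity_type(self):
        """Read the type from the "dspace.entity.type" field; None if untyped."""
        values = self.metadata.get("dspace.entity.type", [])
        return values[0] if values else None


# Hypothetical mapping: relationship label -> (field shown on the viewed
# entity, field read from the related entity).
VIRTUAL_FIELDS = {
    "isPublicationOfAuthor": ("dc.contributor.author", "dc.title"),
}


def displayed_metadata(item, items_by_uuid, relationships):
    """Merge stored metadata with values pulled in via relationships.

    `relationships` is a list of (left_uuid, label, right_uuid) triples.
    The stored metadata of `item` is never modified.
    """
    shown = {name: list(values) for name, values in item.metadata.items()}
    for left_uuid, label, right_uuid in relationships:
        if left_uuid != item.uuid or label not in VIRTUAL_FIELDS:
            continue
        target_field, source_field = VIRTUAL_FIELDS[label]
        related = items_by_uuid[right_uuid]
        shown.setdefault(target_field, []).extend(related.metadata.get(source_field, []))
    return shown
```

In this sketch the author name is never copied onto the Publication. Changing the name on the related Person changes what is displayed for every Publication linked to it.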

Entities and their Relationships are also completely configurable. DSpace provides some sample models out of the box, which you can use directly or adapt as needed.

For this implementation, the existing DSpace Item object has been used to avoid hardcoding a particular object model. This keeps the barrier low for new entities to use all the existing DSpace functionality that already supports the Item object, such as the submission/workflow process, search, batch import, OAI-PMH, etc.
To support multiple object types, the objects are Items that can be typed, and relations can be created between Items.
The model has similarities with the Portland Common Data Model (PCDM), with an Entity roughly mapping to a "pcdm:Object" and existing Communities and Collections roughly mapping to a "pcdm:Collection". However, at this time DSpace Entities concentrate more on building a graph structure of relationships, as is required for certain object models such as a CRIS object model. Where PCDM is focused on a tree structure, with related objects as an addition that provides some basic graph-like functionality, this object model starts immediately from the graph concept.
The relations are stored in a separate database table, which is the only change needed to the current DSpace database structure. The table uses foreign keys to the item table to ensure the relations are not broken and to avoid any data replication.
The implementation of a specific object model is completely configurable and does not use hardcoded Java classes for the objects in that model.
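
The storage approach described above, a separate relation table with foreign keys to the item table, can be sketched with an in-memory SQLite database. This is an illustration only, not the DSpace schema: the table and column names (`item`, `relationship`, `left_item`, `right_item`, `label`) and the `create_schema` and `add_relationship` helpers are hypothetical.

```python
import sqlite3


def create_schema():
    """Create an in-memory database with an item table and a relation table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE item (
            uuid TEXT PRIMARY KEY
        );
        CREATE TABLE relationship (
            id         INTEGER PRIMARY KEY,
            left_item  TEXT NOT NULL REFERENCES item(uuid),
            right_item TEXT NOT NULL REFERENCES item(uuid),
            label      TEXT NOT NULL
        );
        """
    )
    return conn


def add_relationship(conn, left_uuid, right_uuid, label):
    """Store a relation as a row that only references the two items."""
    conn.execute(
        "INSERT INTO relationship (left_item, right_item, label) VALUES (?, ?, ?)",
        (left_uuid, right_uuid, label),
    )
```

Because a relation row holds only references, no item data is duplicated. With the foreign keys enforced, the database rejects a relation that points to a missing item, and it rejects deleting an item that a relation still references.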

...

Configuration of Entity types and their relations

There are two possible approaches here:

...