Project Overview
Collection Description
UNSWorks
- The institutional repository – UNSWorks – contains more than 12,000 objects. These include research publications such as digital theses and conference papers. The UNSWorks Live Fedora includes some metadata-only records as well as objects with file attachments. There is also an Interim Fedora that houses publications and metadata (including information about grants) that require review or processing before ingestion into the UNSWorks Live Fedora. The publication metadata is sourced from the Research Outputs System (ROS), and details about UNSW people and grants are obtained from other UNSW enterprise systems via the data warehouse. The Interim Fedora currently contains about 500,000 records.
ResData
- A research data management system containing over 250 records. The records describe datasets and research data management plans plus related parties (i.e. people) and activities (i.e. grants and projects). Information about people, grants and projects is sourced from other institutional databases via the data warehouse.
Other UNSW disciplinary repositories
- Approximately 25,000 records are stored across 5 other specialist disciplinary repositories. While most are metadata-only records, there is also some managed content such as video files.
Fedora 3 Details
Object Models
UNSWorks
Resource
- DC
  - Type: Inline XML
  - Mime Type: text/xml
  - Versionable
- MODS = descriptive metadata
  - Type: Inline XML
  - Mime Type: text/xml
  - Versionable
- RELS-EXT
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
  - Contains additional information about the object, such as the persistent identifier (handle)
- RELS-INT
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
  - Contains additional information about the datastreams, such as type of resource and relation
- DP-EVENT = PREMIS preservation metadata
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
- SOURCE
  - Type: Managed
  - Mime Type: any
  - Versionable
- PM = preservation metadata about an individual datastream (e.g. SOURCE01 would have PM-SOURCE01)
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
ResData
Dataset, Activity (grants/projects), and Party (people) objects
- DC
  - Type: Inline XML
  - Mime Type: text/xml
  - Versionable
- RELS-EXT
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
  - Contains additional information about the object, such as the persistent identifier (handle/doi) and resource type
- RELS-INT
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
  - Contains additional information about the datastreams, such as type of resource, relation, version, and publishing status
- RDF = descriptive metadata plus links to related parties and activities for a published object
  - Type: Inline XML
  - Mime Type: text/xml
  - Versionable
- RDFNP = descriptive metadata plus links to related parties and activities for an unpublished object
  - Type: Inline XML
  - Mime Type: text/xml
  - Not Versionable
Research Data Management Plan object
- DC
  - Type: Inline XML
  - Mime Type: text/xml
  - Versionable
- RELS-EXT
  - Type: Inline XML
  - Mime Type: application/rdf+xml
  - Versionable
  - Contains additional information about the object, such as the persistent identifier (handle/doi) and resource type
- RDFNP = descriptive metadata plus links to related parties and activities for an unpublished object
  - Type: Inline XML
  - Mime Type: text/xml
  - Not Versionable
Notes: A record's status is one of draft, pending or published. Only dataset, activity and party objects can be published; research data management plans cannot. Published records are versionable. The PID format differs by object type (e.g. a sample activity object PID is resdataa:2222; a sample dataset object PID is resdatac:3333).
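As a rough illustration of the PID convention in the notes above, the sketch below splits a PID into its prefix and local identifier and looks up the object type. Only the two prefixes shown in the sample PIDs are included; the function name and the lookup table are hypothetical, and prefixes for other object types are not given in this document.

```python
# Prefixes taken from the two sample PIDs in the notes above.
# Prefixes for other object types are not listed in this document.
KNOWN_PREFIXES = {
    "resdataa": "activity",
    "resdatac": "dataset",
}


def parse_pid(pid):
    """Split a Fedora 3 PID into (prefix, local id, object type or None)."""
    prefix, sep, local_id = pid.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a valid PID: {pid!r}")
    return prefix, local_id, KNOWN_PREFIXES.get(prefix)


print(parse_pid("resdataa:2222"))  # ('resdataa', '2222', 'activity')
print(parse_pid("resdatac:3333"))  # ('resdatac', '3333', 'dataset')
```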
Functionality
Storage: Legacy storage (or Akubra)
UNSWorks uses Legacy storage and ResData uses Akubra.
XML metadata : datastreams
See object models above.
XML metadata : inline
See object models above.
Content models
Default Fedora Content Model.
Datastream types (inline, managed, redirect, and external)
Non-metadata datastreams are managed datastreams.
Identifiers
UNSW uses custom namespaces for PIDs. Some repositories use multiple PID prefixes. All UNSW repositories use handles as persistent identifiers for objects. The ResData repository also uses DOIs for some objects.
Indexing strategies (GSearch, RI-Search vs. F4 approaches)
UNSW uses the Generic Search Service (GSearch) and Resource Index Search (RISearch).
Replication/Journaling
UNSW does not use replication or journaling.
Security policies: XACML
UNSW uses the default XACML policies, with a minor modification for access to rights metadata on UNSWorks.
OAI-PMH
UNSW does not use the Fedora OAI-PMH module. UNSW uses the Fedora 3 API to export XML metadata and jOAI as the OAI-PMH data provider.
Versions
Most datastreams are versionable.
Disseminators
UNSW does not use disseminators.
Audit history
UNSW uses audit history for statistics, preservation, and versioning.
API
The most-used Fedora 3 API methods (REST and SOAP):
- API_A
  - findObjects
  - getDatastreamDissemination
  - listDatastreams
- API_M
  - Datastream Management
    - addDatastream
    - getDatastreams
    - getDatastreamHistory
    - getDatastream
    - modifyDatastreamByValue
    - modifyDatastreamByReference
    - setDatastreamState
    - setDatastreamVersionable
    - purgeDatastream
  - Object Management
    - modifyObject
    - purgeObject
    - getNextPID
    - ingest
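For readers less familiar with the Fedora 3 REST interface, the sketch below builds request URLs for the three API-A methods listed above. It only constructs URLs and makes no network calls. The base URL is an assumption for a local installation, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of a local Fedora 3 instance; adjust for a real deployment.
BASE_URL = "http://localhost:8080/fedora"


def find_objects_url(terms, max_results=10):
    """URL for API-A findObjects, returning matching PIDs as XML."""
    query = urlencode({"terms": terms, "pid": "true",
                       "maxResults": max_results, "resultFormat": "xml"})
    return f"{BASE_URL}/objects?{query}"


def list_datastreams_url(pid):
    """URL for API-A listDatastreams."""
    return f"{BASE_URL}/objects/{quote(pid, safe=':')}/datastreams?format=xml"


def datastream_content_url(pid, dsid):
    """URL for API-A getDatastreamDissemination."""
    return (f"{BASE_URL}/objects/{quote(pid, safe=':')}"
            f"/datastreams/{quote(dsid)}/content")


print(find_objects_url("thesis"))
print(list_datastreams_url("resdatac:3333"))
print(datastream_content_url("resdatac:3333", "RDF"))
```

The PID and datastream ID in the usage lines are the sample values from the ResData object model; a real client would issue an HTTP GET against each URL with appropriate credentials.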
Fedora 4 Details
Fedora 3 to 4 data model mapping
This section outlines how the Fedora 3 objects associated with the UNSW repositories are conceptually mapped to Fedora 4 nodes.
Mapping Fedora 3 Object Properties to Fedora 4:
Fedora 3 property | Fedora 3 field | Fedora 4 property | Example | Note |
PID | PID | dc:identifier | resdatac:1 | Legacy Fedora 3 identifier |
State | state | access:objState | Active | Use of fedora:status is preferred but not yet supported. Expected to be addressed by Fedora 4.1.1. |
Label | label | dc:title | Record title | |
Creation Date | CREATED | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable. |
Last Modified Date | lastModifiedDate | fedora:lastModified | 2014-01-20T05:39:08.601Z | Date of migration is to be treated as a “modification”. |
Owner Identifier | ownerId | ms21:owner | z2212222 | The creator of the object |
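The object property mapping above can be sketched as a simple lookup. This is an illustration only, not the migration tool itself: the dictionary keys are the Fedora 3 field names from the table, the function name is hypothetical, and lastModifiedDate is left out on the assumption that fedora:lastModified is set by Fedora 4 at migration time.

```python
# Fedora 3 field -> Fedora 4 property, as listed in the table above.
OBJECT_PROPERTY_MAP = {
    "PID": "dc:identifier",
    "state": "access:objState",
    "label": "dc:title",
    "CREATED": "premis:hasDateCreatedByApplication",
    "ownerId": "ms21:owner",
}
# lastModifiedDate is omitted: fedora:lastModified is maintained by Fedora 4.


def map_object_properties(f3_props):
    """Return {Fedora 4 property: value} for the properties that are migrated."""
    return {OBJECT_PROPERTY_MAP[name]: value
            for name, value in f3_props.items()
            if name in OBJECT_PROPERTY_MAP}


# Example values taken from the table above.
sample = {
    "PID": "resdatac:1",
    "state": "Active",
    "label": "Record title",
    "CREATED": "2014-01-20T04:34:26.331Z",
    "lastModifiedDate": "2014-01-20T05:39:08.601Z",
    "ownerId": "z2212222",
}
print(map_object_properties(sample))
```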
Mapping Fedora 3 Datastream Properties to Fedora 4:
Fedora 3 property | Fedora 3 field | Fedora 4 property | Example | Note |
DSID | ID | identifier or dc:identifier | MODS | This is the legacy Fedora 3 datastream identifier |
State | state | access:objState | Active | Use of fedora:status is preferred but not yet supported. Expected to be addressed by Fedora 4.1.1. |
Control Group | CONTROL_GROUP | N/A | X | Migration is deemed unnecessary |
Versionable | VERSIONABLE | fedora:hasVersions | true | The “VERSIONABLE” property of Fedora 3 is not semantically equivalent to Fedora 4’s hasVersions data property. The proposed mapping is intended to enable migration of Fedora 3 data but will not be used after migration. |
Label | LABEL | dc:title | MODS Metadata | |
Creation Date | CREATED | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | Intended to enable migration of Fedora 3 creation dates. premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable. |
Last Modified Date | N/A | fedora:lastModified | 2014-01-20T05:39:08.601Z | Fedora 3 uses a datastream's “Creation date” as its last modified date. |
Mime Type | MIMETYPE | fedora:mimeType | text/xml | |
Size | SIZE | premis:hasSize | 50000 | Automatically handled by Fedora 4 |
Alternate ID | AltIds | premis:hasOriginalName | sample_file.pdf | Automatically handled by Fedora 4 |
Checksum Type | checksumType | MD5 | SHA1 | Fedora 4 combines checksum type and checksum in one field on fedora:digest property |
Checksum | checksum | fedora:digest | Fedora 3 example: b4df41775c142aa18518d6586a8193c8e0b7dc96; Fedora 4 example: urn:sha1:b4df41775c142aa18518d6586a8193c8e0b7dc96 | Automatically added by Fedora 4 |
Format URI | formatURI | N/A | N/A | This field is not used |
Note: Data and object properties in the official Fedora 4 namespace cannot be modified via the Fedora 4 REST API.
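Because Fedora 4 computes fedora:digest itself, a migration check might compare the Fedora 3 checksum with the digest Fedora 4 reports. The sketch below shows one way to do that for the SHA1 case given in the table. The function names are hypothetical, and only the urn:sha1 form shown above is handled; other checksum types are rejected rather than guessed.

```python
def to_fedora4_digest(checksum_type, checksum):
    """Combine a Fedora 3 checksum type and value into a fedora:digest URN."""
    normalised = checksum_type.replace("-", "").upper()  # accept "SHA1" or "SHA-1"
    if normalised != "SHA1":
        # Only the SHA1 form appears in the mapping table above.
        raise ValueError(f"unsupported checksum type: {checksum_type!r}")
    return "urn:sha1:" + checksum.lower()


def digest_matches(checksum_type, checksum, f4_digest):
    """True if the Fedora 4 digest corresponds to the Fedora 3 checksum."""
    return to_fedora4_digest(checksum_type, checksum) == f4_digest.lower()


f3_checksum = "b4df41775c142aa18518d6586a8193c8e0b7dc96"
print(to_fedora4_digest("SHA1", f3_checksum))
# urn:sha1:b4df41775c142aa18518d6586a8193c8e0b7dc96
```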
Fedora 4 Namespaces
Namespace | URL |
fedora | |
dc |
Fedora 4 data model for ResData
Figure 1 below presents a top level view of the Fedora 4 data model for ResData.
Figure 1: Fedora 4 data model for ResData
ResData Ontology Classes
The ResData Fedora 4 data model is an adaptation of the PCDM model, integrated with a customised version of the ANDS VITRO ontology. The resulting ontology consists mainly of the following classes:
Activities, Datasets, Parties (pcdm:Collection)
Activities, Datasets, and Parties are Fedora 4 container nodes of pcdm:Collection type, mainly intended to enable grouping of the three main ResData resource types, i.e. Activity, Dataset and Party. Fedora 4 URI structures for these pcdm:Collection containers are listed below:
Container name | URL |
Activities | /rest/activities |
Datasets | /rest/datasets |
Parties | /rest/parties |
Dataset (VITRO-ANDS:ResearchData, pcdm:Object)
The ResearchData class from the ANDS VITRO ontology is used to define the Dataset resource type in ResData. In the Fedora 4 model for ResData, all instances of the ResearchData class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to other related ResData resources, such as Activity (vivo:ResearchActivity), Party (foaf:Person) and other Dataset resources. Figure 2 below illustrates the combined use of the pcdm:Object and VITRO-ANDS:ResearchData classes to represent various ResData resource types.
Figure 2: ResData Dataset resource defined as pcdm:Object
Fedora 4 URI structures for ResData Dataset-related nodes are as below:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
Access | /rest/datasets/[dataset pairtree id]/access |
Licence | /rest/datasets/[dataset pairtree id]/licence |
Methodology | /rest/datasets/[dataset pairtree id]/methodology |
Time Period | /rest/datasets/[dataset pairtree id]/timePeriod |
Retention Period | /rest/datasets/[dataset pairtree id]/retentionPeriod |
Subject | /rest/datasets/[dataset pairtree id]/subject |
Publication | /rest/datasets/[dataset pairtree id]/publication |
GEO | /rest/datasets/[dataset pairtree id]/geo |
Rights | /rest/datasets/[dataset pairtree id]/rights |
Storage | /rest/datasets/[dataset pairtree id]/storage |
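The URI structure in the table above can be generated from a dataset's pairtree id. The sketch below is illustrative only: the function name is hypothetical and the pairtree id in the usage line is a made-up placeholder, not a real ResData identifier.

```python
# Child node names taken from the URL table above.
DATASET_CHILDREN = ["access", "licence", "methodology", "timePeriod",
                    "retentionPeriod", "subject", "publication", "geo",
                    "rights", "storage"]


def dataset_uris(pairtree_id, base="/rest/datasets"):
    """Return the dataset URI and the URIs of its child nodes."""
    dataset = f"{base}/{pairtree_id}"
    uris = {"dataset": dataset}
    for child in DATASET_CHILDREN:
        uris[child] = f"{dataset}/{child}"
    return uris


# Placeholder pairtree id, for illustration only.
for name, uri in dataset_uris("ab/cd/ef/01/abcdef01-example").items():
    print(name, uri)
```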
ms21:PartyRelation
PartyRelation is a custom class for describing a user-specified relation between a Party and a Dataset. Instances of PartyRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.
Fedora 4 URI structures for the PartyRelation nodes are:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
PartyRelation | /rest/datasets/[dataset pairtree id]/partyRelation1 |
ms21:ResourceRelation
ResourceRelation is a custom class for describing user-defined relationships between Dataset resources. Instances of ResourceRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.
Fedora 4 URI structures for the ResourceRelation nodes are:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
ResourceRelation | /rest/datasets/[dataset pairtree id]/resourceRelation1 |
Activity (vivo:ResearchActivity, pcdm:Object)
The ResearchActivity class from the VIVO ontology is used to define Activity-type resources in ResData. In the Fedora 4 model for ResData, all instances of the ResearchActivity class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to additional information about a research project, including funding body and affiliation. Figure 3 below illustrates how the pcdm:Object and vivo:ResearchActivity classes are combined to represent Activity-type resources in the ResData Fedora 4 model.
Figure 3: Activity-type resources in Fedora 4 model for ResData
Fedora 4 URI patterns for ResData Activity-type resources are:
Description | URL |
Activity | /rest/activities/[activity pairtree id] |
Funding | /rest/activities/[activity pairtree id]/funding |
Organisation | /rest/activities/[activity pairtree id]/organisation |
Party (foaf:Person, pcdm:Object)
Similar to Dataset and Activity, all Party-type resources are defined as instances of both the Person class from the FOAF ontology and the pcdm:Object class (Figure 4).
Figure 4: ResData Party defined as pcdm:Object
Fedora 4 URI patterns for ResData Party-type resources:
Description | URL |
Party | /rest/parties/[party pairtree id] |
Organisation | /rest/parties/[party pairtree id]/organisation |
ResData Namespaces
Namespace | URL |
bibo | |
owl | |
ms21 | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
VITRO-ANDS | |
core | |
foaf | |
pcdm |
UNSWorks Data Model
Note: All classes are derived from existing classes used in RELS-INT and RELS-EXT on Fedora 3.
Classes
unsworksp:collection
Collection is a class describing a group of records. Besides descriptive metadata, it holds administrative metadata with access information for the records that belong to the collection.
Property | Sub-property of | Range | Note |
unsworksp:hasCollection | | unsworksp:collection | |
unsworksp:record
An individual of the record class represents an intellectual entity such as a thesis, a book or a moving image. It has descriptive metadata in Dublin Core and administrative metadata. It can link to other individuals such as metadata, rights and resource.
Property | Sub-property of | Range | Note |
unsworksp:hasMetadata | | unsworksp:metadata | |
unsworksp:hasRights | | unsworksp:rights | |
unsworksp:hasResource | | unsworks:resource | |
unsworksp:resource
An individual of the resource class represents the electronic resource of a record, such as the PDF file of a thesis. It is stored as binary data and can link to another resource when the record holds the same content in another format for preservation purposes. For example, a thesis record may have a binary file in Word format and a second binary file in PDF format that was converted from the Word document.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
unsworksp:metadata
The metadata class describes metadata of a record. It represents record metadata that is not in Dublin Core format, which is stored as binary data. Like a resource, it can link to another individual of the same type for preservation purposes.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
unsworksp:rights
An individual of the rights class represents a licence or agreement that the author of the electronic resource has signed. Like a resource, it can link to another individual of the same type for preservation purposes.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
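To make the relationships between these classes more concrete, the sketch below models them as plain Python data classes. It is not the repository's implementation: the field names and file names are hypothetical, and the direction of migrated_from (the converted file points back to its source) is an assumption based on the Word-to-PDF example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """Binary file of a record (unsworksp:resource)."""
    filename: str
    migrated_from: Optional["Resource"] = None  # unsworksp:migratedFrom


@dataclass
class Metadata:
    """Non-Dublin Core metadata stored as binary data (unsworksp:metadata)."""
    filename: str
    migrated_from: Optional["Metadata"] = None  # unsworksp:migratedFrom


@dataclass
class Rights:
    """Licence or agreement signed by the author (unsworksp:rights)."""
    filename: str
    migrated_from: Optional["Rights"] = None  # unsworksp:migratedFrom


@dataclass
class Record:
    """Intellectual entity such as a thesis (unsworksp:record)."""
    title: str
    metadata: List[Metadata] = field(default_factory=list)   # unsworksp:hasMetadata
    rights: List[Rights] = field(default_factory=list)       # unsworksp:hasRights
    resources: List[Resource] = field(default_factory=list)  # unsworksp:hasResource


# Hypothetical thesis with a Word source file and a PDF converted from it.
word = Resource("thesis.docx")
pdf = Resource("thesis.pdf", migrated_from=word)
record = Record("Sample thesis", resources=[word, pdf])
print(record.resources[1].migrated_from.filename)  # thesis.docx
```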
Descriptive and Administrative Metadata
Like ResData, UNSWorks uses RELS-INT and RELS-EXT to hold additional information about Fedora 3 objects and datastreams, both for storing administrative information and for searching. Examples are the DOI and the handle.
In Fedora 4, the RELS-INT and RELS-EXT information is mapped to properties of the resource as administrative metadata.
The RELS-INT and RELS-EXT information below will be ported to Fedora 4 as resource properties:
Property | Sub-property of | Range | Note |
unsworksp:resourceType | | | |
unsworksp:dunsworkspid | | | |
unsworks:embargodate | | | |
unsworks:embargoRemoved | | | |
owl:sameAs | | | Alternate URL |
Descriptive metadata for each Fedora 4 resource is in Dublin Core format.
Namespace
Namespace | URL |
unsworks | |
unsworksp | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
owl |
Sample URL structure on Fedora 4
Based on the model above, each resource can be added under the root using Fedora 4's default ingest, which assigns a PairTree path. The binary file of that resource is then added, also with a PairTree path, with the resource node as its parent.
For example:
Type | unsworksp:record |
---|---|
URL | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Identifier | e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Parent | http://localhost:8080/fcrepo-webapp-4.1.0/rest |
Type | unsworksp:resource |
---|---|
URL | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308/1f/fa/ef/05/1ffaef05-ad57-46b6-a553-08566680cfc2 |
Identifier | 1f/fa/ef/05/1ffaef05-ad57-46b6-a553-08566680cfc2 |
Parent | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
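The identifiers in the example above follow a visible pattern: the first eight characters of the UUID are split into four two-character segments, followed by the full UUID. The sketch below reproduces that pattern. It is inferred from the sample URLs rather than taken from the Fedora 4 source, and the function name is hypothetical.

```python
def pairtree_path(identifier, levels=4, width=2):
    """Build a path like e3/93/78/f1/<identifier> from an identifier."""
    segments = [identifier[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(segments + [identifier])


ROOT = "http://localhost:8080/fcrepo-webapp-4.1.0/rest"

record_url = f"{ROOT}/{pairtree_path('e39378f1-dc42-40d9-9199-545ff5860308')}"
resource_url = f"{record_url}/{pairtree_path('1ffaef05-ad57-46b6-a553-08566680cfc2')}"

print(record_url)
print(resource_url)
```

With the two sample UUIDs, the printed URLs match the record and resource URLs shown in the example tables.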
Functionality
Storage: Legacy storage (or Akubra)
XML metadata : datastreams
XML metadata : inline
Inline XML metadata describes the resource. It is mapped to properties of a fedora:container.
See Data Model
Content models
Datastream types (inline, managed, redirect, and external)
Identifiers
Indexing strategies (GSearch, RI-Search vs. F4 approaches)
Integrate Fedora 4 with an external triple store, using a JMS message consumer, to support search with SPARQL.
Replication/Journaling
N/A
Security policies: XACML
OAI-PMH
Versions
Disseminators
Audit history
API