Project Overview
<Insert description here>
Collection Description
...
- The institutional repository, UNSWorks, contains more than 12,000 objects. These include research publications such as digital theses and conference papers. The UNSWorks Live Fedora includes some metadata-only records as well as objects with file attachments. New records are sourced via the university publications management system, the Research Outputs System (ROS). There is also an Interim Fedora that houses publications and metadata (including information about grants) requiring review or processing before ingestion into the UNSWorks Live Fedora. The publication metadata is sourced from ROS, and details about UNSW people and grants are obtained from other UNSW enterprise systems via the data warehouse. The Interim Fedora currently contains about 500,000 records.
ResData
- A research data management system containing over 250 records. The records describe datasets and research data management plans plus related parties (i.e. people) and activities (i.e. grants and projects). Information about people, grants and projects is sourced from other institutional databases via the data warehouse.
...
Object Management
- modifyObject
- purgeObject
- getNextPID
- ingest
Fedora 4 Details
Models
Fedora 3
...
to 4 data model mapping
This section outlines how the Fedora 3 objects associated with the UNSW repositories are conceptually mapped to Fedora 4 nodes.
Object and Datastream properties
Mapping Fedora 3 Object Properties to Fedora 4
Fedora 3 property | Fedora 3 name | Fedora 4 property | Example | Note |
PID | PID | dc:identifier | someprefixresdatac:1 | Legacy Fedora 3 PID |
State | state | access:objState | Active | Uses the solution described in Fedora 4.1.1, which addresses the issue of updating status |
Label | label | dc:title | Some Record title | |
Creation Date | createdDate | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable |
Last Modified Date | lastModifiedDate | fedora:lastModified | 2014-01-20T05:39:08.601Z | Automatically added by Fedora 4; the date of migration is treated as a “modification” |
Owner Identifier | ownerId | ms21:owner | z2212222 | UNSW custom property on the resource; the creator of the object |
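The user-modifiable rows above could be applied to each new node as a SPARQL Update, since Fedora 4 accepts `PATCH` requests with `Content-Type: application/sparql-update`. The sketch below only builds the update body. The `dc` and `xsd` namespace URIs are the standard ones and are assumed here because the namespace table leaves them blank; typing the creation date as `xsd:dateTime` is also an assumption. Properties in the `fedora` namespace are skipped because they are not user-modifiable, and state is skipped because it follows the Fedora 4.1.1 solution.

```python
# Sketch: build a SPARQL Update body that copies Fedora 3 object properties
# onto a Fedora 4 node. Assumed: standard dc/xsd URIs, xsd:dateTime for createdDate.
from __future__ import annotations

PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "premis": "http://www.loc.gov/premis/rdf/v1#",
    "ms21": "http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# Fedora 3 property name -> (Fedora 4 predicate, optional literal datatype)
OBJECT_PROPERTY_MAP = {
    "PID": ("dc:identifier", None),
    "label": ("dc:title", None),
    "createdDate": ("premis:hasDateCreatedByApplication", "xsd:dateTime"),
    "ownerId": ("ms21:owner", None),
}


def sparql_literal(value: str, datatype: str | None = None) -> str:
    """Quote a value as a SPARQL string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^{datatype}' if datatype else f'"{escaped}"'


def build_object_update(f3_props: dict[str, str]) -> str:
    """Return an INSERT ... WHERE { } update; '<>' denotes the node being PATCHed."""
    lines = [f"PREFIX {prefix}: <{uri}>" for prefix, uri in PREFIXES.items()]
    lines.append("INSERT {")
    for f3_name, (predicate, datatype) in OBJECT_PROPERTY_MAP.items():
        if f3_name in f3_props:  # skip properties the source object lacks
            lines.append(f"  <> {predicate} {sparql_literal(f3_props[f3_name], datatype)} .")
    lines.append("} WHERE { }")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_object_update({
        "PID": "someprefixresdatac:1",
        "label": "Some Record title",
        "createdDate": "2014-01-20T04:34:26.331Z",
        "ownerId": "z2212222",
    }))
```

The body would be sent as the payload of a `PATCH` to the migrated node's URL.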
Mapping Fedora 3 Datastream
...
Properties to Fedora 4
...
Fedora 3 property | Fedora 3 name | Fedora 4 property | Example | Note |
DSID | ID | identifier or dc:identifier | MODS | Legacy Fedora 3 datastream identifier |
State | state | access:objState | Active | Uses the solution described in Fedora 4.1.1, which addresses the issue of updating status |
Control Group | CONTROL_GROUP | N/A | X | This field will no longer be used; migration is deemed unnecessary |
Versionable | VERSIONABLE | fedora:hasVersions | true | The “VERSIONABLE” property of Fedora 3 is not semantically equivalent to the Fedora 4 hasVersions data property. The proposed mapping is intended to enable migration of Fedora 3 data but will not be used after migration; new versions will be created with the Fedora 4 method (see Versions) |
Label | LABEL | dc:title | MODS Metadata | |
Creation Date | CREATED | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | Intended to enable migration of Fedora 3 creation dates. premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable |
Last Modified Date | N/A | fedora:lastModified | 2014-01-20T05:39:08.601Z | Automatically added by Fedora 4. Fedora 3 uses the “Creation date” as the last modified date of a datastream |
Mime Type | MIMETYPE | fedora:mimeType | text/xml | Automatically added by Fedora 4 |
Size | SIZE | premis:hasSize | 50000 | Automatically handled by Fedora 4 |
Alternate ID | AltIds | premis:hasOriginalName | sample_file.pdf | Automatically handled by Fedora 4 |
Note: none of the properties in the fedora namespace are user-modifiable.
Namespace
Namespace | URL |
fedora | |
dc | |
ms21 | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
Data Model
ResData
Classes
ms21:UNSW_ResearchDataCollection
An individual of the ResearchData class represents an entity describing a dataset. It has descriptive metadata and must link to an instance of ResearchActivity and an instance of Person. It can also link to another ResearchData individual to describe a related dataset.
Property | Sub-property of | Range | Note |
owl:sameAs | - | - | - |
ms21:relatedDataset | | | |
ms21:principalInvestigator | | foaf:Person | |
ms21:contributor | | foaf:Person | |
ms21:hasGrant | | | |
ms21:hasAward | | | |
ms21:hasActivity | | | |
ms21:ResearchDataManagementPlan
ResearchDataManagementPlan is a class describing a data management plan for a dataset. Like the ResearchData class, it must link to an instance of ResearchActivity and an instance of Person.
Property | Sub-property of | Range | Note |
ms21:principalInvestigator | | foaf:Person | |
ms21:researchManager | | foaf:Person | |
ms21:reader | | foaf:Person | |
ms21:contributor | | foaf:Person | |
ms21:hasGrant | | | |
ms21:hasAward | | | |
vivo:hasActivity | | | |
vivo:ResearchActivity
...
foaf:Person
Person is a class describing a person.
Descriptive and Administrative Metadata
ResData uses RELS-INT and RELS-EXT to record additional information about Fedora 3 objects and datastreams, for storing administrative information and for search purposes; for example, status, published date and embargo date.
In Fedora 4, the RELS-INT and RELS-EXT information is mapped to properties of the resource, as administrative metadata.
Below is the RELS-INT and RELS-EXT information that will be ported to Fedora 4 as resource properties:
Property | Sub-property of | Range | Note |
ms21:datePublished | | | |
ms21:status | | | |
bibo:doi | | | |
ms21:handle | | | |
ms21:storageNamespace | | | |
ms21:storageStatus | | | |
owl:sameAs | | | Alternate URL |
Descriptive metadata for each Fedora 4 resource is expressed in Dublin Core.
Namespace
Namespace | URL |
bibo | |
owl | |
ms21 | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
VITRO-ANDS | |
core | |
foaf |
(TODO: Range and example)
Sample URL structure on Fedora 4
Based on the model above, each resource can be added under the repository root using the Fedora 4 default ingest, which generates PairTree identifiers.
For example:
Type | foaf:Person |
---|---|
URL | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Identifier | e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Fedora 3 property | Fedora 3 name | Fedora 4 property | Example | Note |
Checksum Type | checksumType | fedora:digest | MD5, SHA1 | Fedora 4 combines the checksum type and checksum in one field, the fedora:digest property |
Checksum | checksum | fedora:digest | Fedora 3: b4df41775c142aa18518d6586a8193c8e0b7dc96; Fedora 4: urn:sha1:b4df41775c142aa18518d6586a8193c8e0b7dc96 | Automatically added by Fedora 4 |
Format URI | formatURI | N/A | N/A | This field is not used |
Note: data and object properties in the official Fedora 4 namespace cannot be modified via the Fedora 4 REST API.
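Because Fedora 4 stores the checksum type and value together in `fedora:digest`, a migration script could build the expected digest URN from the two Fedora 3 fields and then verify it against the migrated content. A minimal sketch: Fedora 3 names checksum types such as `MD5` and `SHA-1`; only the `urn:sha1:` form appears in the table above, so the `md5` and `sha256` URN forms are an assumption extrapolated from it.

```python
# Sketch: combine Fedora 3 checksumType + checksum into a Fedora 4 style digest URN
# and re-check migrated bytes against it. md5/sha256 URN forms are assumed.
import hashlib

SUPPORTED = {"md5", "sha1", "sha256"}


def to_fedora4_digest(checksum_type: str, checksum: str) -> str:
    """Return 'urn:<algorithm>:<hex>' from a Fedora 3 checksum type and value."""
    algorithm = checksum_type.replace("-", "").lower()  # e.g. "SHA-1" -> "sha1"
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported checksum type: {checksum_type}")
    return f"urn:{algorithm}:{checksum.lower()}"


def digest_matches(content: bytes, digest_urn: str) -> bool:
    """Recompute the digest of the content and compare it with the URN's value."""
    scheme, algorithm, expected = digest_urn.split(":", 2)
    if scheme != "urn":
        raise ValueError(f"not a URN: {digest_urn}")
    return hashlib.new(algorithm, content).hexdigest() == expected.lower()


if __name__ == "__main__":
    print(to_fedora4_digest("SHA-1", "b4df41775c142aa18518d6586a8193c8e0b7dc96"))
```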
Objects and Datastreams Namespaces
Namespace | URL |
fedora | |
dc | |
access | http://fedora.info/definitions/1/0/access/ |
premis | http://www.loc.gov/premis/rdf/v1# |
ResData Dataset Data Model
Figure 1 below presents a top-level view of the Fedora 4 data model for the ResData Dataset.
Figure 1: Fedora 4 data model for ResData
Classes
The ResData Fedora 4 data model is an adaptation of the PCDM model, integrated with a customised version of the ANDS VITRO ontology. The resulting ontology consists mainly of the following classes:
Activities, Datasets, Parties (pcdm:Collection)
Activities, Datasets, and Parties are Fedora 4 container nodes of pcdm:Collection type, mainly intended to enable grouping of the three main ResData resource types, i.e. Activity, Dataset and Party. Fedora 4 URI structures for these pcdm:Collection containers are listed below:
Container name | URL |
Activities | /rest/activities |
Datasets | /rest/datasets |
Parties | /rest/parties |
Dataset (VITRO-ANDS:ResearchData, pcdm:Object)
The ResearchData class from the ANDS VITRO ontology is used to define the Dataset resource type in ResData. In the Fedora 4 model for ResData, all instances of the ResearchData class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to other related ResData resources, such as Activity (vivo:ResearchActivity), Party (foaf:Person) and other Dataset resources. Figure 2 below illustrates the combined use of the pcdm:Object and VITRO-ANDS:ResearchData classes to represent various ResData resource types.
Figure 2: ResData Dataset resource defined as pcdm:Object
Fedora 4 URI structures for ResData Dataset-related nodes are as below:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
Access | /rest/datasets/[dataset pairtree id]/access |
Licence | /rest/datasets/[dataset pairtree id]/licence |
Methodology | /rest/datasets/[dataset pairtree id]/methodology |
Time Period | /rest/datasets/[dataset pairtree id]/timePeriod |
Retention Period | /rest/datasets/[dataset pairtree id]/retentionPeriod |
Subject | /rest/datasets/[dataset pairtree id]/subject |
Publication | /rest/datasets/[dataset pairtree id]/publication |
GEO | /rest/datasets/[dataset pairtree id]/geo |
Rights | /rest/datasets/[dataset pairtree id]/rights |
Storage | /rest/datasets/[dataset pairtree id]/storage |
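The Dataset URI table above is regular enough to generate, which could help a migration script create or check the child nodes of each Dataset. A minimal sketch, assuming a hypothetical REST base URL such as `http://localhost:8080/rest`; the child names are copied from the table.

```python
# Sketch: generate the Dataset node URL and its child node URLs from the table above.
# The REST base URL passed in is an assumption about the deployment.
from __future__ import annotations

DATASET_CHILDREN = (
    "access", "licence", "methodology", "timePeriod", "retentionPeriod",
    "subject", "publication", "geo", "rights", "storage",
)


def dataset_urls(rest_base: str, pairtree_id: str) -> dict[str, str]:
    """Return the Dataset URL plus one URL per child node, keyed by child name."""
    base = rest_base.rstrip("/")
    dataset = f"{base}/datasets/{pairtree_id}"
    urls = {"dataset": dataset}
    urls.update({child: f"{dataset}/{child}" for child in DATASET_CHILDREN})
    return urls


if __name__ == "__main__":
    for name, url in dataset_urls("http://localhost:8080/rest", "e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308").items():
        print(name, url)
```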
ms21:PartyRelation
PartyRelation is a custom class for describing a user-specified relation between a Party and a Dataset. Instances of PartyRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.
Fedora 4 URI structures for the PartyRelation nodes are:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
PartyRelation | /rest/datasets/[dataset pairtree id]/[partyRelation id1] |
ms21:ResourceRelation
ResourceRelation is a custom class for describing user-defined relationships between Dataset resources. Instances of ResourceRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.
Fedora 4 URI structures for the ResourceRelation nodes are:
Description | URL |
Dataset | /rest/datasets/[dataset pairtree id] |
ResourceRelation | /rest/datasets/[dataset pairtree id]/[resourceRelation id1] |
Activity (vivo:ResearchActivity, pcdm:Object)
The ResearchActivity class from the VIVO ontology is used to define Activity-type resources in ResData. In the Fedora 4 model for ResData, all instances of the ResearchActivity class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to additional information about a research project, including funding body and affiliation. Figure 3 below illustrates how the pcdm:Object and vivo:ResearchActivity classes are combined to represent Activity-type resources in the ResData Fedora 4 model.
Figure 3: Activity-type resources in Fedora 4 model for ResData
Fedora 4 URI patterns for ResData Activity-type resources are:
Description | URL |
Activity | /rest/activities/[activity pairtree id] |
Funding | /rest/activities/[activity pairtree id]/funding |
Organisation | /rest/activities/[activity pairtree id]/organisation |
Party (foaf:Person, pcdm:Object)
Similar to Dataset and Activity, all Party-type resources are defined as instances of both the Person class from the FOAF ontology and the pcdm:Object class (Figure 4).
Figure 4: ResData Party defined as pcdm:Object
Fedora 4 URI patterns for ResData Party-type resources:
Description | URL |
Party | /rest/parties/[party pairtree id] |
Organisation | /rest/parties/[party pairtree id]/organisation |
Descriptive and Administrative Metadata
In Fedora 4, the RELS-INT and RELS-EXT information associated with the ResData resources will be migrated as data properties of the corresponding Fedora 4 resource nodes.
Below is the RELS-INT and RELS-EXT information that will be migrated to Fedora 4 as data properties of Dataset, Party, and Activity resources in ResData:
Property | Note |
---|---|
VITRO-ANDS:dateOfPublication | |
ms21:status | |
bibo:doi | only for Dataset |
ms21:handle | only for Dataset |
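Because RELS-EXT and RELS-INT are RDF/XML, the properties listed above could be extracted with a standard XML parser before being written to Fedora 4. The sketch assumes every property element sits directly inside an `rdf:Description` element, the layout used in the appendix examples. The sample document and its values are hypothetical placeholders, and rewriting `info:fedora/...` identifiers to Fedora 4 URIs is left out.

```python
# Sketch: flatten RELS-EXT / RELS-INT RDF/XML into simple triples.
# SAMPLE_RELS_EXT is a hypothetical fragment, not real ResData content.
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE_RELS_EXT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ms21="http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/">
  <rdf:Description rdf:about="info:fedora/resdatac:103">
    <ms21:status>placeholder-status</ms21:status>
    <ms21:handle>placeholder-handle</ms21:handle>
    <ms21:hasActivity rdf:resource="info:fedora/resdatac:200"/>
  </rdf:Description>
</rdf:RDF>
"""


def rels_to_triples(rels_xml: str) -> list[tuple[str, str, str, bool]]:
    """Return (subject, predicate URI, object, object_is_resource) tuples."""
    root = ET.fromstring(rels_xml.strip())
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            if not prop.tag.startswith("{"):
                continue  # RDF/XML property elements are always namespaced
            namespace, local_name = prop.tag[1:].split("}", 1)  # "{ns}local"
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:  # object property, e.g. a link to another object
                triples.append((subject, namespace + local_name, resource, True))
            else:  # data property with a literal value
                triples.append((subject, namespace + local_name, (prop.text or "").strip(), False))
    return triples


if __name__ == "__main__":
    for triple in rels_to_triples(SAMPLE_RELS_EXT):
        print(triple)
```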
Namespaces
Namespace | URL |
bibo | |
owl | |
ms21 | |
VITRO-ANDS | |
core | |
foaf | |
pcdm |
UNSWorks Data Model
Figure 5: UNSWorks Data Model
Note: All classes are derived from existing UNSWorks objects including the RELS-INT and RELS-EXT information in Fedora 3.
Classes
unsworksp:access (pcdm:Object)
Access is a class describing a set of authorised users and/or groups. It is used to detail the access constraints placed on a record or collection. All instances of the access class are also defined as pcdm:Object type nodes in the Fedora 4 UNSWorks model.
unsworksp:collection (pcdm:Object)
Collection is a class describing a group of records. A collection includes descriptive metadata and a link to access information. Like the access class, collection instances in the UNSWorks Fedora 4 model are also defined as pcdm:Object type nodes.
Property | Note |
unsworksp:hasAccessConstraint |
unsworksp:record
A record class represents a container for an intellectual entity such as a thesis, a book or a moving image. It has descriptive metadata about the record and can contain other metadata, binary files and rights associated with the record, as described in the following classes. Like the collection and access classes, all instances of UNSWorks records are also defined as pcdm:Object type nodes.
Property | Note |
unsworksp:hasMetadata | |
unsworksp:hasRights | |
unsworksp:hasResource | |
unsworksp:hasCollection | |
unsworksp:hasAccessConstraint |
unsworksp:resource (pcdm:File)
An individual of the resource class represents an electronic resource of the record, such as the PDF file of a thesis, and is stored as binary data. This class is also used to represent a file converted for preservation purposes. For example, a thesis in MS Word format will have a preservation copy in PDF format. The relationship between these files is represented by the unsworksp:migratedFrom property. All resources are also defined as pcdm:File type nodes.
Property | Note |
unsworksp:migratedFrom |
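Since resource, metadata and rights nodes are binaries (pcdm:File), their properties, including unsworksp:migratedFrom, belong in the binary's RDF description, which Fedora 4 exposes at `<binary>/fcr:metadata`. A minimal sketch of the update that links a preservation copy (for example a PDF) to the file it was converted from (for example a Word document); the record and file URLs are hypothetical. The binary's full URI is used as the subject instead of `<>` so the update does not depend on how the description's base URI is resolved.

```python
# Sketch: PATCH target and SPARQL Update body linking a preservation copy to its source.
# The example URLs are hypothetical; the unsworksp URI comes from the namespace table.
from __future__ import annotations

UNSWORKSP = "http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/"


def migrated_from_patch(preservation_copy_url: str, original_url: str) -> tuple[str, str]:
    """Return (URL to PATCH, SPARQL Update body) for a migratedFrom link."""
    copy = preservation_copy_url.rstrip("/")
    target = copy + "/fcr:metadata"  # RDF description of the binary
    body = (
        f"PREFIX unsworksp: <{UNSWORKSP}>\n"
        f"INSERT {{ <{copy}> unsworksp:migratedFrom <{original_url}> . }}\n"
        "WHERE { }"
    )
    return target, body


if __name__ == "__main__":
    url, update = migrated_from_patch(
        "http://localhost:8080/rest/records/r1/thesis-pdf",
        "http://localhost:8080/rest/records/r1/thesis-docx",
    )
    print(url)
    print(update)
```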
unsworksp:metadata (pcdm:File)
The metadata class is used to represent additional descriptive metadata of a record that cannot be added as properties of the record class, for example MODS and MARCXML descriptive metadata. These types of metadata will be stored as binary files (pcdm:File). Like the resource class, it may have a converted version for preservation purposes.
Property | Note |
unsworksp:migratedFrom |
unsworksp:rights (pcdm:File)
The rights class is used to represent licences or agreements signed by the person submitting the resource. Like resource and metadata, it has a link to its converted version for preservation purposes, and it is also defined as a pcdm:File type node.
Property | Note |
unsworksp:migratedFrom |
Descriptive and Administrative Metadata
In Fedora 4, the RELS-INT and RELS-EXT for UNSWorks are mapped as data properties of the resource node.
Below is the RELS-INT and RELS-EXT information that will be ported to Fedora 4:
Property | Note |
---|---|
unsworksp:resourceType | |
unsworksp:dunsworkspid | |
unsworks:embargodate | |
unsworks:embargoRemoved | |
owl:sameAs | Alternate URL |
Descriptive metadata for each Fedora 4 resource is expressed in Dublin Core.
Namespace
Namespace | URL |
unsworks | |
unsworksp | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
owl |
Sample URL structure on Fedora 4
Based on the model above, each resource can be added under the repository root using the Fedora 4 default ingest, which generates PairTree identifiers. The binary file of a resource is added as a child of the resource node, also with a PairTree identifier. Below are the Fedora 4 URI patterns for UNSWorks nodes. The collections, records and access nodes are pcdm:Object containers that hold the individual collection, record and access resources respectively.
Description | URL |
Collection | /rest/collections/[collection pairtree id] |
Record | /rest/records/[record pairtree id] |
Resource | /rest/records/[record pairtree id]/[resource id1] |
Metadata | /rest/records/[record pairtree id]/[metadata id1], /rest/records/[record pairtree id]/[metadata id2], /rest/records/[record pairtree id]/[metadata id3] |
Rights | /rest/records/[record pairtree id]/[rights id1], /rest/records/[record pairtree id]/[rights id2], /rest/records/[record pairtree id]/[rights id3] |
Access | /rest/access/[access pairtree id] |
Functionality
Storage: Legacy storage (or Akubra)
The Fedora 4 REST API will be used to migrate content from Fedora 3 to Fedora 4. The storage type raises no migration issues. The only difference is that Fedora 4 stores container nodes in a database, whereas Fedora 3 stores objects and datastreams in a file structure.
XML metadata : datastreams
Where possible, metadata will be stored as properties of the relevant node. Metadata in other formats such as XML (e.g. MODS), will be stored as a binary file (pcdm:File).
XML metadata : inline
The inline XML metadata is descriptive metadata of the resource. It is mapped to properties of the Fedora 4 container node (pcdm:Object).
See Data Model above for more information.
Content models
The default Fedora content models have not been modified.
Datastream types (inline, managed, redirect, and external)
In Fedora 3, the UNSWorks and ResData repositories use only inline and managed datastreams. Inline datastreams are used for descriptive metadata such as DC, RDF, MODS and MARCXML. DC and RDF metadata can be mapped to properties of the Fedora 4 container node; the others will be stored as Fedora 4 binary nodes. Similarly, all managed datastreams will be stored as Fedora 4 binary nodes (pcdm:File). See the UNSWorks and ResData Data Models for more information.
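The routing rule in the paragraph above is small enough to state as a function. In this sketch `metadata_format` is a hypothetical parameter naming the format of an inline datastream (DC, RDF, MODS, MARCXML); how a migration script would determine it, for example from the DSID, is an assumption.

```python
# Sketch: decide where a Fedora 3 datastream lands in Fedora 4.
# Control groups: X = inline XML, M = managed; R/E are not used in UNSWorks or ResData.


def fedora4_target(control_group: str, metadata_format: str) -> str:
    """Return "properties" (node properties) or "binary" (Fedora 4 binary node)."""
    if control_group == "X":  # inline XML datastream
        return "properties" if metadata_format in {"DC", "RDF"} else "binary"
    if control_group == "M":  # managed content is always stored as a binary
        return "binary"
    raise ValueError(f"unexpected control group: {control_group}")


if __name__ == "__main__":
    for group, fmt in [("X", "DC"), ("X", "MODS"), ("M", "PDF")]:
        print(group, fmt, "->", fedora4_target(group, fmt))
```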
Identifiers
The PairTree algorithm is the default method for generating identifiers in Fedora 4. It will be used both for the migration and for new objects, to address the performance issue of having too many children under a single resource (Performance). The legacy PID will be stored as a property of the node, as described above. Refer to the URL structures in the Data Model section for examples.
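The identifiers in the sample URLs (for example `e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308`) use the first eight hex characters of a UUID as four two-character path segments, so each level has at most 256 (16 × 16) children. The sketch below reproduces that shape, which can help when a script needs to validate or predict paths. It is not Fedora's own minter; in normal use Fedora generates the identifier itself when a resource is created with `POST`.

```python
# Sketch: reproduce the pairtree path shape seen in the sample Fedora 4 URLs.
# Segment length/count (2 x 4) are inferred from those examples.
import uuid


def pairtree_path(node_id: str, segment_length: int = 2, segment_count: int = 4) -> str:
    """Prefix an ID with short path segments taken from its leading characters."""
    compact = node_id.replace("-", "")
    segments = [compact[i * segment_length:(i + 1) * segment_length] for i in range(segment_count)]
    return "/".join(segments + [node_id])


def mint_pairtree_id() -> str:
    """Mint a new UUID-based pairtree path of the same shape."""
    return pairtree_path(str(uuid.uuid4()))


if __name__ == "__main__":
    print(pairtree_path("e39378f1-dc42-40d9-9199-545ff5860308"))
    print(mint_pairtree_id())
```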
Indexing strategies (GSearch, RI-Search vs. F4 approaches)
Fedora 4 will be integrated with an external triple store, using the JMS message consumer, to support searching with SPARQL.
For installation, refer to:
https://wiki.duraspace.org/display/FEDORA41/External+Triplestore
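Once the JMS consumer keeps an external triple store in sync, lookups such as "which Fedora 4 node carries legacy PID X" become SPARQL queries. The sketch builds a SPARQL 1.1 Protocol GET URL; the endpoint URL is hypothetical (it depends on the triple store chosen), and the query assumes the legacy PID is stored as dc:identifier, as in the object property mapping.

```python
# Sketch: SPARQL 1.1 Protocol GET URL for finding a node by its legacy Fedora 3 PID.
# The endpoint is deployment-specific; the dc namespace URI is assumed.
from urllib.parse import parse_qs, urlencode, urlsplit

DC = "http://purl.org/dc/elements/1.1/"


def legacy_pid_query_url(sparql_endpoint: str, legacy_pid: str) -> str:
    """Return a GET URL whose 'query' parameter selects nodes with the given dc:identifier."""
    escaped = legacy_pid.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        f"PREFIX dc: <{DC}>\n"
        f'SELECT ?node WHERE {{ ?node dc:identifier "{escaped}" }}'
    )
    return sparql_endpoint + "?" + urlencode({"query": query})


def decoded_query(url: str) -> str:
    """Recover the SPARQL text from such a URL (useful for logging)."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    url = legacy_pid_query_url("http://localhost:3030/fedora/query", "resdatac:103")
    print(url)
    print(decoded_query(url))
```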
Replication/Journaling
N/A
Security policies: XACML
Security policies will be initially handled by the client applications. WebACL and the Fedora 4 Access Roles module will be explored further in future.
OAI-PMH
Fedora 4 OAI-PMH Provider will be used. Refer to the information on this link for installation:
https://wiki.duraspace.org/display/FEDORA41/Setup+OAI-PMH+Provider
Further testing will be done to check the OAI-PMH status.
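The planned OAI-PMH testing could start from a handful of standard requests (Identify, ListMetadataFormats, ListRecords with metadataPrefix=oai_dc). The helper below only builds request URLs following the OAI-PMH 2.0 protocol; the base URL in the example is hypothetical and depends on how the provider is deployed.

```python
# Sketch: build OAI-PMH 2.0 request URLs for smoke-testing a provider.
# The base URL used in the example is an assumption.
from urllib.parse import urlencode

OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return base_url?verb=...&<arguments>, rejecting unknown verbs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in arguments and len(arguments) > 1:
        raise ValueError("resumptionToken is an exclusive argument")
    params = {"verb": verb, **arguments}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8080/rest/oai"
    print(oai_request_url(base, "Identify"))
    print(oai_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```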
Versions
Fedora 4 versioning will be used to store Fedora 3 versions. This will be included in the migration script later.
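One way to carry Fedora 3 history across is to replay each datastream's versions oldest first: write the content of one version, then snapshot it. The sketch assumes the Fedora 4.x versioning API in which `POST <resource>/fcr:versions` with a `Slug` header creates a labelled version; reusing Fedora 3 version IDs such as `MODS.0` as labels is a hypothetical choice, and the request is only prepared, not sent.

```python
# Sketch: order Fedora 3 datastream versions and prepare Fedora 4 version-creation requests.
# Assumes the Fedora 4.x "POST fcr:versions + Slug" API; labels are hypothetical.
from __future__ import annotations

from urllib.request import Request


def create_version_request(resource_url: str, label: str) -> Request:
    """Prepare (but do not send) a request that snapshots a resource as a labelled version."""
    return Request(
        resource_url.rstrip("/") + "/fcr:versions",
        method="POST",
        headers={"Slug": label},  # version label taken from the Slug header
    )


def version_replay_order(f3_versions: list[tuple[str, str]]) -> list[str]:
    """Sort (version ID, created timestamp) pairs oldest first.

    Relies on Fedora 3 timestamps sharing one ISO 8601 format in UTC,
    so lexical order equals chronological order.
    """
    return [version_id for version_id, _ in sorted(f3_versions, key=lambda v: v[1])]


if __name__ == "__main__":
    order = version_replay_order([("MODS.1", "2014-01-20T05:39:08.601Z"),
                                  ("MODS.0", "2014-01-20T04:34:26.331Z")])
    for version_id in order:
        req = create_version_request("http://localhost:8080/rest/records/r1/mods", version_id)
        print(req.get_method(), req.full_url, req.get_header("Slug"))
```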
Disseminators
N/A
Audit history
For migration purposes, the legacy Fedora 3 FOXML will be stored as fedora:Binary (pcdm:File) in Fedora 4. The Fedora 4 Audit module will be used to manage the audit history after further testing.
API
The Fedora 4 REST API will be used to replace the Fedora 3 SOAP and REST APIs.
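For the Fedora 3 API-M methods listed under Object Management (ingest, getNextPID, modifyObject, purgeObject), a rough correspondence with Fedora 4 REST calls is sketched below. This mapping is an interpretation rather than something stated on this page; in particular, treating purgeObject as a DELETE followed by removal of the `fcr:tombstone` assumes the repository should not keep tombstones.

```python
# Sketch: map Fedora 3 API-M operations to the Fedora 4 REST calls that could replace them.
# The mapping is an interpretation, not an official equivalence table.
from __future__ import annotations


def fedora4_calls(operation: str, url: str) -> list[tuple[str, str]]:
    """Return (HTTP method, URL) pairs for a Fedora 3 operation applied at `url`."""
    if operation == "ingest":
        # POST to the parent container; Fedora 4 mints the ID and returns it in Location.
        return [("POST", url)]
    if operation == "getNextPID":
        # No separate call: the identifier is minted by the ingest POST itself.
        return []
    if operation == "modifyObject":
        # PATCH with a SPARQL Update body (Content-Type: application/sparql-update).
        return [("PATCH", url)]
    if operation == "purgeObject":
        # DELETE leaves a tombstone; deleting the tombstone frees the path completely.
        return [("DELETE", url), ("DELETE", url.rstrip("/") + "/fcr:tombstone")]
    raise ValueError(f"no mapping for {operation}")


if __name__ == "__main__":
    for op in ("ingest", "getNextPID", "modifyObject", "purgeObject"):
        print(op, fedora4_calls(op, "http://localhost:8080/rest/records/r1"))
```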
APPENDIX
Fedora 4 ResData Dataset N3:
Fedora 4 ResData Party N3:
Fedora 4 ResData Activity N3:
Fedora 3 ResData RELS-EXT Example for Dataset:
<rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#" |
Fedora 3 ResData RELS-INT Example for Dataset:
<rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#" <rdf:Description rdf:about="info:fedora/resdatac:103/RDF.1"> <rdf:Description rdf:about="info:fedora/resdatac:103/RDF.2"> <rdf:Description rdf:about="info:fedora/resdatac:103/RDF.3"> |
Fedora 3 ResData RELS-EXT Example for Plan:
<rdf:RDF xmlns:fedora="info:fedora/fedora-system:def/relations-external#" |
Fedora 3 UNSWorks RELS-EXT Example:
<?xml version="1.0" encoding="UTF-8"?> |
Fedora 3 UNSWorks RELS-INT Example:
<?xml version="1.0" encoding="UTF-8"?> |
UNSWorks Data Model Example:
UNSWorks
Classes
unsworksp:collection
Collection is a class describing a group of records. Aside from descriptive metadata, it contains administrative metadata containing access information to the records belonging to the collection.
Property | Sub-property of | Range | Note |
unsworksp:hasCollection | | unsworksp:collection | |
unsworksp:record
An individual of the record class represents an intellectual entity such as a thesis, a book or a moving image. It has descriptive metadata in Dublin Core and administrative metadata. It can link to other individuals such as metadata, rights and resource.
Property | Sub-property of | Range | Note |
unsworksp:hasMetadata | | unsworksp:metadata | |
unsworksp:hasRights | | unsworksp:rights | |
unsworksp:hasResource | | unsworksp:resource | |
unsworksp:resource
An individual of the resource class represents the electronic resource of the record, such as the PDF file of a thesis. It is stored as binary data and can link to another resource that holds the same content in a different format for preservation purposes. For example, a thesis record may have a binary file in Word format and another binary file in PDF format converted from the Word document.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
unsworksp:metadata
The metadata class describes metadata of a record. It represents record metadata that is not in Dublin Core format, which will be stored as binary data. Like resource, it can link to another metadata individual for preservation purposes.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
unsworksp:rights
An individual of the rights class represents a licence or agreement that the author of the electronic resource has signed. Like resource, it can link to another rights individual for preservation purposes.
Property | Sub-property of | Range | Note |
unsworksp:migratedFrom | | | |
Descriptive and Administrative Metadata
Like ResData, UNSWorks uses RELS-INT and RELS-EXT to record additional information about Fedora 3 objects and datastreams, for storing administrative information and for search purposes; for example, DOI and handle.
In Fedora 4, the RELS-INT and RELS-EXT information is mapped to properties of the resource, as administrative metadata.
Below is the RELS-INT and RELS-EXT information that will be ported to Fedora 4 as resource properties:
Property | Sub-property of | Range | Note |
unsworksp:resourceType | | | |
unsworksp:dunsworkspid | | | |
unsworks:embargodate | | | |
unsworks:embargoRemoved | | | |
owl:sameAs | | | Alternate URL |
Descriptive metadata for each Fedora 4 resource is expressed in Dublin Core.
Namespace
Namespace | URL |
unsworks | |
unsworksp | http://www.unsworks.unsw.edu.au/ontology/preservation-metadata/ |
owl |
(TODO: Range and example)
Sample URL structure on Fedora 4
Based on the model above, each resource can be added under the repository root using the Fedora 4 default ingest, which generates PairTree identifiers. The binary file of a resource is added as a child of the resource node, also with a PairTree identifier.
...
Type | unsworksp:record |
---|---|
URL | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Identifier | e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |
Parent | http://localhost:8080/fcrepo-webapp-4.1.0/rest |
Type | unsworksp:resource |
---|---|
URL | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308/1f/fa/ef/05/1ffaef05-ad57-46b6-a553-08566680cfc2 |
Identifier | 1f/fa/ef/05/1ffaef05-ad57-46b6-a553-08566680cfc2 |
Parent | http://localhost:8080/fcrepo-webapp-4.1.0/rest/e3/93/78/f1/e39378f1-dc42-40d9-9199-545ff5860308 |