The DSpace AIP Format
Makeup and Definition of AIPs
...
- AIP is a package describing one archival object in DSpace.
- The archival object may be a single Item, Collection, Community, or Site (Site AIPs contain site-wide information). Bitstreams are included in an Item's AIP.
- Each AIP is logically self-contained and can be restored without the rest of the archive. (So you could restore a single Item, Collection or Community.)
- Collection or Community AIPs do not include all child objects (e.g. Items in those Collections or Communities), as each AIP only describes one object. However, these container AIPs do contain references (links) to all child objects. These references can be used by DSpace to automatically restore all referenced AIPs when restoring a Collection or Community.
- AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This means that in-progress, uncompleted submissions are not described in AIPs and cannot be restored after a disaster. Permanently removed objects will also no longer be exported as AIPs after their removal. However, withdrawn objects will continue to be exported as AIPs, since they are still considered under the "in archive" status.
- AIPs with identical contents will always have identical checksums. This provides a basic means of validating whether the contents within an AIP have changed. For example, if a Collection's AIP has the same checksum at two different points in time, it means that Collection has not changed during that time period.
- The AIP profile favors completeness and accuracy rather than presenting the semantics of an object in a standard format. It conforms to the quirks of DSpace's internal object model rather than attempting to produce a universally understandable representation of the object. When possible, an AIP tries to use common standards to express objects.
- An AIP can serve as a DIP (Dissemination Information Package) or SIP (Submission Information Package), especially when transferring custody of objects to another DSpace implementation.
- In contrast to a SIP or DIP, the AIP should include all available DSpace structural and administrative metadata, and basic provenance information. AIPs also describe some basic system-level information (e.g. Groups and People).
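The checksum property described above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the AIPs have already been exported as files on disk, and the choice of MD5 and the function names `file_md5` and `aip_unchanged` are illustrative, not part of DSpace.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large AIP packages need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def aip_unchanged(old_aip, new_aip):
    """True if two exported AIP files have byte-identical contents."""
    return file_md5(old_aip) == file_md5(new_aip)
```

For example, `aip_unchanged("exports/monday/COLLECTION@123456789-2.zip", "exports/friday/COLLECTION@123456789-2.zip")` would compare two exports of the same Collection (the directory names are hypothetical).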
General AIP Structure / Examples
...
- Site AIP (Sample: SITE-example.zip)
- METS contains basic metadata about DSpace Site and persistent IDs referencing all Top Level Communities
- METS also contains a list of all Groups and EPeople information defined in the DSpace system. (NOTE: By default, user passwords are not stored in AIPs, unless you specify the 'passwords' flag. See Additional Packager Options.)
- Community AIP (Sample: COMMUNITY@123456789-1.zip)
- METS contains all metadata for Community and persistent IDs referencing all members (SubCommunities or Collections). Package may also include a Logo file, if one exists.
- METS contains any Group information for Community-specific groups (e.g. the COMMUNITY_<ID>_ADMIN group).
- METS contains all Community permissions/policies (translated into METSRights schema)
- Collection AIP (Sample: COLLECTION@123456789-2.zip)
- METS contains all metadata for Collection and persistent IDs referencing all members (Items). Package may also include a Logo file, if one exists.
- METS contains any Group information for Collection-specific groups (e.g. COLLECTION_<ID>_ADMIN, COLLECTION_<ID>_SUBMIT, etc.).
- METS contains all Collection permissions/policies (translated into METSRights schema)
- If the Collection has an Item Template, the METS will also contain all the metadata for that Item Template.
- Item AIP (Sample: ITEM@123456789-8.zip)
- METS contains all metadata for Item and references to all Bundles and Bitstreams. Package also includes all Bitstream files.
- METS contains all Item/Bundle/Bitstream permissions/policies (translated into METSRights schema)
...
- Bitstreams and Bundles are second-class archival objects; they are recorded in the context of an Item.
- BitstreamFormats are not even second-class; they are described implicitly within Item technical metadata, and reconstructed from that during restoration
- EPeople are only defined in Site AIP, but may be referenced from Community or Collection AIPs
- Groups may be defined in Site AIP, Community AIP or Collection AIP. Where they are defined depends on whether the Group relates specifically to a single Community or Collection, or is just a general site-wide group.
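The sample file names above (e.g. COMMUNITY@123456789-1.zip) suggest a naming pattern of object type, "@", and the Handle with "/" replaced by "-". The sketch below parses names under that assumption; the pattern is inferred from the samples only, and `parse_aip_filename` is a hypothetical helper, not a DSpace API. Note that the Site sample (SITE-example.zip) does not follow this pattern and is reported as unrecognized.

```python
import re

# Pattern inferred from the sample names, e.g. "ITEM@123456789-8.zip".
AIP_NAME = re.compile(r"^(SITE|COMMUNITY|COLLECTION|ITEM)@(.+)-([^-]+)\.zip$")


def parse_aip_filename(name):
    """Split an AIP file name into (object_type, handle), or return None."""
    match = AIP_NAME.match(name)
    if match is None:
        return None
    object_type, prefix, suffix = match.groups()
    # Assumption: the last "-" in the name stands in for the "/" of the Handle.
    return object_type, f"{prefix}/{suffix}"
```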
What is NOT in AIPs
...
- DSpace Site configurations ([dspace]/config/ directory) or customizations (themes, stylesheets, etc.) are not described in AIPs
- DSpace Database model (or customizations therein) is not described in AIPs
- Any objects which are not currently in the "In Archive" state are not described in AIPs. This means that in-progress, unfinished submissions are never included in AIPs.
Customizing What Is Stored in Your AIPs
...
There are two ways to go about customizing your AIP format:
- You can customize your dspace.cfg settings pertaining to AIP generation. These configurations will allow you to specify exactly which DSpace Crosswalks will be called when generating the AIP METS manifest.
- You can export your AIPs using one of the special options/flags.
AIP Details: METS Structure
...
mets element
- @PROFILE: fixed value="http://www.dspace.org/schema/aip/1.0/mets.xsd" (this is how we identify an AIP manifest)
- @OBJID: URN-format persistent identifier (i.e. Handle) if available, or else a unique identifier (e.g. "hdl:123456789/1")
- @LABEL: title if available
- @TYPE: DSpace object type, one of "DSpace ITEM", "DSpace COLLECTION", "DSpace COMMUNITY" or "DSpace SITE"
- @ID: a globally unique identifier, built using the Handle and the Object type (e.g. dspace-COLLECTION-hdl:123456789/3)
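As a rough illustration of the root-element attributes described above, the following sketch uses Python's standard xml.etree.ElementTree to decide whether a manifest is an AIP manifest (by its PROFILE value) and to read the identifying attributes. The `SAMPLE_ROOT` document is hand-written for this example, including the hypothetical label "Sample Collection", and `read_aip_root` is an illustrative helper rather than anything shipped with DSpace.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
AIP_PROFILE = "http://www.dspace.org/schema/aip/1.0/mets.xsd"

# Hand-written sample root element; attribute values follow the examples above.
SAMPLE_ROOT = """<mets xmlns="http://www.loc.gov/METS/"
      PROFILE="http://www.dspace.org/schema/aip/1.0/mets.xsd"
      ID="dspace-COLLECTION-hdl:123456789/3"
      OBJID="hdl:123456789/3"
      LABEL="Sample Collection"
      TYPE="DSpace COLLECTION"/>"""


def read_aip_root(manifest_xml):
    """Return the identifying root attributes, or None if not an AIP manifest."""
    root = ET.fromstring(manifest_xml)
    if root.tag != f"{{{METS_NS}}}mets" or root.get("PROFILE") != AIP_PROFILE:
        return None
    return {name: root.get(name) for name in ("ID", "OBJID", "LABEL", "TYPE")}
```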
mets/metsHdr element
- @LASTMODDATE: last-modified date for a DSpace Item, or nothing for other objects.
- agent element: @ROLE = "CUSTODIAN", @TYPE = "OTHER", @OTHERTYPE = "DSpace Archive", name = Site handle. (Note: The Site Handle is of the format [handle_prefix]/0, e.g. "123456789/0")
- agent element: @ROLE = "CREATOR", @TYPE = "OTHER", @OTHERTYPE = "DSpace Software", name = "DSpace [version]" (where "[version]" is the specific version of DSpace software which created this AIP, e.g. "1.7.0")
mets/dmdSec element(s)
- By default, two dmdSec elements are included for all AIPs:
  - object's descriptive metadata crosswalked to MODS (specified by mets/dmdSec/mdWrap@MDTYPE="MODS"). See the MODS Schema section below for more information.
  - object's descriptive metadata in DSpace native DIM intermediate format, to serve as a complete and precise record for restoration or ingestion into another DSpace. Specified by mets/dmdSec/mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DIM". See the DIM (DSpace Intermediate Metadata) Schema section below for more information.
- For Collection AIPs, additional dmdSec elements may exist which describe the Item Template for that Collection. Since an Item Template is not an actual Item (i.e. it only includes metadata), it is stored within the Collection AIP. The Item Template's dmdSec elements will be referenced by a div @TYPE="DSpace ITEM Template" in the METS structMap.
- When the mdWrap @MDTYPE value is OTHER, the element MUST include a value for the @OTHERMDTYPE attribute which names the crosswalk that produced (or interprets) that metadata, e.g. DIM.
mets/amdSec element(s)
- One or more amdSec elements are included for all AIPs. The first amdSec element contains administrative metadata (technical, source, rights, and provenance) for the entire archival object. Additional amdSec elements may exist to describe parts of the archival object (e.g. Bitstreams or Bundles in an Item).
  - techMD elements. By default, two types of techMD elements may be included:
    - PREMIS metadata about an object may be included here (currently only specified for Bitstreams (files)). Specified by mdWrap@MDTYPE="PREMIS". See the PREMIS Schema section below for more information.
    - DSPACE-ROLES metadata may appear here to describe the Groups or EPeople related to this object (currently only specified for Site, Community and Collection). Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DSPACE-ROLES". See the DSPACE-ROLES Schema section below for more information.
  - rightsMD elements. By default, there are four possible types of rightsMD elements which may be included:
    - METSRights metadata may appear here to describe the permissions on this object. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="METSRIGHTS". See the METSRights Schema section below for more information.
    - DSpaceDepositLicense: if the object is an Item and it has a deposit license, it is contained here. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DSpaceDepositLicense".
    - CreativeCommonsRDF: if the object is an Item with a Creative Commons license expressed in RDF, it is included here. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="CreativeCommonsRDF".
    - CreativeCommonsText: if the object is an Item with a Creative Commons license in plain text, it is included here. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="CreativeCommonsText".
  - sourceMD element. By default, there is only one type of sourceMD element which may appear:
    - AIP-TECHMD metadata may appear here. This stores basic technical/source metadata about an object in a DSpace native format. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="AIP-TECHMD". See the AIP Technical Metadata Schema (AIP-TECHMD) section below for more information.
  - digiprovMD element.
    - Not used at this time.
mets/fileSec element
- For ITEM objects:
  - Each distinct Bundle in an Item goes into a fileGrp. The fileGrp has a @USE attribute which corresponds to the Bundle name.
  - Bitstreams in bundles become file elements under fileGrp.
  - mets/fileSec/fileGrp/file elements:
    - Set @SIZE to length of the bitstream. There is a redundant value in the <techMD> but it is more accessible here.
    - Set @MIMETYPE, @CHECKSUM, @CHECKSUMTYPE to corresponding bitstream values. There is redundant info in the <techMD>. (For DSpace, the @CHECKSUMTYPE="MD5" at all times)
    - Set @SEQ to bitstream's SequenceID if it has one.
    - Set @ADMID to the list of <amdSec> element(s) which describe this bitstream.
- For COLLECTION and COMMUNITY objects:
  - Only if the object has a logo bitstream, there is a fileSec with one fileGrp child of @USE="LOGO".
  - The fileGrp contains one file element, representing the logo Bitstream. It has the same @MIMETYPE, @CHECKSUM, @CHECKSUMTYPE attributes as the Item content bitstreams, but does NOT include metadata section references (e.g. @ADMID) or a @SEQ attribute.
  - See the main structMap for the fptr reference to this logo file.
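To make the fileSec layout described above more concrete, here is a small sketch that lists every file entry together with the Bundle name taken from the enclosing fileGrp. The `SAMPLE_FILESEC` manifest is hand-written for illustration (the file ID, amdSec ID and checksum value are hypothetical), and `list_files` is not a DSpace function.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hand-written fragment: one Bundle ("ORIGINAL") holding one Bitstream.
SAMPLE_FILESEC = """<mets xmlns="http://www.loc.gov/METS/">
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ID="bitstream_1" SIZE="1024" MIMETYPE="application/pdf"
            CHECKSUM="d41d8cd98f00b204e9800998ecf8427e" CHECKSUMTYPE="MD5"
            SEQ="1" ADMID="amd_1"/>
    </fileGrp>
  </fileSec>
</mets>"""


def list_files(manifest_xml):
    """Return one dict per file element, tagged with its Bundle (@USE)."""
    root = ET.fromstring(manifest_xml)
    rows = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for entry in group.findall("mets:file", NS):
            size = entry.get("SIZE")
            rows.append({
                "bundle": group.get("USE"),
                "mimetype": entry.get("MIMETYPE"),
                "checksum": entry.get("CHECKSUM"),
                "checksum_type": entry.get("CHECKSUMTYPE"),
                # SIZE and SEQ may be absent (e.g. logo files have no SEQ).
                "size": int(size) if size is not None else None,
                "seq": entry.get("SEQ"),
            })
    return rows
```

The checksum values collected this way could then be compared against MD5 digests of the extracted Bitstream files to confirm that a package is intact.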
mets/structMap - Primary structure map, @LABEL="DSpace Object", @TYPE="LOGICAL"
- For ITEM objects:
  - Top-level div with @TYPE="DSpace Object Contents".
    - For every Bitstream in the Item, it contains a div with @TYPE="DSpace BITSTREAM". Each Bitstream div has a single fptr element which references the bitstream location.
  - If the Item has a primary bitstream, put it in structMap/div/fptr (i.e. directly under the div with @TYPE="DSpace Object Contents").
- For COLLECTION objects:
  - Top-level div with @TYPE="DSpace Object Contents".
    - For every Item in the Collection, it contains a div with @TYPE="DSpace ITEM". Each Item div has up to two child mptr elements:
      - One linking to the Handle of that Item. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw Handle.
      - (Optional) one linking to the location of the local AIP for that Item (if known). Its @LOCTYPE="URL", and @xlink:href value is a relative link to the AIP file on the local filesystem.
  - If the Collection has a Logo bitstream, there is an fptr reference to it in the very first div.
  - If the Collection includes an Item Template, there will be a div with @TYPE="DSpace ITEM Template" within the very first div. This div @TYPE="DSpace ITEM Template" must have a @DMDID specified, which links to the dmdSec element(s) that contain the metadata for the Item Template.
- For COMMUNITY objects:
  - Top-level div with @TYPE="DSpace Object Contents".
    - For every Sub-Community in the Community, it contains a div with @TYPE="DSpace COMMUNITY". Each Community div has up to two mptr elements:
      - One linking to the Handle of that Community. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw Handle.
      - (Optional) one linking to the location of the local AIP file for that Community (if known). Its @LOCTYPE="URL", and @xlink:href value is a relative link to the AIP file on the local filesystem.
    - For every Collection in the Community, there is a div with @TYPE="DSpace COLLECTION". Each Collection div has up to two mptr elements:
      - One linking to the Handle of that Collection. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw Handle.
      - (Optional) one linking to the location of the local AIP file for that Collection (if known). Its @LOCTYPE="URL", and @xlink:href value is a relative link to the AIP file on the local filesystem.
  - If the Community has a Logo bitstream, there is an fptr reference to it in the very first div.
- For SITE objects:
  - Top-level div with @TYPE="DSpace Object Contents".
    - For every Top-level Community in the Site, it contains a div with @TYPE="DSpace COMMUNITY". Each Community div has up to two child mptr elements:
      - One linking to the Handle of that Community. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw Handle.
      - (Optional) one linking to the location of the local AIP for that Community (if known). Its @LOCTYPE="URL", and @xlink:href value is a relative link to the AIP file on the local filesystem.
mets/structMap - Structure Map to indicate object's Parent, @LABEL="Parent", @TYPE="LOGICAL"
- Contains one div element which has the unique attribute value TYPE="AIP Parent Link" to identify it as the holder of the parent pointer.
  - It contains a mptr element whose xlink:href attribute value is the raw Handle of the parent object, e.g. 1721.1/4321.
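The two structure maps described above can be read with a short sketch like the one below, which collects child references from the "DSpace Object" map and the parent Handle from the "Parent" map. The `SAMPLE_STRUCTMAPS` manifest is a hand-written Collection fragment (the LOCTYPE on the parent mptr is an assumption, since the text does not specify it), and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hand-written Collection fragment with two Items and a parent link.
SAMPLE_STRUCTMAPS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <structMap LABEL="DSpace Object" TYPE="LOGICAL">
    <div TYPE="DSpace Object Contents">
      <div TYPE="DSpace ITEM">
        <mptr LOCTYPE="HANDLE" xlink:href="123456789/8"/>
        <mptr LOCTYPE="URL" xlink:href="ITEM@123456789-8.zip"/>
      </div>
      <div TYPE="DSpace ITEM">
        <mptr LOCTYPE="HANDLE" xlink:href="123456789/9"/>
      </div>
    </div>
  </structMap>
  <structMap LABEL="Parent" TYPE="LOGICAL">
    <div TYPE="AIP Parent Link">
      <mptr LOCTYPE="HANDLE" xlink:href="123456789/1"/>
    </div>
  </structMap>
</mets>"""


def child_references(manifest_xml):
    """Return (div type, handle, local AIP link or None) for each child."""
    root = ET.fromstring(manifest_xml)
    children = []
    for smap in root.findall("mets:structMap", NS):
        if smap.get("LABEL") != "DSpace Object":
            continue
        # Children are the divs nested inside the top-level contents div.
        for div in smap.findall("mets:div/mets:div", NS):
            refs = {m.get("LOCTYPE"): m.get(XLINK_HREF)
                    for m in div.findall("mets:mptr", NS)}
            if "HANDLE" in refs:
                children.append((div.get("TYPE"), refs["HANDLE"], refs.get("URL")))
    return children


def parent_handle(manifest_xml):
    """Return the raw Handle of the parent object, or None if not present."""
    root = ET.fromstring(manifest_xml)
    for smap in root.findall("mets:structMap", NS):
        if smap.get("LABEL") != "Parent":
            continue
        for div in smap.findall("mets:div", NS):
            mptr = div.find("mets:mptr", NS)
            if div.get("TYPE") == "AIP Parent Link" and mptr is not None:
                return mptr.get(XLINK_HREF)
    return None
```

A restore tool could walk `child_references` recursively, preferring the local URL link when it is present and falling back to the Handle otherwise.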
...
For the Site Object, the following fields are translated to the DIM schema:
Metadata Field | Value |
---|---|
dc.identifier.uri | Handle of Site (format: [handle_prefix]/0) |
dc.title | Name of Site (from dspace.cfg 'dspace.name' config) |
MODS Schema
...
By default, all DSpace descriptive metadata (DIM) is also translated into the MODS Schema (http://www.loc.gov/standards/mods/) by utilizing DSpace's MODSDisseminationCrosswalk. DSpace's DIM to MODS crosswalk is defined within your [dspace]/config/crosswalks/mods.properties configuration file. This file allows you to customize the MODS that is included within your AIPs.
For more information on the MODS Schema, see http://www.loc.gov/standards/mods/mods-schemas.html
In the METS structure, MODS metadata always appears within a dmdSec inside an <mdWrap MDTYPE="MODS"> element. For example:
...
Metadata Field | Value |
---|---|
dc.identifier.uri | Handle of Community |
dc.relation.isPartOf | Handle of Parent Community (as a URN) |
AIP Technical Metadata for Site
Metadata Field | Value |
---|---|
dc.identifier.uri | Site Handle (format: [handle_prefix]/0) |
PREMIS Schema
At this point in time, the PREMIS Schema is only used to represent technical metadata about DSpace Bitstreams (i.e. Files). The PREMIS metadata is generated by DSpace's PREMISCrosswalk. Only the PREMIS Object Entity Schema is used.
...
Info
You may have noticed several odd-looking group names in the above example, where a Handle is embedded in the name (e.g. "COLLECTION_hdl:123456789/57_SUBMIT"). This is a translation of a Group name which included a Community or Collection Internal ID (e.g. "COLLECTION_45_SUBMIT"). Since you are exporting these Groups outside of DSpace, the Internal ID may no longer be valid or understandable. Therefore, before export, these Group names are all translated to include an externally understandable identifier, in the form of a Handle. If you use this AIP to restore your groups later, they will be translated back to the normal DSpace format (i.e. the Handle will be translated back to the new Internal ID).
Warning
Groups which are no longer utilized by the system may not be exported in the AIP under their original names. If a Group name includes a Community or Collection Internal ID (e.g. "COLLECTION_45_SUBMIT"), and that Community or Collection no longer exists, then the Group will be renamed to a more generic, random name of the format: "GROUP_[random-hex-key]_[object-type]_[group-type]" (e.g. "GROUP_123eb3a_COLLECTION_ADMIN"). The reasoning is that we were unable to translate an Internal ID into an External ID (i.e. Handle). If we are unable to do that translation, re-importing or restoring a group with an old Internal ID could cause conflicts or instability in your DSpace system. In order to avoid such conflicts, these groups are renamed using a random, unique key.
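The Group name handling described in the two notes above can be sketched as follows. This is an illustration of the naming rules only, assuming that orphaned groups are renamed with a random key; the `handles` lookup table, the `export_group_name` function and its `random_key` parameter are hypothetical and do not correspond to DSpace's internal code.

```python
import re
import secrets

# Matches names such as "COLLECTION_45_SUBMIT" (object type, Internal ID, group type).
GROUP_NAME = re.compile(r"^(COLLECTION|COMMUNITY)_(\d+)_(.+)$")


def export_group_name(name, handles, random_key=None):
    """Translate a Group name for export.

    `handles` maps (object_type, internal_id) to a Handle for objects that
    still exist. Names without an embedded Internal ID are returned unchanged.
    """
    match = GROUP_NAME.match(name)
    if match is None:
        return name
    object_type, internal_id, group_type = match.groups()
    handle = handles.get((object_type, int(internal_id)))
    if handle is not None:
        # The object still exists: embed its Handle instead of the Internal ID.
        return f"{object_type}_hdl:{handle}_{group_type}"
    # Orphaned group: no Handle can be found, so use a random hex key.
    key = random_key or secrets.token_hex(4)
    return f"GROUP_{key}_{object_type}_{group_type}"
```

For instance, with `handles = {("COLLECTION", 45): "123456789/57"}`, the name "COLLECTION_45_SUBMIT" becomes "COLLECTION_hdl:123456789/57_SUBMIT", matching the example in the Info note.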
Example of DSPACE-ROLES Schema for a Community or Collection
...