...
- Translating an EAD that describes an archival collection as a whole into well-formed Hydra/Fedora objects
  - Creating the set object that represents the entire collection
  - Creating intermediate set objects that represent the EAD hierarchy
  - Creating "item" objects that represent nodes in the hierarchy that describe individual objects
- Translating item-level information into well-formed (per Hydra) Fedora objects
  - Required datastreams and child objects
  - EAD to MODS mapping
- Addressing the born digital materials
  - Recognizing what in the EAD describes "Born digital" content
  - Enriching the digital objects with file-level information
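The first group of tasks (one set for the collection, intermediate sets for the hierarchy, items for leaf nodes) can be sketched as a walk over the EAD component tree. This is a hypothetical sketch, not the project's converter: it assumes a non-namespaced EAD 2002 file with unnumbered `<c>` components under `<archdesc>/<dsc>`, and it treats any `<c>` that contains further `<c>` elements as a set and every other `<c>` as an item. The function names and the "Disk 1" sample title are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: assumes a non-namespaced EAD 2002 file whose
# components are unnumbered <c> elements nested under <archdesc>/<dsc>.


def unittitle(node):
    """Flattened text of node's <did>/<unittitle>, or '' if absent."""
    title = node.find("did/unittitle")
    if title is None:
        return ""
    return " ".join("".join(title.itertext()).split())


def plan_objects(ead_root):
    """Yield (kind, depth, title): a 'set' for the whole collection, a 'set'
    for every <c> that contains further <c> elements, and an 'item' for
    every leaf <c>."""
    archdesc = ead_root.find("archdesc")
    yield ("set", 0, unittitle(archdesc))

    def walk(parent, depth):
        for c in parent.findall("c"):
            kind = "set" if c.find("c") is not None else "item"
            yield (kind, depth, unittitle(c))
            yield from walk(c, depth + 1)

    dsc = archdesc.find("dsc")
    if dsc is not None:
        yield from walk(dsc, 1)


# Hypothetical sample; "Disk 1" is invented for illustration.
SAMPLE = """<ead><archdesc level="collection">
  <did><unittitle>Stephen Jay Gould papers</unittitle></did>
  <dsc>
    <c level="series">
      <did><unittitle>Series 6: Born Digital Materials</unittitle></did>
      <c level="file"><did><unittitle>Disk 1</unittitle></did></c>
    </c>
  </dsc>
</archdesc></ead>"""

for kind, depth, title in plan_objects(ET.fromstring(SAMPLE)):
    print("  " * depth + f"[{kind}] {title}")
```

Classifying by "has child components" rather than by the EAD `level` attribute is a design choice; sites that use `level="item"` consistently could switch to that test instead.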
EAD hierarchical structure mapping and content mapping
...
Site | Collection | EAD structure / location of born digital materials [type of Hydra object] | Notes |
---|---|---|---|
Hull | Gallagher | collection [set] | |
Hull | Socialist Health Assoc. | | |
Stanford | Xanadu | collection [set]; [set] | 1. Target "born digital" sub-level identified by <unittitle> |
Stanford | Gould | collection -- unittitle "Stephen Jay Gould papers", unitid: M1437 | EAD only goes down to the single "Born digital" series description, with no details expressed at lower levels. A rationalized directory structure and FTK output are intended to support a direct translation into Hypatia objects for both unprocessed and processed views without an intermediary EAD. |
Virginia | Cheuse | | |
Yale | Conn. Oral Histories | | |
Yale | Love Makes a Family | | |
Yale | Pelli | | |
Yale | Tobin | collection [set]; [set]; [set] | 1. Target sub-level identified by <unittitle> |
Yale | Turner | | |
Yale | Welch | | |
...
Stanford uses the Forensic Toolkit (FTK) software to analyze and characterize the contents of computer media. Starting with the Gould collection, we will provide only a single series node in the EAD to represent "Born Digital Materials". Conversion routines will be able to auto-generate objects representing the unprocessed collection (the media artifacts themselves, e.g., hard drives and floppy disks) as well as detailed file content objects from a modified form of the FTK output. See Stanford FTK to Hypatia object mapping.
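Both the Gould convention and the Xanadu/Tobin notes in the table rely on the `<unittitle>` to identify the born digital part of the hierarchy. A minimal sketch of that lookup, assuming a non-namespaced EAD and assuming the phrase "born digital" (any case, space or hyphen) appears in the series title; the regular expression, function name and `ref1`/`ref2` ids are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the born digital series is recognized by the words
# "born digital" (any case, space or hyphen) in its <did>/<unittitle>.
BORN_DIGITAL = re.compile(r"born[\s-]+digital", re.IGNORECASE)


def find_born_digital(ead_root):
    """Return the <c> components whose unittitle mentions born digital material."""
    matches = []
    for c in ead_root.iter("c"):
        title = c.find("did/unittitle")
        if title is not None and BORN_DIGITAL.search("".join(title.itertext())):
            matches.append(c)
    return matches


# Hypothetical Gould-style sample: one series node stands for all born digital content.
SAMPLE = """<ead><archdesc level="collection"><dsc>
  <c id="ref1" level="series"><did><unittitle>Series 1: Correspondence</unittitle></did></c>
  <c id="ref2" level="series"><did><unittitle>Series 6: Born Digital Materials</unittitle></did></c>
</dsc></archdesc></ead>"""

root = ET.fromstring(SAMPLE)
print([c.get("id") for c in find_born_digital(root)])  # ['ref2']
```

Title matching is fragile across institutions; if a site encodes born digital content some other way (for example with a dedicated attribute), only the predicate inside the loop needs to change.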
EAD-to-MODS - general information
...
- They can be used as a form of entity markup, strongly typing references within a longer block of text:
Example | As rendered in browser | From | Issue | Action |
---|---|---|---|---|
<titleproper>Stephen J. Gould papers <num>M1437</num></titleproper> | Stephen J. Gould papers M1437 | Stanford/Gould | Entity markup disappears for display; would it remain visible and usable for editing? | Strip embedded markup |
<langmaterial label="Language(s):">Chiefly in <language langcode="eng" scriptcode="Latn">English</language>; some materials in <language langcode="fre" scriptcode="Latn">French</language>.</langmaterial> | Chiefly in English; some materials in French. | Yale/Welch | ibid | Strip embedded markup |
<unittitle><title render="italic">The Panda's Thumb</title>, galley proof, Chapters 22-31</unittitle> | , galley proof, Chapters 22-31 | Stanford/Gould | <title> tag sets the browser window title and is ignored as part of the overall text | Strip out embedded <title> markup |
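"Strip embedded markup" here means keeping the text of inline elements such as `<num>`, `<language>` and `<title>` while dropping the tags, so that "The Panda's Thumb" is not lost the way it is in the browser rendering. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def strip_markup(fragment):
    """Flatten an EAD element to plain text: keep the text of every
    descendant (<num>, <language>, <title>, ...), drop the tags themselves,
    and collapse runs of whitespace to single spaces."""
    element = ET.fromstring(fragment)
    return " ".join("".join(element.itertext()).split())


# Examples taken from the table above.
print(strip_markup("<titleproper>Stephen J. Gould papers\n<num>M1437</num>\n</titleproper>"))
# Stephen J. Gould papers M1437
print(strip_markup('<unittitle><title render="italic">The Panda\'s Thumb</title>, '
                   "galley proof, Chapters 22-31</unittitle>"))
# The Panda's Thumb, galley proof, Chapters 22-31
```

One caveat: where a tag abuts text with no whitespace (e.g. `papers<num>M1437</num>`), the two words run together, so a production rule might insert a space at element boundaries for entity-type elements like `<num>`.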
- Complex elements in EADs can also be used for display markup:
Tag | Example | As rendered in browser | Found in | Issue | Action |
---|---|---|---|---|---|
<p> | <scopecontent><p>Original series of 4 episodes ...</p> <p>SG was series creator and writer ...</p> <p>Feature-length pilot and series opener ...</p></scopecontent> | Original series of 4 episodes ... / SG was series creator and writer ... / Feature-length pilot and series opener ... (separate paragraphs) | everywhere | Works great, but embedded markup is not desirable | Drop the initial <p> and trailing </p>; otherwise retain <p> markup for short-term convenience? It would have to be encoded (e.g., &lt;) and reinterpreted on output. |
<head> | <bioghist id="ref141"><head>Biography</head><p>When five-year-old Stephen Jay Gould ....</p> | Biography / When five-year-old Stephen Jay Gould ... | everywhere | Heading is displayed with the text; treating headings as labels is preferred | Turn <head> into a displayLabel attribute on the corresponding MODS fields where possible. |
<blockquote> | none so far | | | | |
<emph> | <unittitle>Yale University <emph render="smcaps">(restricted until January 1, 2024)</emph></unittitle> | Yale University (restricted until January 1, 2024) | Yale (numerous) | non-html markup, ignored/lost | strip out? |
<list> | <arrangement id="ref7"> : <list type="ordered"> <item><ref target="ref11" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Inventory</ref></item> <item><ref target="ref92" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Accession 2003-M-005</ref></item> <item><ref target="ref123" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Accession 2004-M-088</ref></item> </list> </arrangement> | Inventory Accession 2003-M-005 Accession 2004-M-088 | Virginia:Cheuse <frontmatter>; Yale:Tobin <archdesc> | non-html markup, ignored/lost | Convert data to comma-separated list |
<table> | <table frame="none"> <tgroup cols="3"> <colspec colnum="1" colname="1" align="left" colwidth="50pt"/> <colspec colnum="2" colname="2" align="left" colwidth="50pt"/> <thead> <row> <entry colname="1">Family Member</entry> <entry colname="2">Spouse</entry> </row> </thead> <tbody> <row> <entry colname="1">John Albee</entry> <entry colname="2">Mary Delaney</entry> </row> </tbody> </tgroup> </table> | Family Member Spouse John Albee Mary Delaney | none (example from EAD site) | non-html markup, ignored/lost | convert to html <table>? (defer until encountered?) See EAD specs for tabular display |
<address> | <repository label="Repository:"> <corpname>Manuscripts and Archives</corpname> <address> <addressline>Sterling Memorial Library</addressline> <addressline>128 Wall Street</addressline> <addressline>P.O. Box 208240</addressline> <addressline>New Haven, CT 06520</addressline> <addressline altrender="email">Email: mssa.faq@yale.edu</addressline> <addressline altrender="phone">Phone: (203) 432-1735</addressline> <addressline altrender="fax">Fax: (203) 432-7441</addressline> </address> </repository> | Manuscripts and Archives Sterling Memorial Library 128 Wall Street P.O. Box 208240 New Haven, CT 06520 Email: mssa.faq@yale.edu Phone: (203) 432-1735 Fax: (203) 432-7441 | Stanford (frontmatter); Yale (archdesc) | | ignore <address> in initial conversion |
<bibref> | <bibliography encodinganalog="3.5.4"> <bibref>HH Eckstein, The English health service (Harvard, 1959) [line break] JE Pater, The making of the National Health Service (London, 1981) [line break] John Stewart (1878-1967), Oxford Dictionary of Biography, Oxford, 2004</bibref> </bibliography> | HH Eckstein, The English health service (Harvard, 1959) JE Pater, The making of the National Health Service (London, 1981) John Stewart (1878-1967), Oxford Dictionary of Biography, Oxford, 2004 | Hull:Socialist <frontmatter> | Implied line breaks are ignored/lost | Defer; not in converted data |
<title> | <unittitle><title render="italic">The Panda's Thumb</title>, galley proof, Chapters 22-31</unittitle> | , galley proof, Chapters 22-31 | Stanford:Gould (numerous); Virginia:Cheuse (numerous); Yale: several collections (numerous) | <title> tag sets the browser window title and is ignored as part of the overall text | Strip out embedded <title> markup |
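The proposed action for `<list>` ("Convert data to comma-separated list") can be sketched as follows. This is a hypothetical helper, not part of any EAD tool; it assumes `ns2` in the Cheuse example is the XLink namespace (the `type`/`actuate`/`show` attributes suggest so) and declares it so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET


def list_to_csv(list_elem):
    """Render an EAD <list> as a comma-separated string of its <item> texts,
    discarding the <ref> and other inline markup inside each item."""
    items = (" ".join("".join(item.itertext()).split())
             for item in list_elem.findall("item"))
    return ", ".join(text for text in items if text)


# Cheuse <arrangement> example from the table; ns2 is assumed to be the
# XLink namespace and is declared here so the fragment parses on its own.
SAMPLE = """<arrangement id="ref7" xmlns:ns2="http://www.w3.org/1999/xlink">
  <list type="ordered">
    <item><ref target="ref11" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Inventory</ref></item>
    <item><ref target="ref92" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Accession 2003-M-005</ref></item>
    <item><ref target="ref123" ns2:type="simple" ns2:actuate="onRequest" ns2:show="replace">Accession 2004-M-088</ref></item>
  </list>
</arrangement>"""

arrangement = ET.fromstring(SAMPLE)
print(list_to_csv(arrangement.find("list")))
# Inventory, Accession 2003-M-005, Accession 2004-M-088
```

Nested lists inside an `<item>` would be flattened into that item's text; that seems acceptable until such data is actually encountered.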
...
Issue: Tags that have no mapping into MODS
With one exception, we will map these into MODS <note> elements, using displayLabel to let them appear with specific labels in the Hypatia display.
- <scopecontent> -- map to MODS <abstract> per DLF Guidelines.
- <bioghist> -- map to MODS <note>
- <custodhist> -- map to MODS <note>
- <relatedmaterial> -- map to MODS <note>
- <otherfindaid> -- map to MODS <note>
- <bibliography> -- map to MODS <note>
- <processinfo> -- map to MODS <note>
Conversion rule (Stanford): Use of <head> at the beginning of text fields as a labeling convention ...
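The note mapping above, combined with the Stanford convention of a leading `<head>` as a label, could look roughly like this. It is a sketch under stated assumptions: non-namespaced EAD input, the `<head>` text becomes `displayLabel`, and each remaining child is flattened to one paragraph joined by blank lines (the `<p>` handling is still an open question in the display-markup table). Function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# EAD elements with no direct MODS equivalent (mapped to <note>);
# <scopecontent> is the exception and maps to <abstract>.
NOTE_ELEMENTS = {"bioghist", "custodhist", "relatedmaterial",
                 "otherfindaid", "bibliography", "processinfo"}


def flatten(element):
    """All descendant text with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def ead_to_mods_note(ead_elem):
    """Map one EAD descriptive element to MODS <abstract> or <note>.
    A <head> child becomes the displayLabel; the remaining children are
    flattened to text, one paragraph per child."""
    if ead_elem.tag == "scopecontent":
        mods_tag = "abstract"
    elif ead_elem.tag in NOTE_ELEMENTS:
        mods_tag = "note"
    else:
        raise ValueError(f"no note mapping for <{ead_elem.tag}>")

    out = ET.Element(f"{{{MODS_NS}}}{mods_tag}")
    paragraphs = []
    for child in ead_elem:
        if child.tag == "head":
            out.set("displayLabel", flatten(child))
        elif flatten(child):
            paragraphs.append(flatten(child))
    out.text = "\n\n".join(paragraphs)
    return out


SAMPLE = """<bioghist id="ref141"><head>Biography</head>
<p>When five-year-old Stephen Jay Gould ...</p></bioghist>"""
note = ead_to_mods_note(ET.fromstring(SAMPLE))
print(ET.tostring(note, encoding="unicode"))
```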
...
Issue: Stanford <container> conventions and mapping into a MODS "Location" note (revised 10/24/11 to split out the collection title in the item record and nest this information in a relatedItem):
We will create a concise representation of the physical/logical location (as appropriate) of the materials in the context of the collection and its hierarchy, recorded in MODS <relatedItem><location><physicalLocation type="location">. It will be a concatenation of the following information:
- Series and subseries names etc., if present -- e.g., Series 6: Born Digital Materials
- The container type (box, map case, etc.) and ID -- e.g., Box 11
- A sub-container type + value, down to the level of the item -- e.g., Folder 3
Assembled, this reads "Series 6: Born Digital Materials - Box 11 - Folder 3".
Is this generalizable across Stanford collections? Across institutions?
Examples:
Collection | EAD | MODS |
---|---|---|
Gould | <c id="ref432" level="file"> | <mods:relatedItem type="host"> |
Hensen | <c id="ref50" level="item"> | <mods:relatedItem type="host"> |
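The concatenation rule can be sketched as below. Assumptions, all hypothetical: the caller supplies the chain of `<c>` components from the top-level series down to the item (collection level excluded, since the collection title now lives in the relatedItem), series names come from ancestor `<unittitle>`s, and each `<container>` on the item contributes "Type value". In MODS, `physicalLocation` sits inside `location`, so the sketch includes that wrapper.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def flatten(element):
    """All descendant text with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def location_string(path):
    """path: the <c> components from the top-level series down to the item.
    Ancestor unittitles supply series/subseries names; the item's
    <did>/<container> elements supply 'Type value' pairs."""
    *ancestors, item = path
    parts = []
    for c in ancestors:
        title = c.find("did/unittitle")
        if title is not None and flatten(title):
            parts.append(flatten(title))
    for container in item.findall("did/container"):
        ctype = (container.get("type") or "").capitalize()
        parts.append(f"{ctype} {flatten(container)}".strip())
    return " - ".join(parts)


def host_related_item(path):
    """Wrap the string in <relatedItem type="host"><location><physicalLocation>."""
    related = ET.Element(MODS + "relatedItem", type="host")
    location = ET.SubElement(related, MODS + "location")
    physical = ET.SubElement(location, MODS + "physicalLocation", type="location")
    physical.text = location_string(path)
    return related


# Hypothetical sample built from the example values above.
SAMPLE = """<c level="series"><did><unittitle>Series 6: Born Digital Materials</unittitle></did>
  <c level="file"><did>
    <container type="box">11</container><container type="folder">3</container>
  </did></c>
</c>"""
series = ET.fromstring(SAMPLE)
item = series.find("c")
print(location_string([series, item]))
# Series 6: Born Digital Materials - Box 11 - Folder 3
```

Whether container types should be capitalized, and whether every site records sub-containers as sibling `<container>` elements, are exactly the generalizability questions raised above.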
Issue: Derived <mods:location> information
Where all item objects are derived from FTK information about files in a directory, how is this logical/physical location information assembled and presented?
Collection | FTK | MODS |
---|---|---|
Gould | | <mods:relatedItem type="host"> |
Issue: Recursively nested <descgrp>
...
EAD element | MODS element | Notes | Example |
---|---|---|---|
<unittitle> | <titleInfo> | • Requires embedded element conversion | |
<origination> | <name type="..."> | • EAD/persname maps to MODS <name type="personal"> | <origination label="creator"> |
<repository> | <name> | Map <corpname> only to a corporate name with role=repository. <repository><corpname> to | <repository> |
No corresponding EAD element | <typeOfResource> | For any Hypatia set created, create an entry indicating a collection. | <mods:typeOfResource collection="yes"/> |
<controlaccess> | <genre> | • EAD origination source attribute maps to MODS/genre authority attribute | <controlaccess> |
<unitdate> | <originInfo> | If only one <unitdate> is present for a <did>, add keyDate="yes" to the resulting MODS date. If there is more than one <unitdate>, add keyDate="yes" only where the EAD type="inclusive". | <mods:originInfo> |
<langmaterial> | <language> | For <langmaterial> | <langmaterial label="Language(s):">The materials are in <language langcode="eng" scriptcode="Latn">English</language>.</langmaterial> |
No corresponding EAD element | <physicalDescription> | Add a "born digital" indication only for the born digital items in the collection, else omit. | <mods:physicalDescription> |
<physdesc> | <physicalDescription> | • Each EAD <extent> subelement will become a MODS/extent element | <physdesc> |
<abstract> or <scopecontent> | <abstract> | Map EAD label attribute to MODS displayLabel attribute | <abstract label="Summary:">The papers consist of correspondence, subject files, and writings, primarily documenting the professional career and personal life of James Tobin as an economist and educator.</abstract> |
<descgrp> <descgrp><scopecontent> | <note> | • Requires embedded element conversion | <prefercite id="ref6"> |
<arrangement> | <tableOfContents> | Mapping per DLF guidelines, with default displayLabel of "Arrangement". | <arrangement id="ref206"> |
No corresponding EAD element | <targetAudience> | Mapping not applied to sample EADs | |
<odd> | <note> | Not found in sample EADs | |
<controlaccess> with | <subject> with | Mappings of EAD <controlaccess> subelements to MODS's <subject> subelements: | <controlaccess> |
No corresponding EAD element | <classification> | No mapping in samples | |
No corresponding EAD element | <relatedItem> | No mapping in samples | |
<unitid> | <identifier> | • All mapped to identifier of type=unitid | <unitid>M1437</unitid> |
No corresponding EAD element | <location><url> | No candidate sample data, though conversions could provide useful additions for born digital materials | |
<accessrestrict> | <accessCondition> | • Requires embedded element conversion | <accessrestrict id="ref5713"> |
<userestrict> | <accessCondition> | • Requires embedded element conversion | <userestrict id="ref5"> |
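The `<unitdate>` rule in the table is the one row with real decision logic, so a sketch may help. It assumes a non-namespaced EAD `<did>` and uses MODS `<dateCreated>` as the target date element; the table does not name the subelement, so that choice is an assumption, as are the function name and the sample dates.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def unitdates_to_origin_info(did):
    """Map the <unitdate> elements of one EAD <did> to MODS <originInfo>.
    keyDate rule from the table: a lone <unitdate> is the key date; when
    there are several, only those with type="inclusive" get keyDate="yes".
    Using <dateCreated> as the target element is an assumption."""
    unitdates = did.findall("unitdate")
    origin = ET.Element(MODS + "originInfo")
    for unitdate in unitdates:
        date = ET.SubElement(origin, MODS + "dateCreated")
        date.text = " ".join("".join(unitdate.itertext()).split())
        if len(unitdates) == 1 or unitdate.get("type") == "inclusive":
            date.set("keyDate", "yes")
    return origin


# Hypothetical <did> with inclusive and bulk dates.
did = ET.fromstring('<did><unitdate type="inclusive">1950-2002</unitdate>'
                    '<unitdate type="bulk">1970-1990</unitdate></did>')
for date in unitdates_to_origin_info(did):
    print(date.text, date.get("keyDate"))
# 1950-2002 yes
# 1970-1990 None
```

EAD `normal` attributes (e.g. `1950/2002`) could later be split into `point="start"`/`point="end"` dates; the sketch copies only the display text.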