...
Each <file> segment will be the basis of a separate DOR and/or Hypatia digital object. The differences are:
- Stanford DOR (Digital Object Registry) objects are metadata-only, with content externally managed. Objects at Stanford will not have "content" datastreams.
- Stanford objects have an identityMetadata datastream that may or may not be present in Hypatia demo objects. Regardless, it is not a standard part of Hydra-compliant objects.
Collection and Series objects
The Collection and Born-Digital Series objects themselves are created first, ahead of FTK processing. All FTK-processed materials for a collection are processed together and are members of the Born-Digital Series set. Media objects must be linked to the appropriate series via an isMemberOf relationship.
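As an illustration of the isMemberOf link, the following is a minimal sketch of how a RELS-EXT datastream body could be generated for a media object. It assumes the standard Fedora relations-external namespace; the function name `build_rels_ext` and the PIDs in the usage note are hypothetical and are not taken from the Hypatia code base.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora relations-external ontology namespace (where isMemberOf is defined).
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(object_pid, member_of_pids):
    """Return RDF/XML asserting isMemberOf from object_pid to each parent PID.

    Hypothetical helper: the PID scheme is an assumption for illustration.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{object_pid}"},
    )
    for parent_pid in member_of_pids:
        # One isMemberOf statement per parent (series, collection, ...).
        ET.SubElement(
            description,
            f"{{{REL_NS}}}isMemberOf",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{parent_pid}"},
        )
    return ET.tostring(root, encoding="unicode")
```

For example, `build_rels_ext("demo:media_cm004", ["demo:series_born_digital"])` would yield a single isMemberOf statement pointing at the (made-up) series PID.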
Media (e.g. Disk Image) objects
The FTK processing must first create a set of media objects representing the physical media (hard drive, diskette, etc.) on which the files were found. This has been described as a view of the "unprocessed" collection, meaning it has not been processed down to the individual units of content, the separate files.
...
- an "item" -- it represents a unit of meaning and has content "parts" as separate objects
- a "set" -- it has objects related to it as members ... should we consider a specialized relationship for this?
Sample of the starting lines of the .txt file describing the media object.
...
From: Disk Image // CMnnn.001.txt | maps to | notes |
---|---|---|
Evidence Number: CM004 | descMetadata | Would correspond to EAD <c><unittitle> |
Evidence Number: CM004 | descMetadata | Would correspond to EAD <c><unitid> |
Case Number: M1437 | DC | |
Notes: 5.25 inch Floppy Disks | descMetadata | Would correspond to EAD <physdesc> in a node describing the media. |
(implied) | RELS-EXT | A link to the Collection object |
(implied) | RELS-EXT | A link to the Series object |
identityMetadata – label = collection context...
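The mapping above can be sketched as a small reader for the media .txt file. This is only a sketch under the assumption that each relevant line has the form "Label: value", as the rows in the table suggest; the function name `parse_media_txt` is hypothetical, and the real FTK file may contain lines this does not handle.

```python
def parse_media_txt(text):
    """Collect 'Label: value' lines from a media description file into a dict.

    Assumes one field per line, split at the first colon. Lines without a
    colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        label, separator, value = line.partition(":")
        if separator and label.strip():
            fields[label.strip()] = value.strip()
    return fields


sample = (
    "Evidence Number: CM004\n"
    "Case Number: M1437\n"
    "Notes: 5.25 inch Floppy Disks\n"
)
media_fields = parse_media_txt(sample)
```

Here `media_fields["Evidence Number"]` would feed the descMetadata title and identifier, and `media_fields["Notes"]` the physical description.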
File objects
File objects are the node objects representing individual files. The atomistic model would have these objects constructed as a parent (metadata) object and a child (content) object. For simplicity, we will create these File objects as a single object, combining the Hydra commonMetadata and genericContent models.
Sample of the transformed FTK file available as input (though the conversion modules will not use this as input, but rather work from the same mapping rules that express this output):
Information from: FTK xml // Report_transformed.xml | maps to (within item objects) | notes |
---|---|---|
<filename>BU3A5</filename> | descMetadata | This is the original file name as it appeared on the original media. |
<Item_Number>1004</Item_Number> | descMetadata | Internal FTK reference only, to disambiguate references in the FTK report. |
<filepath>CM006.001/NONAME [FAT12]/[root]/BU3A5</filepath> | descMetadata, <mods:physicalLocation> | Location of file on original media. |
<disk_image_no>CM006</disk_image_no> | Naomi thinks this should be handled with a link to the media (disk image) object | This token, taken from the head of the <filepath>, is the only data link between the FTK output for a file object and the corresponding media object. We want a data link in descriptive metadata as well as an RDF link to the corresponding object. |
<filesize>35654</filesize> | descMetadata - human friendly | This should be a human friendly version of the file size. The machine friendly version is in contentMetadata. |
<filesize_unit>B</filesize_unit> | use to determine filesize in bytes (convert to bytes if necessary) | Needed to correctly interpret <filesize>, if used. |
<file_creation_date>n/a</file_creation_date> | descMetadata | |
<file_accessed_date>n/a</file_accessed_date> | descMetadata | |
<file_modified_date>12/8/1988 6:48:48 AM (1988-12-08 14:48:48 UTC)</file_modified_date> | descMetadata | |
<MD5_Hash>976EDB782AE48FE0A84761BB608B1880</MD5_Hash> | contentMetadata | Used for checksum validation of a file during processing. This value will eventually be part of contentMetadata, but probably not as a value transferred from here. |
<restricted>False</Restricted> | | true = visible to staff only, not discoverable ... Hypatia only |
<type>Books</type> | <mods:typeOfResource>Books</mods:typeOfResource> | Naomi sez: the MODS doc says this is controlled vocab, so this won't work ... http://www.loc.gov/standards/mods/mods-outline.html#typeOfResource. <topic>? or <genre>? authority? |
<title>The Burgess Shale and the Nature of History</title> | descMetadata | This is not the title of the file or the file content directly, but the author's title to which the file relates. |
<filetype>WordPerfect 4.2</filetype> | descMetadata | |
<Duplicate_File> </Duplicate_File> | | blank, null value or empty string - file is unique in collection, no duplicates |
<export_path>files\BU3A5.wp</export_path> | | The file as saved by FTK for further processing. |
(implied) | RELS-EXT | A link to the Media object |
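Several rows in the table imply small normalization steps: deriving the disk image token from the head of <filepath>, converting <filesize> to bytes and to a human friendly form, and reading the FTK date strings. A rough sketch of these steps follows. All function names are hypothetical. Only the unit "B" appears in the sample, so the other units and the 1024-based factors are assumptions, as is the choice to take the parenthesized UTC portion of the date.

```python
import re
from datetime import datetime, timezone

# Assumption: only "B" is seen in the sample; other units are guesses.
UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
UTC_PART = re.compile(r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\)")


def disk_image_id(filepath):
    """Take the first path segment and drop its extension: CM006.001 -> CM006."""
    head = filepath.split("/", 1)[0]
    return head.split(".", 1)[0]


def filesize_in_bytes(size, unit):
    """Machine friendly size: scale <filesize> by <filesize_unit>."""
    return int(float(size) * UNIT_FACTORS[unit.strip().upper()])


def human_filesize(nbytes):
    """Human friendly size using 1024-based units, one decimal above bytes."""
    size = float(nbytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


def parse_ftk_date(value):
    """Return the UTC timestamp in parentheses, or None for values like 'n/a'."""
    match = UTC_PART.search(value or "")
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)
```

With the sample row values, `disk_image_id("CM006.001/NONAME [FAT12]/[root]/BU3A5")` gives the token used to find the media object, and `parse_ftk_date("n/a")` gives `None`, so empty dates can simply be omitted from descMetadata.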
(1) Location/container information -- for every file object created, create a <mods:location> description that places the resource in the context of the collection by combining the collection name, intermediate series/group/etc. name(s), and the ID + description of the media on which the file resides:
<location>collection-title - series-title - media-title (media-description)</location>
e.g.,
<location>Stephen J. Gould Papers - Series 6: Born Digital Materials - CM006 (5.25 inch Floppy Disks)</location>
Use this concept for objectLabel?
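The location pattern above could be assembled along these lines. This is a sketch only: the function name `location_element` and its parameters are hypothetical, and it emits the un-prefixed <location> form used in the example rather than a full MODS fragment.

```python
from xml.sax.saxutils import escape


def location_element(collection_title, series_title, media_id, media_description):
    """Build the <location> string: collection - series - media (description)."""
    text = f"{collection_title} - {series_title} - {media_id} ({media_description})"
    # Escape &, < and > so titles containing them stay well-formed XML.
    return f"<location>{escape(text)}</location>"


example = location_element(
    "Stephen J. Gould Papers",
    "Series 6: Born Digital Materials",
    "CM006",
    "5.25 inch Floppy Disks",
)
```

The same combined text, without the element wrapper, would be one candidate for the objectLabel.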