Inventory of Hypatia Collections
Preparation of Collections for Hypatia
Collection Name / Institution | All Files on SUL-BRICK | Analysis Complete | RELS-EXT Created | Fedora Objects Created |
---|---|---|---|---|
Xanadu / Stanford | | | | |
Gould / Stanford | | | | |
Koch / Stanford | | | | |
Creeley / Stanford | | | | |
Gallagher / Hull | | | | |
Socialist Health / Hull | | | | |
Tobin / Yale | | | | |
Turner / Yale | | | | |
Pelli Clarke / Yale | | | | |
Cheuse / UVa | | | | |
Hypatia EAD conversion analysis
Stanford
Collection Name | Estimated Size of Collection in Hypatia |
---|---|
M1437 Gould | 2.5 GB |
M1292 Xanadu | 5.0 GB |
M0662 Creeley | 3.0 GB |
M1584 Koch | 35 GB |
Stephen Jay Gould
The collection was re-processed because of a change in storage location and new ideas about the relationships between files and the EAD.
Processed files are currently stored in
\\sul-wallaby\ForensicsLab\01-OBJECT_POOL\M1437 Stephen Jay Gould\M1437 Gould
and in Sul-Brick/sulguest/Stanford/M1437 Gould
Directory Structure is as follows:
- Computer Media Photo
- EAD
- FTK html
- FTK xml
- Disk Image
- Transit Solution
The "FTK html" folder stores the report from AccessData FTK in HTML format.
The "FTK xml" folder stores the report from AccessData FTK in XML format.
The "Logical Image" folder stores the logical images and the audit logs of disk imaging.
The "Transit Solution" folder stores the HTML versions of the original files created by Transit Solution.
Xanadu
The collection consists of 6 hard drives. A MARC record for the collection is available in SearchWorks; a very basic finding aid describes the contents of the collection.
Contents of the collection are currently stored on \\sul-wallaby\ForensicsLab\01-OBJECT_POOL\M1292 Xanadu
Xanadu EAD and Hypatia fixture objects
Directory Structure is as follows:
- Disk Images
- Computer Media Photo
- EAD
The Disk Images folder contains 3 forensic disk images from 3 physical hard drives. The forensic disk images are named CMxx.dd, with "CM" standing for computer media. This folder also contains two additional metadata files for each forensic disk image. The first is a .txt file that contains technical metadata about the forensic imaging process (example CM01.001). The second is a .csv file that lists the partitions and files contained on the hard drive (example CM01.001). The .csv file also records the root path, creation dates, and whether each file was deleted on the media and subsequently recovered.
The Photo Images of Drives folder contains digital photographs of the source media (JPEG), in this case images of the front and back of the hard drives.
The EAD folder contains the Encoded Archival Description file for the Xanadu collection (example EAD). This file currently does not contain any pointers to where the hard drives are physically located in the collection. The finding aid is also currently missing reference identifiers for the computer media. I believe this is just an oversight, but I'm following up with Special Collections to determine why they are missing.
Yale
Summary
Collection title | Number of files/objects | Total Extent in (mega/giga)bytes | Extent to be transferred for development | EAD filename | Level of description of born-digital material |
---|---|---|---|---|---|
James Tobin papers | 27 disk images + metadata (approx 80 files total) | 36 MB | 36 MB | mssa.ms.1746.bpg.xml | Disks are described individually within EAD as separate components |
Henry Ashby Turner papers | ~5-10 | ~200 MB | ~80 MB | mssa.ms.1691.bpg.xml | Components represent individual digital objects within a specific subseries |
Love Makes a Family records | TBC | ~36 GB | TBC | mssa.ms.1962.bpg.xml | Only described at high-level aggregations |
Pelli Clarke Pelli records | TBC | ~6 GB | TBC | mssa.ms.1939.bpg.xml | Currently completely undescribed |
New Haven Oral Histories | TBC | ~101 GB | TBC | mssa.ru.1055.bpg.xml | Described as individual "interviews" - audio file + MS Word document |
James Welch papers (Beinecke) | TBC | TBC | TBC | beinecke.welch.bpg.xml | TBC |
James Tobin papers
- Assets loaded on sul-brick; in directory /home/sulguest3/Yale/mssa.ms.1746. This directory is a BagIt bag.
- All of the assets are related to sub-components within the Computer diskettes (3.5 inch) subcomponent of Accession 2004-M-088.
- Within this directory, each directory has the format 2004-M-088.nnnn (e.g. 2004-M-088.0001)
- Directory names correlate with unitids in the EAD for components that represent individual disks.
- Each directory has three files: a disk image (.dd extension); an imaging log file (.txt); and filesystem level metadata extracted from the disk image (.xml; comparable to the CSV files created by FTK Imager)
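The layout described in the list above can be checked mechanically. The following is a minimal sketch, not part of the Yale workflow: it assumes `root` is whichever directory directly contains the `2004-M-088.nnnn` directories (in a BagIt bag that would normally be the payload directory, which is an assumption here), and the function name `check_component_dirs` is hypothetical.

```python
import re
from pathlib import Path

# Directory name pattern taken from the description above: 2004-M-088.nnnn
COMPONENT_DIR = re.compile(r"^2004-M-088\.\d{4}$")

# Extensions each directory is described as holding:
# disk image (.dd), imaging log (.txt), filesystem metadata (.xml)
EXPECTED_EXTENSIONS = {".dd", ".txt", ".xml"}


def check_component_dirs(root):
    """Return {directory name: sorted missing extensions} for incomplete directories.

    Directories whose names do not match the 2004-M-088.nnnn pattern are skipped.
    """
    problems = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir() or not COMPONENT_DIR.match(entry.name):
            continue
        found = {p.suffix.lower() for p in entry.iterdir() if p.is_file()}
        missing = sorted(EXPECTED_EXTENSIONS - found)
        if missing:
            problems[entry.name] = missing
    return problems
```

An empty result means every matching directory holds at least one file of each expected type; the sketch does not verify BagIt manifests or checksums.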
Henry Ashby Turner papers
- Assets loaded on sul-brick; in directory /home/sulguest3/Yale/mssa.ms.1691 - there are only 2 files.
- Each file asset is associated with a specific component; in other words, only two components have assets associated with them. The assets are a Microsoft Access database and a FileMaker Pro database.
- The components that have an asset associated with them contain a dao element. This element's xlink:href attribute is a file URI that points to the location on sul-brick (this is a hack, but it should be sufficient)
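To illustrate how the dao pointers described above could be read back out of the EAD, here is a minimal sketch using Python's standard library. It assumes the attribute is in the standard XLink namespace and matches the `dao` element by local name so that it works whether or not the EAD declares a default namespace; the function name `dao_file_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlparse

# Standard XLink namespace, written in ElementTree's {namespace}name form
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name, if present."""
    return tag.rsplit("}", 1)[-1]


def dao_file_paths(ead_xml):
    """Return the filesystem paths referenced by file: URIs on dao elements."""
    root = ET.fromstring(ead_xml)
    paths = []
    for element in root.iter():
        if local_name(element.tag) != "dao":
            continue
        href = element.get(XLINK_HREF)
        if not href:
            continue
        parsed = urlparse(href)
        if parsed.scheme == "file":
            # Decode percent-escapes such as %20 in the path part of the URI
            paths.append(unquote(parsed.path))
    return paths
```

Each returned path could then be checked for existence on sul-brick before any objects are created from it.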
Virginia
I have finally been able to image the floppy disks and use FTK to do some basic processing. The EAD remains unchanged.
There are 7 disk images. The physical disks themselves were numbered with a different schema than the <c0n> elements in the EAD. The physical disk numbers were used to create the filenames for the disk images.
<c02 level="item" id="d1e560"> corresponds to disk images: 10726-p-q002001, 10726-p-q002002, 10726-p-q002003, and 10726-p-q002004b
<c02 level="item" id="d1e571"> corresponds to disk image: 10726-p-q002005
<c02 level="item" id="d1e582"> corresponds to disk image: 10726-p-q002006
<c02 level="item" id="d1e594"> corresponds to disk image: 10726-p-q002007
I have bookmarked the files within the images using FTK. The bookmarks correspond to the <c02> "id" attributes (so, files belonging to <c02 id="d1e582"> are bookmarked "d1e582"). The files should be arranged within those containers in the collection. Other files on the disks are bookmarked "ignore" and do not need to be added to Hypatia. So, we would like to have the individual documents added in addition to the disk images. Is this possible?
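The correspondence between EAD components and disk images listed above can be written down as a small lookup table, which may be convenient when loading objects. This is only a sketch: the ids and image names are copied from the list above, while the variable and function names are hypothetical.

```python
# <c02> id attribute -> disk image names, as listed above
COMPONENT_TO_IMAGES = {
    "d1e560": [
        "10726-p-q002001",
        "10726-p-q002002",
        "10726-p-q002003",
        "10726-p-q002004b",
    ],
    "d1e571": ["10726-p-q002005"],
    "d1e582": ["10726-p-q002006"],
    "d1e594": ["10726-p-q002007"],
}

# Reverse lookup: disk image name -> <c02> id
IMAGE_TO_COMPONENT = {
    image: component_id
    for component_id, images in COMPONENT_TO_IMAGES.items()
    for image in images
}


def component_for_image(image_name):
    """Return the <c02> id for a disk image name, or None if it is not listed."""
    return IMAGE_TO_COMPONENT.get(image_name)
```

The same table could be extended to cover the photographs, since those are named to correspond with the disk image file names.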
The FTK html report and XML report are included in the "report1" folder. These include the technical metadata drawn from the disk images. The disk images are in the folder "diskImages" and photographs of the physical disks are in the folder "photos." These jpeg images are named to correspond with the disk image file name and should also be available along with the disk images.
The file sizes are as follows:
- 5 MB report1/files folder
- 36 MB diskImages folder
- 26 MB photos folder
Please let me know any specific questions.
-Gretchen
What I have to submit is some EAD for the Cheuse collection, and 4 zip files which match the id numbers of <c02> elements in the EAD. The zip files contain images of each disk and pdf files. I can't actually image the disks...I don't have the hardware yet. For the purposes of the tests, what I did was:
- Took pictures of the floppies
- Created a directory structure that matched the structure in the EAD and put the images of each disk in the appropriate folder
- Added a dummy pdf to each folder
- Zipped up each folder and ran it through Rubymatica which:
  - unzips
  - Creates some technical metadata within a METS.xml file
  - Rezips
So the .zip archives included .txt, .xml, .jpg, and .pdf files
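For readers unfamiliar with the unzip, describe, rezip cycle outlined above, the following sketch shows the general shape of such a step using only Python's standard library. It is not Rubymatica and does not produce METS: the output file name `techmd.xml`, its element names, and the function name `repackage_with_metadata` are all assumptions made for illustration.

```python
import hashlib
import tempfile
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def repackage_with_metadata(src_zip, dest_zip):
    """Unzip src_zip, record size and SHA-256 per file in techmd.xml, then rezip."""
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        metadata_path = work / "techmd.xml"  # hypothetical name, not a METS file

        # Step 1: unzip
        with zipfile.ZipFile(src_zip) as archive:
            archive.extractall(work)

        # Step 2: create some simple technical metadata for each extracted file
        files = sorted(
            p for p in work.rglob("*") if p.is_file() and p != metadata_path
        )
        root = ET.Element("files")
        for path in files:
            ET.SubElement(root, "file", {
                "name": path.relative_to(work).as_posix(),
                "size": str(path.stat().st_size),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
        ET.ElementTree(root).write(
            metadata_path, encoding="utf-8", xml_declaration=True
        )

        # Step 3: rezip the original files together with the metadata file
        with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in files + [metadata_path]:
                archive.write(path, path.relative_to(work).as_posix())
    return dest_zip
```

The sketch reads each file fully into memory to hash it, which is fine for small test archives like the ones described here but would need chunked reads for large disk images.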
Hull
Files transferred via external hard drive/USB pen drive so no physical media to photograph
Collection title | Number of files/objects | Total Extent in (mega/giga)bytes | Extent to be transferred for development | EAD filename | Level of description of born-digital material |
---|---|---|---|---|---|
Stephen Gallagher | paper records (7.5m) | n/a | ~200 MB | U DGA.xml | Currently working through the material, with detailed series descriptions |
Socialist Health | paper records (6.5m) | n/a | TBC | U DSM.xml | Preliminary cursory look only - scheduled to start this shortly |