Proposed list of metadata fields to drive the discovery and delivery of document level objects
The following is a list of descriptive, technical and administrative metadata that may accompany a document in Hypatia. A document can be either a single file or a grouping of files that exist together in a folder, a zipped archive or a disk image. Not every document will have metadata for all of these fields.
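As a rough illustration of how one row of the table below might be represented in software, here is a minimal Python sketch. The names `FieldSpec`, `Flag` and `facet_fields` are hypothetical and not part of Hypatia; the two example rows use values as read from the table, with blank cells treated as undecided.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Flag(Enum):
    """Answers used in the table's behaviour columns (hypothetical encoding)."""
    YES = "Yes"
    NO = "No"
    MAYBE = "Maybe"
    UNDECIDED = ""  # cell left blank in the table


@dataclass(frozen=True)
class FieldSpec:
    """One row of the proposed field list; every column except the name is optional."""
    name: str
    description: str = ""
    isadg_element: Optional[str] = None
    searchable: Flag = Flag.UNDECIDED
    facet: Flag = Flag.UNDECIDED
    display: Flag = Flag.UNDECIDED
    sortable: Flag = Flag.UNDECIDED
    allow_edit: Flag = Flag.UNDECIDED
    metadata_source: str = ""


def facet_fields(specs):
    """Return the names of the fields that are definitely marked as facets."""
    return [spec.name for spec in specs if spec.facet is Flag.YES]


# Two example rows, transcribed from the table as an assumption about its alignment.
repository = FieldSpec(
    name="Repository",
    description="Name of archival unit responsible for the collection.",
    searchable=Flag.YES, facet=Flag.YES, display=Flag.YES, sortable=Flag.YES,
    metadata_source="collection object",
)
document_size = FieldSpec(
    name="document size",
    description="Indicates the file or document's size on a filesystem",
    searchable=Flag.NO, facet=Flag.NO, display=Flag.YES, sortable=Flag.NO,
    metadata_source="FTK / Ingest",
)
```

Keeping "Maybe" and blank as distinct values means an open question in the table is not silently turned into a "No".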
Field Name | ISAD(G) Element | Description | Searchable? | Facet? | Display? | Sortable? | Allow edit? | Metadata Source |
---|---|---|---|---|---|---|---|---|
|
|
Fields required for Standards-Compliant Archival Description |
|
|
|
|
|
|
Repository |
Name of archival unit responsible for the collection. |
Yes |
Yes |
Yes |
Yes |
|
collection object |
|
collection call number |
|
Maybe |
Yes |
Yes |
Yes |
|
collection object |
|
collection title |
|
Yes |
No |
Yes |
Yes |
|
collection object |
|
accession number |
|
|
No |
No |
Maybe? |
No |
Yes |
? |
Document identifier |
|
Yes |
No |
Yes |
Maybe |
No |
autogenerated |
|
archival context |
|
Location of the document in an intellectual arrangement (series, subseries, etc.) |
No |
Yes (with collection title) |
Yes |
No |
Yes (archivist) |
FTK / Parent object(s) / EAD |
Level of description |
Identifies the level of arrangement of the unit of description |
No |
Maybe |
Yes |
No |
Yes (archivist) |
|
|
Conditions governing access (facet) |
To provide information on the legal status or other regulations that restrict or affect access to the unit of description. Peter: I used the following controlled vocabulary for AR - Access restrictions: AR:Owner; AR:Archivist; AR:Invited person; AR:Public; AR:Reading room |
Yes (need controlled vocabulary) |
Yes |
Yes |
Maybe |
Yes (archivist) |
FTK? |
|
Conditions governing access (note) |
To provide information on the legal status or other regulations that restrict or affect access to the unit of description. Peter: I used the following controlled vocabulary for AR - Access restrictions: AR:Owner; AR:Archivist; AR:Invited person; AR:Public; AR:Reading room |
Maybe |
No |
Yes |
No |
Yes (archivist) |
EAD? |
|
Conditions governing use/reproduction |
|
Yes (need controlled vocabulary) |
Yes |
Yes |
Maybe |
Yes (archivist) |
FTK / EAD |
|
Conditions governing use/reproduction (note) |
|
Maybe |
No |
Yes |
No |
Yes (archivist) |
EAD |
|
Scope and contents |
|
Yes |
No |
Yes |
No |
Yes |
|
|
Creator |
|
Yes |
Yes |
Yes |
No |
Yes (archivist) |
FTK / parent object |
|
subject heading, name, etc. (manually assigned) |
|
|
Yes |
Yes |
Yes |
No |
Yes (archivist) |
FTK / |
subjects, name, place (software generated) |
|
|
Yes |
Yes |
Yes |
No |
Yes (archivist) |
Entity extraction software/service (e.g. OpenCalais) |
Citation |
|
|
No |
No |
Yes |
No |
Yes (archivist) |
|
document title |
Title supplied by archivist describing the document |
Yes |
No |
Yes |
Yes |
Yes (archivist) |
EAD? |
|
document date |
Is this the creation date or the last modified date? Do we need both? |
Yes |
Yes (need both) |
Yes |
Yes |
|
FTK / Ingest |
|
document size |
Indicates the file or document's size on a filesystem |
No |
No |
Yes |
No |
|
FTK / Ingest |
|
|
|
Additional fields required for assets |
|
|
|
|
|
|
source media |
Description of the physical carrier for a record (floppy disk, hard disk, etc.). Peter: I used the following controlled vocabulary for CM - Computer media: CM:5.25 floppy; CM:3.5 floppy; CM:Punch card; CM: CD/DVD; CM: Hard Drive; CM: Zip Disk; CM:Tape; CM: Cloud Storage |
No |
Yes (need controlled vocabulary) |
Yes |
No |
|
FTK / |
|
operating system and version (if known) |
Peter: I think this field is not necessary. Also, I don't know of any tool that can provide this information. Files can be created under different operating systems and stored on one computer. |
|
|
|
|
|
|
|
document type |
|
Controlled value list. Is this a text document, image, audio, video, forensic image etc. Where is this list coming from? Peter: I used the following controlled vocabulary for FT - Format Type: FT:Document; FT:Spreadsheet; FT:Computer Program; FT:Image; FT: Video; FT: Audio; FT: Email |
No |
Yes (need controlled vocabulary) |
Yes |
No |
Yes (archivist) |
FTK / |
file or document name |
|
Document or file name assigned to an object by an operating system |
Yes |
No |
Yes |
No |
Maybe |
FTK / Ingest |
document location |
|
Location of the document on a filesystem. This is different from the archival location of a document in a series / subseries. |
No |
No |
Yes |
No |
|
FTK / |
mime type (original) |
The mime type indicates the type of document and may indicate the application that was used to create the document |
Maybe |
No |
Maybe |
No |
|
Ingest |
|
mime type (presentation version) |
|
|
No |
Maybe |
Maybe |
No |
|
Ingest |
application software and version (if known) |
|
No |
No |
Yes |
No |
Yes (archivist) |
FTK / |
|
thumbnail image |
|
image that represents the document type (e.g. PDF, text, image, etc.). Peter: If the file is an image, it should be the corresponding thumbnail. |
No |
No |
Yes |
No |
|
FTK for image thumbnail / |
"Download" this |
|
button that allows the archivist or end user to download the document (if permitted). Peter: We may also consider adding a digital signature of the institution to the files. |
No |
No |
Yes |
No |
No |
|
checksum |
|
|
No |
No |
Yes |
No |
No |
FTK / Ingest |
Take-down request / policy |
|
|
No |
No |
Yes |
No |
Yes (public for request) |
Web UI |
Original file |
|
|
No |
No |
Yes |
No |
No |
|
Display version of the original file |
|
|
No |
No |
Yes |
No |
No |
|
Presentation format history |
|
Automated? piece to say that original file X was converted by Person Y using software Z on this date |
|
|
|
|
|
|
|
|
User-generated content |
|
|
|
|
|
|
annotations ("stories") |
|
|
No |
No |
Yes |
No |
Yes (creator / invited public / public) |
Web UI |
archivist created tag |
|
tags that archivists/curators add - become facets (How are these different from access points?) |
|
yes |
|
|
|
Web UI |
creator tag |
|
tags by creator - become facets by creator (How are these different from access points?) |
|
yes |
|
|
|
Web UI |
(pre-)approved user tag |
|
tags that are added by approved users outside of the repository/library - should show up in facet as similar to an approved editor in Wikipedia (?) |
|
yes |
|
|
|
Web UI |
user created tag |
|
tags created by non-approved users; might go through vetting process by repository/library or be listed as unverified/unvetted editor (like Wikipedia?) |
|
? |
|
|
|
Web UI |
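Several rows above are marked "need controlled vocabulary" and quote the prefixed terms Peter used (AR, CM, FT). The following is a minimal Python sketch of how such terms could be checked at ingest or edit time. The term lists are copied from the table; the names `VOCABULARIES`, `normalize_term` and `is_valid_term` are hypothetical, and the tolerance for a space after the colon is an assumption based on the mixed spelling in the table.

```python
# Term lists copied from the table; the structure itself is only an illustration.
VOCABULARIES = {
    "AR": {"Owner", "Archivist", "Invited person", "Public", "Reading room"},
    "CM": {"5.25 floppy", "3.5 floppy", "Punch card", "CD/DVD",
           "Hard Drive", "Zip Disk", "Tape", "Cloud Storage"},
    "FT": {"Document", "Spreadsheet", "Computer Program", "Image",
           "Video", "Audio", "Email"},
}


def normalize_term(raw):
    """Split 'CM: Hard Drive' into ('CM', 'Hard Drive'), ignoring spaces around the colon."""
    prefix, separator, label = raw.partition(":")
    if not separator:
        raise ValueError(f"term has no vocabulary prefix: {raw!r}")
    return prefix.strip().upper(), label.strip()


def is_valid_term(raw):
    """Return True only if the term belongs to one of the known vocabularies."""
    try:
        prefix, label = normalize_term(raw)
    except ValueError:
        return False
    return label in VOCABULARIES.get(prefix, set())
```

For example, `is_valid_term("CM: Hard Drive")` and `is_valid_term("FT:Email")` both pass, while an unlisted value such as `"AR:Everyone"` is rejected. Labels are matched case-sensitively here; whether that is desirable is a policy decision for the project.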
11 Comments
Simon Wilson
Document identifier
Just to raise the issue about the document identifier I mentioned at Charlottesville: with paper, each unit of production has a unique reference. With born-digital material it is really unlikely that we will catalogue every item; at Hull we are probably looking at series-level description, but a series could contain 5 or 55 digital assets. We would be relying on a URI or DOI to provide a unique reference for researchers to cite, in the same way as electronic journals do.
Simon
Mark A. Matienzo
Re: Document identifier - I suspect this is going to be problematic given the following phrase at the top of this page:
It isn't clear enough to me whether this will allow us to create "traditionally archival" aggregations like series, etc., or whether this was something else entirely. By virtue of Hypatia running on top of Fedora, all of these objects will have a distinct identifier, but I guess my question is whether they should relate somehow to the identifier for the collection/series/etc. Also, the "document identifier" (if we're really talking about aggregations like series) might be something like "MS 394 Series 10" or something along those lines.
Simon Wilson
Original Format and Presentation Format
We need to be able to clearly distinguish the created / last modified date of the original document, which may not be the same as that of the format you are presented with
- this impacts not only the date but also the document size, so do we need both the original document size (1.2MB) and the presentation format document size (0.4MB)? The former is important to understand and the latter is what you actually want to download
Mark A. Matienzo
We could handle this differently, pending consultation with the analysts on the project - we could treat the original file and the presentation version as distinct objects.
Simon Wilson
Mark, I agree we could handle them differently; I just wanted to raise the issue of making the distinction clear to our users.
- some will care that file X has been migrated and will want to know it is reliable and trustworthy, etc.
- some will not care or understand and just want the information regardless of whether it is in format A or format B
Although the use of surrogates is common in archives there isn't an established language/phraseology for these concepts in an online catalogue context - although beyond the scope of the project it is an interesting area that needs more work
Mark A. Matienzo
For the initial tracer bullet my understanding is that we will be only dealing with the original files and not presentation versions. Would you be willing to put a hold on this now and come back to it later?
Mark A. Matienzo
Re: software-generated access points (e.g. using OpenCalais): do we want to group these with the standards-compliant archival description section? My inclination is not to do this since they're likely not to fit with common vocabularies used in libraries and archives, like LCSH or LCNAF.
Peter Chan
I agree.
Mark A. Matienzo
Peter, are you saying that you agree that they should not be grouped with the standards-compliant archival description section?
Mark A. Matienzo
Re: document type - is this useful? Can we identify this in a systematic way without reviewing individual files? How is calling something a "document" useful, if the document could be a letter written in MS Word, a screenplay, etc.? Is this more or less useful than identifying file formats?
Mark A. Matienzo
Comments from Apr 26 Skype call: