Overview

A message was sent to the PCC mailing list requesting participation in a survey to prioritize user stories for catalogers working with external authoritative data.  This page summarizes the results of that survey.

Survey Structure

Survey Opened: Nov 30, 2020

Survey Closed: Dec 10, 2020

Question Structure:

  • The primary question listed cataloger user stories and asked respondents to sort the stories by priority into bins labeled Extremely important, Very important, Moderately important, Slightly important, and Not at all important.  Each bin could hold at most 6 user stories.
  • There were two open-ended questions: "Do you have any use cases that we missed?" and "Any other feedback you would like to provide?"
  • No personally identifiable information was requested as part of the survey.

Summary of Survey Results

Prioritized cataloger user stories

The tables below list the user stories sorted by the priorities identified by respondents in the survey.  The highest-rated story is at the top and the lowest-rated story is at the bottom.  Stories are separated into 4 priority levels, 1-4, with 1 being the highest; each level contains 7 stories.  These priority levels will be used in other documents that reference the survey.


Priority Level 1 (highest priority)
ID | Short Descriptor¹ | User Story (as it appears in the survey)
c-18 | context | As a cataloger, I want to see contextual information (e.g. variant labels, occupation, birth date, etc.) about the search results that distinguishes it from other, similar-looking results, to help me to select the correct authoritative entity and to recognize false positives.  The context may be drawn from authoritative entities and real world object entities based on what is available in the authoritative data. (c-18)
c-20 | filter class type | As a cataloger, I want to be able to filter search results to a specific class type (e.g. a corporate name, person name, meeting name, etc.; manifestation, item, expression, etc.). (c-20)
c-28 | doesn't exist | As a cataloger, I want to determine whether the entity I'm searching for doesn't exist in the authoritative source that I'm searching. (c-28)
c-01 | edit + link to URI | As a cataloger, I want to edit an entity (e.g. work, instance, etc.) and add a link to an URI from an external authoritative sources (e.g. LCNAF, OCLC FAST, etc.). (c-1)
c-09 | see broader/narrower | As a cataloger, I want to be able to see broader and narrower terms when the authority is hierarchical. (c-9)
c-23 | performance | As a cataloger, I want search results to be returned quickly, so that I can catalog efficiently, generally seen as sub-second results or some indicator that a longer search is being processed. (c-23)
c-03 | exact match + URI | As a cataloger, I want to be able to enter the exact external authoritative label and get the URI from the external authority linked to the entity being edited.  This applies when there is a unique authoritative term. (c-3)
Priority Level 2
ID | Short Descriptor¹ | User Story (as it appears in the survey)
c-02 | edit + label | As a cataloger, I want to edit an entity (e.g. work, instance, etc.) and display a label from an external authoritative sources (e.g. LCNAF, OCLC FAST, etc.). (c-2)
c-04 | left anchor type ahead | As a cataloger, I want to start typing a known external authoritative label and get the URI from the external authority linked to the entity being edited.  This is left anchored type-ahead. (c-4)
c-10 | step into broader/narrower | As a cataloger, I want to be able to step into broader or narrower terms when the authority is hierarchical. (c-10)
c-06 | variant match | As a cataloger, I want additional information in the search that indicates that the term listed has a variant that matches the keyword typed. (c-6)
c-05 | variant type ahead | As a cataloger, I want to start typing a known variant external authoritative label and get the URI of the authoritative label linked to the entity being edited. (c-5)
c-14 | transparency of indexing | As a cataloger, I want transparency in the approach for indexing to be clear.  (e.g. exact match on primary label, stemming, which fields are searched, etc.)  May vary between authorities. (c-14)
c-25 | timeout gracefully | As a cataloger, if a request fails, it should fail in a reasonable amount of time and reply gracefully providing a reason for the failure (e.g. Time Out, No Result Found, No Exact Match, etc.) (c-25)
Priority Level 3
ID | Short Descriptor¹ | User Story (as it appears in the survey)
c-24 | pagination | As a cataloger, I want to be able to request additional search results if what I am looking for isn't visible in the current set of results displayed, e.g. I didn't get it in the first 10, so give me 10 more results (aka pagination). (c-24)
c-26 | auto-change_management | As a cataloger, I want information displayed in the editor UI to match the information in the authoritative source.  This primarily impacts editing and displaying of entities where the data in the authoritative data has changed (e.g. split, merged, renamed, deleted) to be sure that cataloged data remains accurate over time. (c-26)
c-21 | filter specific fields | As a cataloger, I want to be able to filter on specific fields in the search results (e.g. occupation, resource format, agent, etc.)  This is a filter of results after they are returned similar to a facet. (c-21)
c-11 | link to external search | As a cataloger, when I am unable to find what I'm looking for in an authority lookup, I want to be able to search an authority source in an external site by clicking on a link to its native search UI. (c-11)
c-16 | choose keyword vs left anchor | As a cataloger, I want to choose how search results are returned (e.g. left anchored, keyword indexing rank, or as yet unknown approach) (c-16)
c-19 | filter date range | As a cataloger, I want to be able to filter search results to a specific date range for a field on the authoritative entity (e.g. birth date, death date, etc.). (c-19)
c-08 | search broader | As a cataloger, I want to be able to search for a broader term in a hierarchy and get a list of narrower terms from which to select.  NOTE: Some systems have seen performance issues in actual implementations.  Catalogers generally know what they are looking for. (c-8)
Priority Level 4 (lowest priority)
ID | Short Descriptor¹ | User Story (as it appears in the survey)
c-12 | keyword search other fields | As a cataloger, I know some keywords, other attributes related to the entity that are not in the primary or variant label (e.g. occupation, resource type, etc.), that will help me locate and select an authoritative entity. (c-12)
c-07 | alternate id search | As a cataloger, I want to type in the alternate identifier (e.g. Q label in wikidata, ISNI label, organization code etc.) and get the URI for those entities. (c-7)
c-13 | accuracy | As a cataloger, I want search results to contain highly relevant terms for my keyword search based on standard indexing approaches.  Actual relevancy is subjective. (c-13)
c-22 | advanced search | As a cataloger, I want to be able to specify in the search limiting results to a keyword in a particular field (e.g. an advanced search that passes in 'occupation includes humorist'). (c-22)
c-15 | rank order | As a cataloger, I want search results listed in rank order as determined by standard indexing approaches. (c-15)
c-17 | which fields triggered inclusion | As a cataloger, I want to see, for each entity that appears in results of my keyword search, which of the fields that were searched triggered its inclusion in those results.  (e.g. keyword was in the variant label, occupation, or descriptions instead of the primary label) (c-17)
c-27 | manual-change_management | As a cataloger, I do not want authoritative data to be automatically updated when the data is changed in the authoritative source.  I want to control if and when that information is updated. (c-27)

¹ The short descriptors for user stories were not listed on the survey.  They are included here to facilitate conversations about the results.  By themselves, they do not contain enough information to state the user stories clearly.



Figure 1:  Relative scores of the responses



Responses to "Do you have any use cases that we missed?"

8 responses

Select authorities
As a cataloger, I want to be able to arrange (and rearrange with ease) which authority source results display first (e.g., RBMS, AAT, LCNAF; or LCNAF, AAT, RBMS).
For specific classes allow a default search to specific vocabularies that doesn't need to be reset every time you access the tool.
I did not see anything about selecting specific authority vocabularies for use, which is the most primary decision in using any authority data in local catalogs.
Ordering, relevancy, and pagination of results
I would like left-anchored type-ahead to find matching results for text strings (e.g. an authorized access point in NAF/1xx in MARC)--I am not sure if that is the same as (c4)
To modify c(16) and (c17), I would like to sort results by my choice of field, with the x-indexed field displayed in the list of search results, preferably in alphanumeric order (generally, left-anchored)
The biggest and most frequent failure points I've had with QA have been with very common place names (e.g. London (England), New York (N.Y.) ) and with prolific authors that have a lot of uniform title records in the NAF. This relates to both c-4 (relevancy ranking) and c-24 (the ability to page to additional results).
Transparency
As a cataloger, I want to know how search results are being determined (exact match? fuzzy match? something else?) and what arrangement and filtering options are available. For browse searches, I want to know what metadata types are included in a browse index.
Related to languages
Choice of language codes or labels
This is already covered by some of what you mentioned, but I wanted to emphasize that as a cataloger who works with non-Latin scripts, I want to be thrown into a browseable (left-anchored) index so I can see variants that may not already exist in the authority record for the entity that I am looking for -- i.e., if I search "Aloupe, Helene", I want to be able to see that "Aloupi, Eleni" already exists as an established entity with an authority record, because the difference there is a matter of transliteration--the entity is the same, but the transcription from the Greek differs. I want to easily identify one as a variant of another without having to remember all the possible variants of "Helene" and keyword searching all of those possible variants to ensure that I've found them all and am using the correct term/am not creating a new, duplicate authority when it's merely the transliteration schema that differs.



Responses to "Any other feedback you would like to provide?"

19 responses (a few split for categorization)

Context
As a cataloger, I am typically looking at a resource with some information about an entity and an authority source with some information and hoping to discern points of agreement which add up to sufficient congruence between the two to persuade me they are the same. Contextual information in both the resource and the authority file description are crucial to this process. The use cases which suggest linking or matching without more information and without a more circumspect decision process about the congruence of the two entities make me nervous. Whether a quick match is possible depends on how well differentiated each label (or label visibly associated with a URI) is. We haven't yet seen how this discernment of points of agreement could be automated based on available information from a bib resource and an authority source.
Change management
Regarding c-27, it would be ideal to be alerted when changes are made. I would be afraid of those cases where a person is flipped to an incorrect identity based on a similar name. However, this is probably just me thinking in terms of strings and not in terms of things. Of course, if the linked data is derived from an existing faulty MARC authority record that has, for example, conflated two persons, then there is the potential for an incorrect "flip" to take place. MARC-derived linked data is only as good as the original authoritative source.
For c-26 and c-27 we don't see this as an either or proposition. We would like a combination approach that allows us to specify categories of changes that could be approved automatically and others that can be reviewed first.
C-26 seems obvious - of course the display should match the authoritative source. What else would it match? I guess I don't really understand what the use case is. 
Left-anchored
Creating and maintaining entity identifiers is precise, context-sensitive work. When working with an entity, catalogers build and develop detailed entity attribute and relationship knowledge concerning the entity. Most NACO catalogers rely heavily upon left-anchored browse result lists to navigate entity databases. The prospect of keyword-derived search results makes me extremely uneasy.
Timeouts & Accuracy
The biggest issues encountered previously when using lookups were that they either never appeared (the search took too long or timed out) or the returned entities were in an order that was unhelpful or illogical (getting East New York when searching New York - and New York, NY not even appearing as an entity). Those two fixes would be incredibly impactful. And thank you for all of your hard work!
Index structure
Please, please keep in mind that there are vocabularies where a singular/plural distinction indicates a real semantic difference. Stemming in searches should be an option, never a requirement.
URIs
As to the ones I left on the left -- URIs are at this point not a real consideration for me. I appreciate their (future) utility, but at the moment I am more concerned with textual strings because that is how I do my work, especially since the NAF, which I work in, doesn't allow non-Latin script variants as authoritative forms.
My current workflow does not include adding URIs so I left those unranked.
The reason linked data elements (URIs, etc.) are lower in importance right now is because we aren't in an environment that can use them so other parts of authority work have higher priority. If we had an environment where they were used they would be much higher in rank.
Survey - Ranking issues from survey structure
I considered nearly all of the stories to be extremely, very, or moderately important. It was impossible to put them in the slightly or not important categories.
please note that, once I had placed it within a ranking box, I did not rank a user story against other user stories in the same box.
This was a difficult task as I see many of the options as very similar -- I am not sure I see a significant difference between wanting to SEE broader/narrower terms and being able to STEP INTO broader/narrower terms, which may have pushed others 'down' the list.
I'm working late and it's been a long year. Sorry I haven't the energy to add more feedback. Also, being forced to enter at least 3 use case scenarios may have corrupted the value of what I entered in the last three boxes.
Many of the statements were very similar.
some of these two categories "MODERATELY IMPORTANT" and "SLIGHTLY IMPORTANT" would be good to have them if possible.
Because of the 6 user stories limit in each ranking box, some stories received lower ranking.
I didn't want to say anything was unimportant, but it wouldn't take the form until I did.
The maximum capacity criteria for each level of importance for ranking the features presented in this survey is limiting; I ran out of spots available for feature ranking and was unable to include some features I found important. 
The requirement that each category of importance must include 3 entries appears artificial; especially in the case of the final "Not at all important" category. There should be another "Important" category between "Slightly important" and "Not at all important."
Some of the use cases appear to be duplicative, or at least the distinction they are intended to capture is not clear. Examples: c-10 vs c-9; c-13 vs c-15. c-12 appears to describe a characteristic of the user rather than a capability of the system. It's not totally clear if the use cases describe human agents or potentially also machine agents used by cataloguers. The ranking given to some features, e.g. c-7, may differ according to which type of agent is involved. The restriction to maximum and minimum allowable entries in each category makes this exercise somewhat artificial. I left out c-10 and c-15 because they appeared duplicative and I needed to stay within the maximum of 6 entries per category, and I demoted c-6 to a ranking lower than I thought was warranted just to be able to submit a response to the survey.
Survey - UI issues
Even at the widest screen this format was extremely difficult to use--Once I got toward the end I couldn't get responses into the lowest boxes because they were below the screen and it didn't scroll down. Please don't use this format again.
Survey - Clarity of user stories
I do not understand in (c2) what you mean by "label from an external authoritative sources" - property label? source label? It would help if you clarify what "standard indexing" is in (c5). Are c8 and c10 the same?
There were a lot where I didn't know what they meant and wanted to leave out. But the survey required at least 3 in each box, so I did the best I could.
the details of the search results were difficult to rank without seeing exactly what was meant (like the ranking questions)
The descriptions of some of the features in this survey are unclear. When read by different library participants at my institution, some descriptions meant different things to different people. That is unfortunate, and probably not helpful for those responsible for interpreting the Survey Results.
For c-12 it was unclear whether you were proposing to search the keywords or just see them on retrieval. C-8 seemed to have a strong value judgment statement that people know what they are looking for, that seemed inappropriate in the survey.
C-12 is not a cataloger need. It indicates what the cataloger knows. Can't rank it. 


