What is Annif? Why did we experiment with it?
Annif (http://annif.org/) is a tool that uses natural language processing and machine learning techniques to recommend subjects for a document once it has been given a particular controlled vocabulary. For the SMASH! phase of our work, we experimented with retrieving related entities given a user query. We wanted Annif to draw on the Library of Congress Subject Headings and the subject headings used within the library catalog metadata to recommend subject headings for a user query (instead of an entire document).
Annif resources, data and algorithms used for this project
We followed the documentation at https://github.com/NatLibFi/Annif and https://github.com/NatLibFi/Annif-tutorial to understand how to set up the system and what the data and metadata requirements are. The steps documented at https://github.com/NatLibFi/Annif-tutorial/tree/master/exercises were invaluable!
Algorithm: We configured Annif to use its built-in TF-IDF (http://www.tfidf.com/) algorithm for this phase, although Annif also allows several algorithms to be combined. Given more time, it would be useful to try a combination of algorithms to suggest subject headings.
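To illustrate the general idea behind TF-IDF-based subject suggestion, here is a minimal pure-Python sketch. It is not Annif's implementation: the function names (`tokenize`, `weigh`, `build_subject_index`, `suggest`), the whitespace tokenizer, the smoothed IDF formula, and the choice to pool all titles that share a subject into one pseudo-document are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def weigh(tokens, idf):
    """Turn tokens into an L2-normalized TF-IDF vector, ignoring unknown terms."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_subject_index(records):
    """records: iterable of (text, [subject labels]) pairs.

    All texts tagged with a subject are pooled into one pseudo-document
    per subject (an assumption of this sketch).
    """
    subject_tokens = defaultdict(list)
    for text, subjects in records:
        for subject in subjects:
            subject_tokens[subject].extend(tokenize(text))
    n_subjects = len(subject_tokens)
    doc_freq = Counter()
    for tokens in subject_tokens.values():
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency, one value per term.
    idf = {t: math.log((1 + n_subjects) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = {s: weigh(tokens, idf) for s, tokens in subject_tokens.items()}
    return idf, vectors


def suggest(query, idf, vectors, limit=10):
    """Rank subjects by cosine similarity between the query and each subject."""
    q = weigh(tokenize(query), idf)
    scored = []
    for subject, vec in vectors.items():
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scored.append((subject, score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]
```

Calling `build_subject_index` on a list of (title, subject headings) pairs and then `suggest("item response", idf, vectors)` returns the subjects whose pooled titles share terms with the query, ranked by similarity. Because a short query contains only a few terms, scores depend heavily on individual word overlap.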
Data:
Example Library of Congress Subject Headings entry:
<http://id.loc.gov/authorities/subjects/sh00000231> Antique and classic aircraft
Example library catalog metadata record:
{ "fulltitle_display":"Using R for item response theory model applications", "subject_display":["Item response theory", "R (Computer program language)"]}
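The two data sources have to be joined before they can be used for training: the catalog records carry subject labels, while the vocabulary carries the URIs. A minimal sketch of that join follows. It assumes a tab-separated training layout of title text, a tab, then the subject URIs in angle brackets; check the Annif documentation for the exact corpus format. The helper names `load_vocab` and `record_to_corpus_line` are hypothetical, and records whose headings are not found in the vocabulary are simply skipped.

```python
def load_vocab(lines):
    """Map subject labels to URIs from lines shaped like '<uri> label'."""
    label_to_uri = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("<"):
            continue  # skip blank or malformed lines
        uri, _, label = line[1:].partition(">")
        label_to_uri[label.strip()] = uri
    return label_to_uri


def record_to_corpus_line(record, label_to_uri):
    """Build 'text<TAB><uri1> <uri2>' for one catalog record.

    Returns None when none of the record's headings are in the vocabulary.
    """
    uris = [label_to_uri[s]
            for s in record.get("subject_display", [])
            if s in label_to_uri]
    if not uris:
        return None
    # Collapse whitespace so the title cannot contain a stray tab or newline.
    text = " ".join(record.get("fulltitle_display", "").split())
    return text + "\t" + " ".join("<" + u + ">" for u in uris)
```

Matching on exact label strings is the simplest possible join; real catalog headings with subdivisions or variant punctuation would likely need normalization first.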
Installation, setup, and deployment on dev VM
Suggestions example
Results example: Searching for “vienna architecture” with a limit of 10 results yields:
{ "results": [
  { "label": "International style (Architecture)", "score": 0.47264978289604187, "uri": "http://id.loc.gov/authorities/subjects/sh85067451" },
  { "label": "Architectural practice, International", "score": 0.47264978289604187, "uri": "http://id.loc.gov/authorities/subjects/sh85006606" },
  { "label": "Architecture--Philosophy", "score": 0.4689684808254242, "uri": "http://id.loc.gov/authorities/subjects/sh2007101285" },
  { "label": "Quality (Aesthetics)", "score": 0.4567379951477051, "uri": "http://id.loc.gov/authorities/subjects/sh94009536" },
  { "label": "Four elements (Philosophy)", "score": 0.447782963514328, "uri": "http://id.loc.gov/authorities/subjects/sh85051080" },
  { "label": "Black in interior decoration", "score": 0.4383489787578583, "uri": "http://id.loc.gov/authorities/subjects/sh2011002117" },
  { "label": "Architecture and literature", "score": 0.41448304057121277, "uri": "http://id.loc.gov/authorities/subjects/sh85006888" },
  { "label": "Architects--Professional ethics", "score": 0.41201311349868774, "uri": "http://id.loc.gov/authorities/subjects/sh85006572" },
  { "label": "Architecture and philosophy", "score": 0.40943264961242676, "uri": "http://id.loc.gov/authorities/subjects/sh95008375" },
  { "label": "Architecture and technology", "score": 0.40000787377357483, "uri": "http://id.loc.gov/authorities/subjects/sh97005373" }
] }
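A response of this shape can be requested and unpacked with a few lines of Python. The sketch below uses only the standard library and assumes Annif's REST interface accepts a form-encoded POST to `/v1/projects/<project-id>/suggest` with `text` and `limit` fields. The base URL and the project identifier `lcsh-tfidf` used in the usage note are placeholders, not the values from our deployment, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_suggest_request(base_url, project_id, text, limit=10):
    """Prepare (but do not send) a form-encoded POST to the suggest endpoint."""
    url = "{}/v1/projects/{}/suggest".format(
        base_url.rstrip("/"), urllib.parse.quote(project_id, safe=""))
    data = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def parse_suggestions(payload):
    """Extract (label, score, uri) tuples from a decoded JSON response."""
    return [(r["label"], r["score"], r["uri"])
            for r in payload.get("results", [])]


def fetch_suggestions(base_url, project_id, text, limit=10):
    """Send the request and return the parsed suggestions (needs a running server)."""
    request = build_suggest_request(base_url, project_id, text, limit)
    with urllib.request.urlopen(request) as response:
        return parse_suggestions(json.load(response))
```

With a server running, a call such as `fetch_suggestions("http://localhost:5000", "lcsh-tfidf", "vienna architecture", limit=10)` would return a list of tuples in the order the service ranked them. Keeping request construction and response parsing separate from the network call makes both easy to check without a live Annif instance.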