
What is Annif? Why did we experiment with it?

Annif (http://annif.org/) is a tool that uses natural language processing and machine learning techniques to recommend subjects for a document, once it has been given a particular controlled vocabulary. For the SMASH! phase of our work, we experimented with retrieving related entities given a user query. We wanted Annif to draw on the Library of Congress Subject Headings (LCSH) and the subject headings used in the library catalog metadata to recommend subject headings for a user query, rather than for an entire document.

Annif resources, data and algorithms used for this project

We followed the documentation at https://github.com/NatLibFi/Annif and https://github.com/NatLibFi/Annif-tutorial to understand how to set up the system and what its data and metadata requirements are. The exercises documented at https://github.com/NatLibFi/Annif-tutorial/tree/master/exercises were invaluable.

Algorithm: We configured Annif to use its built-in TF-IDF (http://www.tfidf.com/) algorithm for this phase, although Annif also supports combining several algorithms. Given more time, it would be useful to try a combination of algorithms for suggesting subject headings.
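To illustrate the idea behind TF-IDF matching, here is a minimal Python sketch that scores a short query against the text associated with each subject. It is not Annif's implementation: the function names, the smoothed IDF formula, the cosine scoring, and the toy subject texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens, idf):
    """Build a TF-IDF vector (as a dict); terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def build_index(subject_texts):
    """subject_texts maps a subject label to the text seen with that subject."""
    tokens = {label: tokenize(text) for label, text in subject_texts.items()}
    n = len(tokens)
    # Document frequency: in how many subject texts does each term occur?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    # Smoothed IDF (an assumed variant; other formulas are common too).
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = {label: weigh(toks, idf) for label, toks in tokens.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def suggest(query, idf, vectors, limit=10):
    """Return up to `limit` (label, score) pairs with a non-zero score."""
    q = weigh(tokenize(query), idf)
    scored = [(label, cosine(q, vec)) for label, vec in vectors.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Toy data, invented for this sketch (not taken from the catalog).
subject_texts = {
    "Architecture--Philosophy": "philosophy of architecture and building design",
    "Item response theory": "using r for item response theory model applications",
    "Antique and classic aircraft": "restoring antique and classic aircraft",
}
idf, vectors = build_index(subject_texts)
results = suggest("vienna architecture", idf, vectors, limit=10)
```

In this toy index only one subject text contains a query term, so `results` holds a single suggestion. The word "vienna" never appears in the training text, so it contributes nothing to the score, which hints at why short queries can surface only loosely related headings.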

Data: 

<http://id.loc.gov/authorities/subjects/sh00000231>     Antique and classic aircraft

{
  "fulltitle_display": "Using R for item response theory model applications",
  "subject_display": ["Item response theory", "R (Computer program language)"]
},

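The two samples above can be combined into training data. The following Python sketch assumes the vocabulary is available as lines of `<URI>` followed by a label, and that training documents are written as tab-separated lines with the text first and the subject URIs in angle brackets second; check the Annif documentation for the exact corpus format. The function names are hypothetical, and the URI for "Item response theory" is a placeholder, not a real LCSH identifier.

```python
def parse_vocab_line(line):
    """Split a '<URI> label' line into (uri, label)."""
    uri, label = line.split(None, 1)
    return uri.strip("<>"), label.strip()


def record_to_corpus_line(record, label_to_uri):
    """Turn one catalog record into 'text<TAB><uri> <uri> ...'.

    Subjects missing from the vocabulary are skipped; returns None when
    no subject of the record can be mapped to a URI.
    """
    uris = [
        label_to_uri[label]
        for label in record.get("subject_display", [])
        if label in label_to_uri
    ]
    if not uris:
        return None
    # Collapse whitespace so the text stays on a single TSV line.
    text = " ".join(record["fulltitle_display"].split())
    return text + "\t" + " ".join(f"<{uri}>" for uri in uris)


vocab_lines = [
    "<http://id.loc.gov/authorities/subjects/sh00000231>\tAntique and classic aircraft",
    # Placeholder URI for illustration only, not a real LCSH identifier.
    "<http://example.org/placeholder/irt>\tItem response theory",
]
label_to_uri = {}
for vocab_line in vocab_lines:
    uri, label = parse_vocab_line(vocab_line)
    label_to_uri[label] = uri

record = {
    "fulltitle_display": "Using R for item response theory model applications",
    "subject_display": ["Item response theory", "R (Computer program language)"],
}
corpus_line = record_to_corpus_line(record, label_to_uri)
```

Here the second subject of the record is dropped because the toy vocabulary has no entry for it. A real conversion would also need to decide how to handle subdivided or variant headings.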


Installation, setup, and deployment on dev VM

Suggestions example

Results example: Searching for “vienna architecture” with a limit of 10 results yields:

{
  "results": [
    {
      "label": "International style (Architecture)",
      "score": 0.47264978289604187,
      "uri": "http://id.loc.gov/authorities/subjects/sh85067451"
    },
    {
      "label": "Architectural practice, International",
      "score": 0.47264978289604187,
      "uri": "http://id.loc.gov/authorities/subjects/sh85006606"
    },
    {
      "label": "Architecture--Philosophy",
      "score": 0.4689684808254242,
      "uri": "http://id.loc.gov/authorities/subjects/sh2007101285"
    },
    {
      "label": "Quality (Aesthetics)",
      "score": 0.4567379951477051,
      "uri": "http://id.loc.gov/authorities/subjects/sh94009536"
    },
    {
      "label": "Four elements (Philosophy)",
      "score": 0.447782963514328,
      "uri": "http://id.loc.gov/authorities/subjects/sh85051080"
    },
    {
      "label": "Black in interior decoration",
      "score": 0.4383489787578583,
      "uri": "http://id.loc.gov/authorities/subjects/sh2011002117"
    },
    {
      "label": "Architecture and literature",
      "score": 0.41448304057121277,
      "uri": "http://id.loc.gov/authorities/subjects/sh85006888"
    },
    {
      "label": "Architects--Professional ethics",
      "score": 0.41201311349868774,
      "uri": "http://id.loc.gov/authorities/subjects/sh85006572"
    },
    {
      "label": "Architecture and philosophy",
      "score": 0.40943264961242676,
      "uri": "http://id.loc.gov/authorities/subjects/sh95008375"
    },
    {
      "label": "Architecture and technology",
      "score": 0.40000787377357483,
      "uri": "http://id.loc.gov/authorities/subjects/sh97005373"
    }
  ]
}
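
A response like the one above can be requested from a running Annif instance over its REST API. The sketch below uses only the Python standard library and assumes the `suggest` endpoint accepts form-encoded `text` and `limit` parameters. The base URL `http://localhost:5000` and the project ID `lcsh-tfidf` are hypothetical and would need to match the actual deployment.

```python
import json
from urllib import parse, request


def build_suggest_request(base_url, project_id, text, limit=10):
    """Prepare a POST request for the suggest endpoint of one project."""
    url = f"{base_url.rstrip('/')}/v1/projects/{project_id}/suggest"
    body = parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return request.Request(url, data=body, method="POST")


def fetch_suggestions(req):
    """Send the request and decode the JSON response (needs a live server)."""
    with request.urlopen(req) as resp:
        return json.load(resp)


def labels_by_score(payload):
    """Return (label, score) pairs from a response, highest score first."""
    ranked = sorted(payload["results"], key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked]


# Hypothetical base URL and project ID; adjust to the actual deployment.
req = build_suggest_request(
    "http://localhost:5000", "lcsh-tfidf", "vienna architecture", limit=10
)

# Parsing shown on two entries copied from the results above, so the
# sketch runs without a server; call fetch_suggestions(req) for live data.
sample = {
    "results": [
        {
            "label": "Architecture--Philosophy",
            "score": 0.4689684808254242,
            "uri": "http://id.loc.gov/authorities/subjects/sh2007101285",
        },
        {
            "label": "International style (Architecture)",
            "score": 0.47264978289604187,
            "uri": "http://id.loc.gov/authorities/subjects/sh85067451",
        },
    ]
}
ranked = labels_by_score(sample)
```

Keeping request construction separate from the network call makes it easy to inspect the URL and form body before sending anything to the server.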