Request a New Dataset for QA


This is a guide for LD4P2 Partners (cohort and PCC affiliates included) on how to request that a new dataset be added to QA so that corresponding lookups can be added to Sinopia (the LD4P2-supported BIBFRAME editor).

Email Steven Folsom (sf433 @ cornell dot edu) with related questions or comments on how to improve this guide.


1.) Make sure the dataset is not already in QA

Go to the Authorities tab on (a temporary site until is live) to see if the dataset of interest is already available in QA. Please also consult the summary page (NEED LINK to page E. Lynette Rayle is creating) of datasets being supported by LD4P2 through QA vs. those being supported by type-ahead searching already available in the BIBFRAME editor.

2.) Identify the new dataset

Identify the dataset and gather information about how to acquire data dumps and/or API access, along with the dataset's homepage URL. (You will be asked for this information when making a formal request as a GitHub issue; see Step 5 below.)

3.) Decide how contextual information should be indexed and displayed in the lookup service

As you might know by now, QA can provide contextual information about an entity (not just preferred labels) during the lookup experience. To do so, decisions need to be made about how to index the dataset's RDF. Using this spreadsheet, add a tab for the new dataset. In the new tab, please use the following column headers and value guidelines (N.B. see the existing LCGT tab as an example):


  • URI for the class being described.
  • URI for the property or property path to get to the information to be indexed in QA.
  • Use an 'X' to mark when this data should be used to search against. N.B. some data is important to display, but perhaps wouldn't be useful to search against in a lookup.
  • Use an 'X' to mark when the value should be displayed. Include a label for the field. This may be the property name in the property path column, or you may decide another term is more appropriate.
  • If applicable, provide notes on whether a particular property path should weigh heavier in the search rankings.
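As a rough illustration of the information each spreadsheet row carries, the columns could be modeled as below. This is only a sketch: the class name, the field names, and the SKOS example rows are assumptions made for illustration and are not taken from the spreadsheet or from QA itself.

```python
from dataclasses import dataclass


@dataclass
class IndexingRow:
    """One row of an indexing tab (all field names are hypothetical)."""
    class_uri: str          # URI for the class being described
    property_path: str      # URI for the property or property path to index
    search: bool            # True when the search column is marked 'X'
    display: bool           # True when the display column is marked 'X'
    label: str              # label to show for the field
    ranking_note: str = ""  # optional note on search-ranking weight


SKOS = "http://www.w3.org/2004/02/skos/core#"

# Illustrative rows only; a real dataset needs its own classes and paths.
rows = [
    IndexingRow(SKOS + "Concept", SKOS + "prefLabel",
                search=True, display=True, label="Preferred label",
                ranking_note="weigh heavier than variant labels"),
    IndexingRow(SKOS + "Concept", SKOS + "altLabel",
                search=True, display=False, label="Variant label"),
    IndexingRow(SKOS + "Concept", SKOS + "note",
                search=False, display=True, label="Note"),
]

# Fields that would be searched against vs. fields that would be shown.
search_labels = [row.label for row in rows if row.search]
display_labels = [row.label for row in rows if row.display]

print("Search against:", search_labels)
print("Display:", display_labels)
```

Keeping the search and display flags separate mirrors the guideline that some data is worth displaying without being useful to search against.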

4.) Add test parameters using a YAML file

To make sure the QA search behavior (recall and relevancy) meets expectations, QA uses YAML to define test parameters. These parameters let you declare that, for a particular text string searched using QA, the results should include a particular resource (identified by a URI) and the position by which that resource should be found. For example, when searching 'Casebooks' against LCGFT, the expected resource should be in the top 10 results.
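The kind of check such a test parameter expresses can be sketched as follows. This is a minimal illustration, not QA's actual test runner: the function names, the dictionary keys, and the example.org URIs are all hypothetical, and the real YAML keys are those defined by QA.

```python
def position_of(results, expected_uri):
    """Return the 1-based position of expected_uri in results, or None."""
    for position, result in enumerate(results, start=1):
        if result["uri"] == expected_uri:
            return position
    return None


def meets_expectation(results, expected_uri, max_position):
    """True when expected_uri appears at or before max_position."""
    position = position_of(results, expected_uri)
    return position is not None and position <= max_position


# Hypothetical test parameters: query text, expected URI, lowest
# acceptable position in the result list.
test_case = {
    "query": "Casebooks",
    "expected_uri": "http://example.org/term/casebooks",
    "max_position": 10,
}

# Made-up search results standing in for a real lookup response.
results = [
    {"uri": "http://example.org/term/case-studies", "label": "Case studies"},
    {"uri": "http://example.org/term/casebooks", "label": "Casebooks"},
]

passed = meets_expectation(
    results, test_case["expected_uri"], test_case["max_position"]
)
print("Test passed:", passed)
```

A test of this shape covers both recall (the resource is returned at all) and relevancy (it is ranked high enough).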

(Definition of YAML Keys coming soon.)

    a.) Using the YAML key definitions above, create a YAML file for your dataset using a text editor. Save the file with the .yml file extension and upload it to

    b.) Alternatively, from the same page, create the YAML file using the GitHub "Create new" file feature.

5.) Create an issue formally requesting the new dataset so the request can be prioritized and tracked.

    a.) From, create an issue by clicking on "Get started" for the Request a New Dataset for QA. You will be asked to provide/confirm the following:

    [ ] Identify the data source: (Include the Data Source Name and its homepage URL)
    [ ] Add a new tab and indexing information for the data source to the following spreadsheet:
    [ ] Add a YAML test file to; please provide here a link to the YAML file related to this request.

Process for Prioritizing Requests (Coming Soon)

Please note that all requests for new data sources in QA will be prioritized by the LD4P2 project. Due to time restrictions, there is no guarantee that all requested datasets will be added to QA during the lifetime of the LD4P2 grant. Regardless of available resources, it is still useful to know which datasets the community would find useful in such a lookup service.