
The Karma data integration tool allows users to semantically model tabular data (e.g., spreadsheets) in a visual environment. This makes it easier to understand the best way to map data to ontologies and provides a visual comparison to the VIVO-ISF relationship diagrams. This approach is especially useful for new VIVO adopters and those who prefer not to create and use their own scripts.

Data is most often provided in spreadsheet format, which is simple to transform into RDF if every column in a row describes an attribute of the same entity, usually identified by a record identifier. The process becomes more complicated if different cells in the same row refer to different entities. This page includes example tabular datasets and screenshots of models created in Karma.

Modeling departments, positions, and people

The following spreadsheet, which describes organizational units, would be very easy to load into VIVO:

You can readily imagine storing the information about each organizational unit: the id, the name, and the Uniform Resource Identifier (URI), which can be generated from the org_ID and the institutional VIVO namespace. This is the starting point for creating the basic structure of your VIVO data. The next step is to store the information about people affiliated with those organizational units and their positions within those units.
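
As a minimal sketch of that URI-generation step, the Python below reads a small in-memory table and joins each org_ID to a namespace. The sample rows, the column name `name`, and the namespace `http://vivo.school.edu/n` are assumptions for illustration; Karma performs the equivalent step inside its own interface.

```python
import csv
import io

# Hypothetical namespace and sample rows; replace with your own.
NAMESPACE = "http://vivo.school.edu/n"

SAMPLE_CSV = """org_ID,name
1001,Department of Chemistry
1002,Department of History
"""

def org_uri(org_id, namespace=NAMESPACE):
    """Join the namespace and the org_ID into a URI string."""
    return namespace + org_id.strip()

def load_orgs(text):
    """Return one dict per row holding the id, name, and generated URI."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"org_ID": r["org_ID"], "name": r["name"], "uri": org_uri(r["org_ID"])}
        for r in rows
    ]

orgs = load_orgs(SAMPLE_CSV)
for org in orgs:
    print(org["uri"], org["name"])
```
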
A spreadsheet of people data will look like this:

 

In this spreadsheet, the UID is the identifier that uniquely identifies a person across all databases at the institution. NETID is another identifier for a person. Your institution will likely use different names for both. The other columns are self-explanatory. The model of this data set containing people information is shown in the image below:

 

 

A spreadsheet of people’s positions in the organization will look like this:

In this spreadsheet, the UID of the person and the org_ID of the organizational unit connect the person with that unit and are used to create the RDF statements describing that person’s position.
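
The connection described above can be sketched in plain Python as a handful of triples. This is only an illustration under stated assumptions: it follows the VIVO-ISF pattern in which a vivo:Position relates a person and an organization, uses a hypothetical namespace and hypothetical sample identifiers, and assumes that UID and org_ID values never overlap. The Karma model in the image may include additional classes and properties.

```python
# Hypothetical namespace; replace with your institution's own.
NAMESPACE = "http://vivo.school.edu/n"
VIVO = "http://vivoweb.org/ontology/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def position_triples(uid, org_id):
    """Build triples linking a person and an organizational unit via a position."""
    person = NAMESPACE + uid
    org = NAMESPACE + org_id
    position = NAMESPACE + uid + "_" + org_id
    return [
        (position, RDF_TYPE, VIVO + "Position"),
        (position, VIVO + "relates", person),
        (position, VIVO + "relates", org),
    ]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)

triples = position_triples("u123", "1001")  # sample values, not real data
print(to_ntriples(triples))
```
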

The model of this data set containing information about people's positions within an institution is shown in the image below:

 

Modeling Academic Articles

A spreadsheet of academic articles, however, would be more complicated:

In this spreadsheet, two identifiers connect a person with the article they authored: the articleID and the UID. The articleID can be any unique identifier assigned to the article. A model of this data set containing information about academic articles is shown in the image below:

Modeling Grants

Grants also involve making a number of connections, plus adding two classes that are likely not in your spreadsheet, vivo:AdministratorRole and vivo:PrincipalInvestigatorRole (and maybe vivo:CoPrincipalInvestigatorRole).

You must also assign a unique URI to these extra classes, shown below as URIs with the '_role' suffix in the gold columns. Below, the National Science Foundation has been modeled as vivo:GovernmentAgency. If a single spreadsheet source contains a wide variety of funding organization types, you may want to create a separate spreadsheet and model for your funding organizations, or generalize the type to vivo:FundingOrganization. Likewise, the people modeled as vivo:FacultyMember below may be generalized to foaf:Person.

Using PyTransform

Karma's PyTransform capability allows you to transform your source data using Python. A common use of PyTransform is to create additional unique URIs based on an existing one. The URIs with the '_role' suffix above were created using this Python code:

return getValue("AdminDepartmentID")+'_role'

More information on PyTransform is available in Karma's documentation.
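
To try the '_role' expression outside Karma, one option is to wrap it in a function and supply a stand-in for getValue. In the sketch below, the `row` dictionary, its sample value, and the stub `getValue` are assumptions made for testing only; inside Karma, getValue is provided by the tool and reads from the current row of the worksheet.

```python
# Hypothetical sample row; in Karma the values come from the worksheet.
row = {"AdminDepartmentID": "http://vivo.school.edu/n1001"}

def getValue(column_name):
    """Stand-in for Karma's getValue: return the cell for the given column."""
    return row[column_name]

def role_uri():
    # Same expression as the PyTransform body shown above.
    return getValue("AdminDepartmentID") + '_role'

print(role_uri())
```
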

  • One case where you will need PyTransform is creating the position URI, which is required to generate the triples that represent each person's position within the institution. As the people's position data example above shows, the position URI draws on values from several columns. In the drop-down menu found on each column, select the PyTransform option and type the following in the window:
    return "http://vivo.northwestern.edu/position/n"+getValue("UID")+"_"+getValue("org_ID")+"_"+getValue("position_type").replace(" ","_")
    Here the first part of the URI is the Northwestern University namespace, which you need to change to your own (for example, http://vivo.school.edu/n); the rest is built from the values of three columns. Using all three columns matters because some people at your institution most likely hold multiple appointments, and this produces a separate URI, and therefore separate triples, for each of their positions.
  • Another case is creating the authorship URI when modeling publication data, such as the academic article data shown above. Open the PyTransform window from the drop-down menu found on each column and type the following:
    return "http://vivo.northwestern.edu/n"+getValue("ID")+getValue("uuid").replace(" ","_")
    The first part is the namespace, which you should replace with your own (for example, http://vivo.school.edu/n). "ID" and "uuid" are the names of the columns from which the authorship URI is built; they hold the article's unique ID and the person's unique ID, respectively.
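
The two expressions above can be checked outside Karma by rewriting them as ordinary functions. The sketch below is not Karma code: the function names, the sample identifiers, and the use of the placeholder namespace are assumptions for illustration. It shows that combining three column values yields a distinct position URI for each appointment held by the same person.

```python
NAMESPACE = "http://vivo.school.edu/n"  # placeholder namespace; use your own

def position_uri(uid, org_id, position_type, namespace=NAMESPACE):
    """Mirror the position expression: three values joined with underscores."""
    return namespace + uid + "_" + org_id + "_" + position_type.replace(" ", "_")

def authorship_uri(article_id, person_id, namespace=NAMESPACE):
    """Mirror the authorship expression: the two IDs joined with no separator."""
    return namespace + article_id + person_id.replace(" ", "_")

# Hypothetical person with two appointments.
appointments = [
    ("u123", "1001", "Associate Professor"),
    ("u123", "1002", "Center Director"),
]
uris = [position_uri(*a) for a in appointments]
for uri in uris:
    print(uri)
print(authorship_uri("a42", "u123"))
```

The authorship function keeps the original behavior of concatenating the two IDs directly. If your identifiers vary in length, adding a separator such as "_" between them would guard against two different ID pairs producing the same URI.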

 

 
