The Karma data integration tool lets users semantically model tabular data, such as spreadsheets, in a visual environment. This makes it easier to see how best to map data to ontologies and provides a visual comparison to the VIVO-ISF Ontology Diagrams. This approach is especially useful for new VIVO adopters and those who prefer not to create and use their own scripts.
Data is probably most often provided in spreadsheet format, which is simple to transform into RDF when every column in a row describes an attribute of the same entity, usually identified by a record identifier. The process becomes more complicated when different cells in the same row of the spreadsheet refer to different entities. This page includes example tabular datasets and screenshots of models created using Karma.
Modeling organizations, people, and positions
The following spreadsheet of organizations (one organization per row) is very easy to load into a VIVO:
You can readily imagine representing the information about each organization – id, name, contact information, and web site address – in additional columns. The Uniform Resource Identifier (URI) that VIVO uses to identify each organizational unit can be generated by combining the institutional VIVO namespace with the org_ID. This is the starting point for creating the basic structure of your VIVO data.
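As a rough illustration of how such URIs can be built, the Python sketch below appends each org_ID to a namespace. The namespace http://vivo.school.edu/individual/n, the helper name make_org_uri, and the sample IDs are assumptions made for this example only; substitute your own institutional namespace and identifiers.

```python
def make_org_uri(namespace, org_id):
    """Build an organization URI by appending the org_ID to the namespace."""
    # Spreadsheet exports often leave stray whitespace around identifiers,
    # and numeric IDs may arrive as integers, so normalize to a clean string.
    return namespace + str(org_id).strip()


# Hypothetical namespace and org_ID values, for illustration only.
VIVO_NAMESPACE = "http://vivo.school.edu/individual/n"
org_uris = [make_org_uri(VIVO_NAMESPACE, org_id) for org_id in ["1001", " 1002 ", 1003]]

for uri in org_uris:
    print(uri)
```

Because the URI is derived only from the namespace and the org_ID, the same organization receives the same URI each time the spreadsheet is processed.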
The next step is to store the information about people affiliated with those organizational units, and finally, their positions within those units.
A spreadsheet of people data typically looks like this:
In this spreadsheet the person identifier is called UID (your institution will have a different name for this identifier). It is the unique identifier by which a person is identified in your institution's databases. NETID (again, your institution will have a different name for it) is another identifier for a person, often used as a username for logging into university systems. The other columns are self-explanatory.
In Karma, the model of this data set containing people information is shown in the image below:
A spreadsheet of people’s positions in the organization will look like this:
This spreadsheet uses the UID of the person and the org_ID of the organizational unit to connect the person with that unit and to create the RDF statements describing that person’s position.
The model of this data set containing information about people's positions within an institution is shown in the image below:
Modeling Academic Articles
A spreadsheet of academic articles is more complicated:
This spreadsheet has two important identifiers that connect a person with the article they authored: the articleID and the UID. The articleID can be any unique identifier assigned to the article. A model of this data set containing information about academic articles is shown in the image below:
Modeling Grants
Grants also involve making a number of connections, plus adding two classes that are likely not in your spreadsheet, vivo:AdministratorRole and vivo:PrincipalInvestigatorRole (and maybe vivo:CoPrincipalInvestigatorRole).
You must also assign a unique URI to each instance of these extra classes, shown below as URIs with the '_role' suffix in the gold columns. Below, the National Science Foundation has been modeled as vivo:GovernmentAgency. If a single spreadsheet source covers a wide variety of funding organization types, you may want to create a separate spreadsheet and model for your funding organizations, or generalize the type to vivo:FundingOrganization. Likewise, the people modeled as vivo:FacultyMember below may be generalized to foaf:Person.
Using PyTransform to create URIs
Karma's PyTransform capability allows you to transform your source data using Python. A common use of PyTransform is to create additional unique URIs based on an existing one. The URIs with the '_role' suffix above were created using this Python code:
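The snippet below is a minimal sketch of that kind of suffix transform, not necessarily the exact code used for the model above. The column name person_URI and the sample row are hypothetical, and getValue is stubbed here so the example can run outside Karma; inside Karma, getValue is provided by PyTransform and reads the named column of the current row.

```python
# Stand-in for one spreadsheet row (hypothetical column name and value).
ROW = {"person_URI": "http://vivo.school.edu/individual/n12345"}


def getValue(column_name):
    """Stub of Karma's getValue(): return the current row's value for a column."""
    return ROW[column_name]


def role_uri():
    # In the PyTransform window, only this return statement would be entered.
    return getValue("person_URI") + "_role"


role = role_uri()
print(role)
```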
More information on PyTransform is available in Karma's documentation.
One case where you will need PyTransform is creating the position URI, which is required to build the triples that represent each person's position within their institution. As the people's position data example above shows, the values of a few columns are needed to create the correct triples for the position. From the drop-down menu found on each column, select the PyTransform option and type the following in the window:
return "http://vivo.northwestern/position/n"+getValue("UID")+"_"+getValue("org_ID")+"_"+getValue("position_type").replace(" ","_")
In this case the first part of the URI is the Northwestern University namespace, which you need to change to your own namespace, such as http://vivo.school.edu/individual/n. The rest of the URI is built from the values of three columns, as shown above. Combining three columns is necessary to identify positions uniquely: your institution most likely has people with multiple appointments, and this approach lets you create separate triples for each of their positions.
Another case where you will need PyTransform is creating the authorship URI for modeling the publications data shown above. To do that, select the PyTransform option from the drop-down menu found on each column. Once the PyTransform window opens, type in the following:
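To see why three columns are needed, the following sketch reproduces the same concatenation as a plain Python function and applies it to two appointments held by one person. The sample rows and the function name position_uri are made up for illustration, and the getValue call from Karma is replaced by a lookup function passed in as an argument so the code runs outside Karma.

```python
# Example namespace; replace with your own institutional namespace.
NAMESPACE = "http://vivo.school.edu/individual/n"


def position_uri(get_value, namespace=NAMESPACE):
    """Combine UID, org_ID and position_type into one position URI."""
    return (
        namespace
        + get_value("UID")
        + "_"
        + get_value("org_ID")
        + "_"
        + get_value("position_type").replace(" ", "_")
    )


# Hypothetical rows: the same person holds appointments in two units.
rows = [
    {"UID": "12345", "org_ID": "678", "position_type": "Associate Professor"},
    {"UID": "12345", "org_ID": "901", "position_type": "Center Director"},
]

# dict.get plays the role of Karma's getValue() for each row.
uris = [position_uri(row.get) for row in rows]
for uri in uris:
    print(uri)
```

The two URIs differ even though the UID is identical, so each appointment can be described by its own set of triples.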
The first part is the namespace, which you should replace with your own. "ID" and "uuid" are the names of the columns chosen to build the authorship URI; they hold the article's unique ID and the person's unique ID, respectively.
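A minimal sketch of an authorship URI built from those two columns might look like the code below. The namespace, the underscore separator, the sample values, and the function name authorship_uri are assumptions for illustration; as before, Karma's getValue is replaced by a lookup function so the example runs on its own.

```python
# Example namespace; replace with your own institutional namespace.
NAMESPACE = "http://vivo.school.edu/individual/n"


def authorship_uri(get_value, namespace=NAMESPACE):
    """Combine the article ID and the person ID into one authorship URI."""
    # "ID" holds the article's unique ID, "uuid" the person's unique ID.
    return namespace + get_value("ID") + "_" + get_value("uuid")


# Hypothetical rows: one article with two authors.
rows = [
    {"ID": "A100", "uuid": "12345"},
    {"ID": "A100", "uuid": "67890"},
]

# dict.get stands in for Karma's getValue() on each row.
authorship_uris = [authorship_uri(row.get) for row in rows]
for uri in authorship_uris:
    print(uri)
```

Using both identifiers gives each author of the same article a distinct authorship URI.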