...

  • The magic black box doesn't yet exist, although the tools to work with RDF are improving all the time and the VIVO ontology has gained traction as a standard for information exchange on research networking. 
  • Furthermore, unless you are starting with an empty VIVO, the process that prepares RDF for VIVO must be able to query your VIVO to make sure it isn't duplicating data already there, including people, organizations, and the content of the dataset at hand. When your data comes to you from several sources, alignment based on names alone is prone to errors: false positive matches merge distinct entities, and false negatives create duplicate URIs for the same person, organization, or other entity.
  • Finally, that data will very likely not stay static. Your data ingest methodology must very quickly also serve as a methodology for updating and removing data.
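To make the alignment problem concrete, here is a minimal sketch in Python (standard library only) of a matching step that trusts shared identifiers but never merges on names alone. It assumes you have already queried your VIVO for the people it holds and loaded the results into two dictionaries, one from a stable identifier to URI and one from URI to name. The `PersonRecord` type, the `align`, `normalize_name` and `mint_uri` helpers, and the `http://vivo.example.edu/individual/` namespace are all hypothetical. A name-only match is routed to human review instead of being merged, because that is where false positives come from.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BASE_URI = "http://vivo.example.edu/individual/"  # hypothetical VIVO namespace


@dataclass(frozen=True)
class PersonRecord:
    """One incoming person from a source system (hypothetical shape)."""
    name: str
    identifier: Optional[str] = None  # e.g. an employee ID; hypothetical field


def normalize_name(name: str) -> str:
    """Lower-case and strip periods/commas so 'Smith,  Jane.' becomes 'smith jane'."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def mint_uri(identifier: str) -> str:
    """Derive a URI from the identifier so re-running the ingest yields the same URI."""
    return BASE_URI + "n" + uuid.uuid5(uuid.NAMESPACE_URL, identifier).hex


def align(incoming: List[PersonRecord],
          uri_by_id: Dict[str, str],
          name_by_uri: Dict[str, str]) -> List[Tuple[PersonRecord, Optional[str], str]]:
    """Return (record, uri, status) tuples; status is 'matched', 'new' or 'review'.

    uri_by_id and name_by_uri describe people already in VIVO (for example,
    the results of a SPARQL query); neither dictionary is modified.
    """
    known_ids = dict(uri_by_id)  # local copy, extended with URIs minted below
    uris_by_name: Dict[str, List[str]] = {}
    for uri, name in name_by_uri.items():
        uris_by_name.setdefault(normalize_name(name), []).append(uri)

    results: List[Tuple[PersonRecord, Optional[str], str]] = []
    for rec in incoming:
        if rec.identifier and rec.identifier in known_ids:
            # Identifier already known (from VIVO or minted earlier in this batch).
            results.append((rec, known_ids[rec.identifier], "matched"))
        elif normalize_name(rec.name) in uris_by_name:
            # Name-only match: possibly a different person, so a human decides.
            results.append((rec, None, "review"))
        elif rec.identifier:
            # Unknown identifier and no name collision: mint a new URI.
            uri = mint_uri(rec.identifier)
            known_ids[rec.identifier] = uri  # later rows with this ID reuse it
            results.append((rec, uri, "new"))
        else:
            # No identifier to key a URI on: leave it for review.
            results.append((rec, None, "review"))
    return results
```

Keying minted URIs on the source identifier rather than on a random value means that re-running the same batch lands on the same URIs, which matters once ingest turns into updating.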
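One common way to turn an ingest process into an update process is to keep the RDF that the previous run loaded and compare it with what the current run produces: new triples get added, and triples that have disappeared from the source get retracted. The sketch below assumes triples are held as Python tuples whose terms are already written in N-Triples syntax; `diff_snapshots` and `to_ntriples` are hypothetical helpers, not part of VIVO or the VIVO Harvester. Comparing against the previous load, rather than against everything currently in VIVO, keeps the diff from retracting data that came from other sources or from manual edits.

```python
from typing import Iterable, Set, Tuple

# A triple whose terms are already in N-Triples syntax, e.g. "<http://...>" or '"text"'.
Triple = Tuple[str, str, str]

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"


def diff_snapshots(previous: Set[Triple], current: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Compare the RDF loaded by the last ingest run with this run's output.

    Triples only in `current` must be added to VIVO; triples only in
    `previous` are gone from the source and must be retracted. Triples
    present in both are left alone.
    """
    additions = current - previous
    retractions = previous - current
    return additions, retractions


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize to N-Triples, sorted so the output is stable between runs."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(triples))


if __name__ == "__main__":
    person = "<http://vivo.example.edu/individual/n1001>"  # hypothetical URI
    previous = {(person, RDF_TYPE, FOAF_PERSON), (person, RDFS_LABEL, '"Smith, Jane"')}
    current = {(person, RDF_TYPE, FOAF_PERSON), (person, RDFS_LABEL, '"Smith, Jane A."')}
    add, remove = diff_snapshots(previous, current)
    print("Additions:\n" + to_ntriples(add))
    print("Retractions:\n" + to_ntriples(remove))
```

Here only the changed label shows up, once as a retraction and once as an addition; the unchanged type triple produces no work at all.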
Okay, you've heard that before

You likely have some experience with ETL (extract, transform, and load) processes and have heard about these problems before. This is good: you are aware that while VIVO is the challenge you are taking on now, getting data into VIVO is not that different from getting data into other platforms.

Unmasking the black box

You have at least three choices:

  • You can enter sample data into VIVO through its editing interfaces, export the data, and write your own scripts to produce data matching what you see. This may sound like a cop-out on the part of the VIVO community, but some people with a lot of ETL experience prefer to use tools they already know to produce data in a given target format.
    • The VIVO ontology team has developed a number of visual diagrams of the VIVO ontology, at both overview and detailed levels, to help you understand VIVO data; these may allow you to bypass or minimize time spent on sample data entry.
    • Furthermore, there are an increasing number of open-source libraries for writing RDF in PHP, Java, and Python.
    • There are also commercially developed and supported tools, including TopBraid Composer.
    • (If you know of other open-source or commercial tools, please add links to them here.)
  • You can use the VIVO Harvester, a framework for 
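If you take the first route and write your own scripts, the core of the work is emitting RDF that matches what you see in your sample export. The following is a minimal, standard-library Python sketch that turns a CSV of people into N-Triples, escaping literals as the N-Triples format requires. The column names `id` and `name`, the `http://vivo.example.edu/individual/` namespace, the `people_to_ntriples` helper, and the choice of `foaf:Person` with `rdfs:label` are assumptions for illustration; check them against the classes and properties in your own VIVO export before relying on them.

```python
import csv
import io
from typing import Iterator

BASE_URI = "http://vivo.example.edu/individual/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value: str) -> str:
    """N-Triples string literal: escape backslash first, then quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'


def people_to_ntriples(csv_text: str) -> Iterator[str]:
    """Yield N-Triples lines for each row, using the hypothetical 'id' and 'name' columns."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = iri(BASE_URI + "person" + row["id"].strip())
        yield f"{subject} {iri(RDF_TYPE)} {iri(FOAF_PERSON)} ."
        yield f"{subject} {iri(RDFS_LABEL)} {literal(row['name'].strip())} ."


if __name__ == "__main__":
    source = 'id,name\n1001,"Smith, Jane"\n1002,"O\'Brien, Pat ""PJ"""\n'
    for line in people_to_ntriples(source):
        print(line)
```

N-Triples was chosen here because it is line-based and needs no library, which keeps a script like this easy to diff and debug; one of the RDF libraries mentioned above becomes worthwhile once you need other serializations or want to query the data before loading it.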

 

The VIVO Harvester

The VIVO harvester can be configured for a wide variety of tasks.

...