...

When data systems already exist, one of the first challenges will be to find common identifiers to link them. Most organizations have evolved an identifier for people that can be shared publicly, whether the institutional email address or an internal ID number distinct from the U.S. Social Security number or other confidential government ID. It's remarkably hard to identify who should be in a VIVO, and surprisingly few large organizations even have an accurate or consistent scheme for identifying departments and organizations, especially somewhat transient units such as research centers.
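As a minimal sketch of what linking on a shared identifier can look like, the Python below joins a hypothetical HR extract to a hypothetical grants extract on institutional email address. The record layouts, field names, and sample values are illustrative assumptions, and the sketch assumes your institution treats email addresses case-insensitively, which is common in practice but not guaranteed. Records that fail to match are returned separately, since those are usually the ones that need human attention.

```python
# Hypothetical extracts from two systems of record (field names are illustrative).
hr_records = [
    {"email": "J.Smith@Example.edu", "name": "Smith, Jane", "dept_code": "CHEM"},
    {"email": "a.lee@example.edu", "name": "Lee, Alan", "dept_code": "PHYS"},
]
grant_records = [
    {"pi_email": " j.smith@example.edu", "award": "Award A"},
    {"pi_email": "r.kim@example.edu", "award": "Award B"},
]


def normalize_email(value: str) -> str:
    """Trim whitespace and lowercase (assumes case-insensitive addresses)."""
    return value.strip().lower()


def link_by_email(people, grants):
    """Attach each grant to a person via normalized email; collect grants with no match."""
    index = {normalize_email(p["email"]): p for p in people}
    linked, unmatched = [], []
    for grant in grants:
        person = index.get(normalize_email(grant["pi_email"]))
        if person is None:
            unmatched.append(grant)
        else:
            linked.append((person["name"], grant["award"]))
    return linked, unmatched


linked, unmatched = link_by_email(hr_records, grant_records)
print(linked)     # [('Smith, Jane', 'Award A')]
print(unmatched)  # [{'pi_email': 'r.kim@example.edu', 'award': 'Award B'}]
```

Keeping the unmatched records, rather than silently dropping them, makes it easier to see where the identifier scheme breaks down.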

...

Gaining access to data and permission to reuse it

The mere fact that data exist does not guarantee that they will be made available to you, however. It will be important to make contact with the owners or stewards of the data desired for VIVO, and it may not be trivial for them to make data available. It's common for public and private data to be intermingled in systems of record that have been built for functions such as payroll, performance evaluation, or financial accountability. Gaining access to data and permission to display it on the Web often requires building bridges to other units and catching the ear of people with the authority to support your requests. Some of the most successful VIVOs involve closely coordinated collaborations among central administrators, IT staff, the library, and the research administration office.

Mapping data into VIVO

One of the challenges for the VIVO community is that while many organizations have similar kinds of data, there are myriad ways in which employment and affiliation, grants, publications, courses, facilities, and other useful data for VIVO are stored and made available for reuse. Very often the data need some enhancement and structuring in order to work with VIVO's very granular approach to storing data.
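To illustrate what "very granular" means in practice, here is a rough sketch that breaks one flat HR row into separate person, organization, and position resources, written out as N-Triples. The local namespace, URI patterns, and field names are hypothetical. The class and property names are loosely modeled on the FOAF and VIVO core vocabularies and should be checked against the ontology shipped with your VIVO version; a real load typically also needs dates, more specific types, and inverse statements.

```python
# A flat row as it might arrive from an HR system (hypothetical fields and values).
row = {
    "emp_id": "12345",
    "name": "Jane Smith",
    "title": "Associate Professor",
    "dept_id": "CHEM",
    "dept_name": "Department of Chemistry",
}

BASE = "http://vivo.example.edu/individual/"  # hypothetical local namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF = "http://xmlns.com/foaf/0.1/"
VIVO = "http://vivoweb.org/ontology/core#"  # check terms against your VIVO version


def iri(value: str) -> str:
    return f"<{value}>"


def lit(value: str) -> str:
    """Plain N-Triples string literal with backslash, quote, and newline escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def row_to_ntriples(row: dict) -> list[str]:
    """Split one flat row into person, organization, and position resources."""
    person = iri(f"{BASE}person{row['emp_id']}")
    org = iri(f"{BASE}org-{row['dept_id']}")
    position = iri(f"{BASE}position-{row['emp_id']}-{row['dept_id']}")
    triples = [
        (person, iri(RDF_TYPE), iri(FOAF + "Person")),
        (person, iri(RDFS_LABEL), lit(row["name"])),
        (org, iri(RDF_TYPE), iri(FOAF + "Organization")),
        (org, iri(RDFS_LABEL), lit(row["dept_name"])),
        (position, iri(RDF_TYPE), iri(VIVO + "Position")),
        (position, iri(RDFS_LABEL), lit(row["title"])),
        # The position node, not the person, links the person to the organization.
        (position, iri(VIVO + "relates"), person),
        (position, iri(VIVO + "relates"), org),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in row_to_ntriples(row):
    print(line)
```

Five columns become three linked resources and eight statements; that expansion is why a flat export usually needs restructuring before it can be loaded.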

Three forces are at work to make the task harder than it may at first seem. First, we've already mentioned the challenge of finding reliable and unique identifiers for people, organizations, courses, journals, grants, places, and potentially even events. Many systems rely solely on matching text strings to associate one data point with another, and if (as is common) a person or department is recorded with several name variants in different systems or at different times, VIVO will hold only partial information for each variant, and the duplicated names will become very evident.
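One common first step toward reconciling name variants is to compute a normalized key for each string and group the strings that share a key. The sketch below does this for department names; the abbreviation table and sample variants are hypothetical, and the approach only proposes candidate matches for human review. It will not, for example, recognize a reordered form such as "Chemistry, Department of".

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical abbreviation table; extend it from the variants you actually see.
ABBREVIATIONS = {"dept": "department", "ctr": "center", "inst": "institute"}


def name_key(name: str) -> str:
    """Reduce a name string to a comparison key; it does not decide identity."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    # Treat "&" as "and", turn other punctuation into spaces, then split into words.
    text = re.sub(r"[^\w\s]", " ", text.replace("&", " and "))
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def group_variants(names):
    """Group raw strings that share a normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


variants = [
    "Department of Chemistry",
    "Dept. of Chemistry",
    "DEPT OF CHEMISTRY",
    "Center for Biochemistry & Biophysics",
    "Ctr. for Biochemistry and Biophysics",
    "Department of Physics",
]
for key, members in group_variants(variants).items():
    print(key, "<-", members)
```

Groups with more than one member are candidates for a single VIVO entity; deciding which form becomes the canonical label remains a human judgment.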

...


Data do not remain static

It's easy to focus on a one-time data load, especially in a proof of concept or pilot project. Putting up any kind of information system, especially one with as many different kinds of data as VIVO, requires making a plan for handling updates – partial and full data removals as well as additions. Many data sources don't record the date of last modification, so a complete scan is needed to detect changes. In some cases, data layers can be replaced wholesale in VIVO because they interact or interleave very little with other datasets, which simplifies updating.
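When a source offers no modification dates, one way to detect changes from a complete scan is to serialize both the previous and the current extract as N-Triples and compare them as sets: lines found only in the new snapshot are additions, and lines found only in the old one are removals. The sketch below assumes both snapshots were produced by the same serialization code (so identical statements yield identical strings) and that no blank nodes are involved; the URIs are hypothetical. It renders the result as a standard SPARQL 1.1 Update request.

```python
# Hypothetical URIs; both snapshots are assumed to come from the same serializer.
PERSON = "<http://vivo.example.edu/individual/person12345>"
POSITION = "<http://vivo.example.edu/individual/position-12345-CHEM>"
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

previous = {
    f'{PERSON} {LABEL} "Jane Smith" .',
    f'{POSITION} {LABEL} "Assistant Professor" .',
}
current = {
    f'{PERSON} {LABEL} "Jane Smith" .',
    f'{POSITION} {LABEL} "Associate Professor" .',
}


def diff_snapshots(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (additions, removals) between two full N-Triples snapshots."""
    return new - old, old - new


def to_sparql_update(additions: set[str], removals: set[str]) -> str:
    """Render the diff as a SPARQL 1.1 Update request: removals first, then additions."""
    operations = []
    if removals:
        operations.append("DELETE DATA {\n" + "\n".join(sorted(removals)) + "\n}")
    if additions:
        operations.append("INSERT DATA {\n" + "\n".join(sorted(additions)) + "\n}")
    return " ;\n".join(operations)


additions, removals = diff_snapshots(previous, current)
print(to_sparql_update(additions, removals))
```

A changed value, such as a revised job title, shows up as one removal plus one addition. Where a data layer can be replaced wholesale, this comparison may be unnecessary, for instance if that layer is kept in its own named graph and simply reloaded, though whether that is practical depends on how your triple store is organized.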

...

Starting small and not taking on too many different kinds of data allows you to familiarize yourself with what are often new concepts and new tools. Data are often dirtier than advertised, and more of the cleanup than you expect may fall to you during the import into VIVO. If you need to hire technical help, it may take longer to find a qualified person familiar with Semantic Web technologies than it would to fill a routine web development position.

Reaching out to engage your peers

For more information

...