Planning a VIVO implementation requires significant effort.  A listserv has been created for those planning and conducting VIVO implementations.

A VIVO Implementation includes ...

...

Establish Governance

VIVO projects often cross institutional boundaries. To help make decisions and set direction, you'll need a project sponsor and one or more advisory groups, preferably including stakeholders such as researchers. The roles listed under Outreach Contacts are likely candidates to serve on advisory committees. <do we need a page with examples of advisory groups?>

...

Resource Identification

The size and makeup of VIVO teams depends on the VIVO implementation. See Hiring for VIVO projects for the types of resources typically needed. Additional resources are needed after the initial launch to maintain and possibly expand the system.

...

Gather Use Cases

VIVO can be used to answer questions or generate reports about research. You may want to define those needs in the form of a use case. See: use case.

...

Identify Stakeholders

Generally, the institutional offices interested in using and contributing to a VIVO are those that collect or use research information. These can include review committees, grants management offices, institutional repository managers, public affairs, fundraising bodies, and others who need to find out about the research being conducted at the institution. See a sample project plan for a university.

...

Identify Potential Data Sources

From spreadsheets to external data repositories, there are a variety of potential data sources that can feed into your VIVO instance.  See VIVO Data - what and from where.  You’ll need to identify which data types and data sources are best aligned with the overall goals for your implementation. See Policy and planning questions for VIVO data.  It can take some time to evaluate the data content and quality in order to forecast your ingest strategy.  See Data source specifications for implementation and Design of PubmedFetch.
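
Before committing to a source, it can help to profile a sample extract. Below is a minimal sketch in Python; the file name publications.csv and the author_name column are hypothetical, standing in for whatever export your source system provides. It reports column completeness, duplicates, and mixed name formats, three quick signals of the ingest effort ahead.

```python
# A minimal data-profiling sketch; file and column names are assumptions.
import pandas as pd

df = pd.read_csv("publications.csv")  # hypothetical export from a source system

# How complete is each column? Sparse columns may not be worth ingesting yet.
print(df.notna().mean().sort_values())

# Exact duplicates often signal that the source needs disambiguation work.
print(f"{df.duplicated().sum()} exact duplicate rows")

# Mixed author-name formats hint at cleanup effort before ontology mapping.
print(df["author_name"].str.contains(",").value_counts(dropna=False))
```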

...

Learn System Architecture

Become familiar with the top-level aspects of the VIVO system. It helps to start with Installing VIVO release 1.8. You may also appreciate the information in the Software Architecture Overview.

  • You can estimate how much computer power you will need by comparing to other VIVO installations, in VIVO hardware and software requirements.
  • What about additional software? Most VIVO sites use only open-source software. The installation guide provides a list in A simple installation.
  • Some sites modify the VIVO software stack, either because they have special needs, or simply because they have expertise with different tools. A list of configuration options is available in VIVO 1.8 Installation options.

...

Identify Customizations

Every site makes changes to VIVO. Some are as simple as changing the styling and the logo. Others add data types and properties to the ontology. Some add new pages to the application, or new functionality to existing pages. VIVO is made to be customized, but some changes are easier to accomplish than others, and you will need to take that into account. Many stylistic changes are easily done, as described in Changing the appearance of VIVO. Changes to the ontology are described in the Ontology Editor's Guide.

...

Branding

Your VIVO can be customized using your institution's colors, logos, and other identity elements. See Branding Your VIVO to learn more. Also check out other implementations for examples.  

...

Further Define Scope

VIVO can do many things.  Which of them will be used at your institution?  What will you deliver?  To whom?  When?  See What VIVO Is and What It's Not and the Sample One-Page VIVO Project Summary for a University. It may be helpful to show decision-makers a list of other research organizations that have implemented VIVO.

...

Request Data Feeds

With scope defined, you'll know which data your VIVO will display. You'll need to identify the owners of the data you want to feed into VIVO. And you'll need to meet with them to ask for their data. Which data do you want? How will it look in VIVO? How often will you feed data into VIVO, and what are the options for obtaining the data? <Do we need a page/section that discusses how to engage with data owners?>

...

Share Prototypes and/or Existing VIVOs

In addition to installing a test VIVO and inviting users to edit profiles, another way to demonstrate the value and impact of having a VIVO is to show fully populated sites at other organizations. See: Sites implementing VIVO.

...

Map Data To Ontologies

The value of the semantic web lies in the ability to define data using widely accepted ontologies.  See How to plan data ingest for VIVO. Understanding your data in order to accurately map it to the VIVO-ISF ontology is an important step.  See VIVO-ISF Ontology. Mapping to an ontology begins as a conceptual design process on a piece of paper or in a diagramming tool such as Vue.  Your final data mapping will occur using a tool such as Karma.  Your mapping strategy will affect not only the searchability of your data, but also how easily it can be aggregated and reused by other systems.
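
As a concrete illustration of the step from conceptual diagram to RDF, here is a minimal sketch using Python's rdflib. The base URI, the identifier, and the choice of vivo:FacultyMember are assumptions made for the example; verify class and property choices against the VIVO-ISF ontology documentation.

```python
# A conceptual mapping sketch, not a definitive implementation.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

VIVO = Namespace("http://vivoweb.org/ontology/core#")
EX = Namespace("http://vivo.example.edu/individual/")  # hypothetical base URI

g = Graph()
g.bind("vivo", VIVO)

row = {"id": "n1234", "name": "Smith, Jane"}  # one row from a source spreadsheet

person = EX[row["id"]]
g.add((person, RDF.type, VIVO.FacultyMember))   # assumed class; check VIVO-ISF
g.add((person, RDFS.label, Literal(row["name"])))

print(g.serialize(format="turtle"))
```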

...

Document Data Cleanup Strategy

Because linked data has a greater potential to be shared and aggregated, data consistency is crucial.  Depending on the quality and variability of your data sources, you may need to plan for data cleanup and/or data “munging” prior to loading into VIVO.  See How to manage data cleanup in VIVO. It is also important to standardize and document your data cleanup strategy. Cleanup can be done manually or in a semi-automated fashion.  For a list of suggested cleanup tools, see Name disambiguation and entity resolution.
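
To make the idea concrete, here is a small, illustrative cleanup pass in Python. The names and normalization rules are invented for the example; real name disambiguation usually needs more evidence than string normalization, which is why the tools linked above exist.

```python
# An illustrative cleanup sketch: collapse whitespace and case variants of
# the same name into one key before ingest. Sample data is invented.
import re

def normalize_name(raw: str) -> str:
    name = re.sub(r"\s+", " ", raw).strip()  # collapse stray whitespace
    return name.title()                      # crude case normalization

authors = ["SMITH, JANE ", "Smith, Jane", "smith,  jane"]
cleaned = {re.sub(r",\s*", ", ", normalize_name(a)) for a in authors}
print(cleaned)  # {'Smith, Jane'}
```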

...

Establish Data Feeds

For most VIVO sites, the biggest challenge of a VIVO system is not VIVO itself. Instead, the challenge comes in populating VIVO with data from your institutional systems. Many pages in the VIVO wiki are concerned with data issues. Start with VIVO Data - what and from where (Example), and the pages linked to it. Data challenges are often particular to the individual institution, ranging from the practical (Data source specifications for implementation) to the political (Policy and planning questions for VIVO data).
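
Once a feed is agreed upon, the mechanics of delivery still have to be settled. One hedged sketch, assuming your release exposes the SPARQL Update API described in the VIVO documentation; the URL, credentials, and graph URI below are placeholders to be checked against your own installation.

```python
# A sketch of pushing a small batch of triples into VIVO via its SPARQL
# Update API. Endpoint path, graph URI, and credentials are placeholders.
import requests

VIVO_UPDATE = "http://localhost:8080/vivo/api/sparqlUpdate"  # adjust to your host

update = """
INSERT DATA {
  GRAPH <http://vitro.mannlib.cornell.edu/default/vitro-kb-2> {
    <http://vivo.example.edu/individual/n1234>
      <http://www.w3.org/2000/01/rdf-schema#label> "Smith, Jane" .
  }
}
"""

resp = requests.post(VIVO_UPDATE, data={
    "email": "vivo_root@example.edu",  # an account with API rights (placeholder)
    "password": "CHANGE_ME",
    "update": update,
})
print(resp.status_code)
```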

...

Develop Prototypes

<Can we change this to "Develop Processes" ?> You may need to develop your own programs and scripts for ingesting data, or you may be able to configure a more general tool to your own needs. In the VIVO community, the two favorite tools are the VIVO Harvester and Karma. You can view some tutorials on YouTube to learn more about using Karma with VIVO.

Implementation

Create Launch Strategy

Will your VIVO display all researchers in the initial launch, or just a subset? Will it contain lots of data or will you be adding data gradually? Whether you use a "broad and shallow" or "narrow and deep" strategy, this step often requires brainstorming with stakeholders.

Identify Power Users

Data contributors or other stakeholders may be able to help identify users who are likely to be active participants and to edit and update their profiles regularly. These might be people who have readily adopted other systems in the recent past. The Site Admin > User Accounts page can be sorted by the number of logins, which might also help identify power users.

Develop Training Materials

See examples of training materials at Duke University's Scholars@Duke.

...

Prepare Data Loads

Depending on how your technical roles are defined, this task will be shared between your data staff/domain experts and your developers.  This is when ontology mapping goes from conceptual diagrams to actual RDF generation.  Preparation of the data loads involves:

  • executing the data cleanup strategy
  • physically mapping the data to the ontology
  • verifying the RDF generated.

See XSLT Ingest Example and Using a different data store.
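
For the verification step above, a short sanity check can catch mapping mistakes before anything reaches VIVO. A minimal sketch, assuming your mapping step produced a Turtle file (faculty.ttl is a hypothetical name): parse it, count the triples, and list the classes present.

```python
# A verification sketch over generated RDF; the file name is illustrative.
from collections import Counter
from rdflib import Graph, RDF

g = Graph()
g.parse("faculty.ttl", format="turtle")  # hypothetical output of the mapping step
print(f"{len(g)} triples parsed")

# Tally rdf:type values so obvious mapping mistakes surface early.
types = Counter(o for _, _, o in g.triples((None, RDF.type, None)))
for cls, count in types.most_common():
    print(count, cls)
```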

Document Data Provenance

Maintenance of the data in your VIVO instance requires thorough documentation of your data flow.  Some of this information may be documented internally, while some of it will make sense to load into VIVO as part of the metadata. See Provenance Ontology.
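
Where provenance is loaded into VIVO itself, the W3C PROV-O vocabulary is one option. A small illustrative sketch; the individual and source URIs are placeholders, and your actual provenance model should follow the ontology guidance linked above.

```python
# An illustrative provenance record using PROV-O; URIs are placeholders.
from datetime import datetime, timezone
from rdflib import Graph, Literal, Namespace, URIRef, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
g = Graph()
g.bind("prov", PROV)

record = URIRef("http://vivo.example.edu/individual/n1234")
g.add((record, PROV.wasDerivedFrom, URIRef("http://hr.example.edu/feeds/faculty")))
g.add((record, PROV.generatedAtTime,
       Literal(datetime.now(timezone.utc).isoformat(), datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```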

Build Customized System

Add your customizations to VIVO. If you are only making small changes, mostly to appearance, you might add your changes directly to the VIVO distribution files. For larger customizations, you should learn Building VIVO in 3 tiers. This helps to organize your installation, and will make it easier to implement system upgrades when the time comes.

Test Performance

Before announcing your VIVO system to the public, you should test to be sure that it performs acceptably. If it doesn't, you might consider the techniques in Use HTTP caching to improve performance, or the methods in Troubleshooting VIVO's Performance. <This needs work.>
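
A rough smoke test can at least tell you whether deeper tuning is needed. This sketch times a few pages; the URLs and the two-second threshold are arbitrary assumptions, and real load testing calls for a dedicated tool.

```python
# A crude response-time smoke test; URLs and threshold are placeholders.
import time
import requests

pages = [
    "http://localhost:8080/vivo/",
    "http://localhost:8080/vivo/individual/n1234",  # a populated profile
]

for url in pages:
    start = time.perf_counter()
    resp = requests.get(url)
    elapsed = time.perf_counter() - start
    flag = "SLOW" if elapsed > 2.0 else "ok"
    print(f"{flag:4} {resp.status_code} {elapsed:5.2f}s {url}")
```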

Launch

Oversee Publicity Campaign

Communication is an important part of the launch strategy.  You'll want to let the community know that VIVO is coming. Different audiences may need different messages, and different ways to engage with the project. See <Communication Strategies> for more information.

Implement Assessment Plan

How will you know when your VIVO implementation is successful? Consider a set of goals or a mission statement, endorsed by the VIVO governance groups. These goals should be tied to your assessment plan. See <Assessing the Impact of VIVO at Your Institution> to learn more about how to demonstrate the value of your VIVO.

Publicize VIVO

In addition to regular meetings with primary stakeholders, materials can be posted or mailed to those with potential interest. An example of a poster for that purpose is here.

Hold Trainings

Route Data Cleanup Requests

From the cleanup strategy identified for the initial ingest, you will need to determine which tasks are ongoing and how frequently they should run going forward.  Once your VIVO instance is established, you will want to work with management to identify the resources and workflow by which data maintenance requests will be addressed.  See Data Maintenance. You may have noticed inconsistencies or missing data during your initial ingest that could not be addressed in batch.  It is good to be aware of potential cleanup requests and to have an established method for correcting the data.  See Monitoring for quality.

Support Data Provisioning

One of the fun parts of getting your data into VIVO is finding out all of the creative ways it can be reused in other systems.  See Finding VIVO Data with the University of Florida Public SPARQL Endpoint. Setting up a SPARQL endpoint is one way to let data consumers run customized queries.  See Setting up a VIVO SPARQL Endpoint. SPARQL queries can be run periodically and their results exported in several common formats such as .csv and .xml.  For example queries see Rich export SPARQL queries. Web developers may find the VIVO widgets to be a useful way to consume VIVO data as JSON.
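
As an illustration, the standard SPARQL protocol lets a consumer pull data over plain HTTP. The endpoint URL below is a placeholder, and authentication requirements vary by site; the query itself uses only rdfs:label and a VIVO-ISF class.

```python
# A sketch of querying a SPARQL endpoint; the endpoint URL is a placeholder.
import requests

ENDPOINT = "http://vivo.example.edu/sparql"  # adjust to your site's endpoint

query = """
SELECT ?person ?name WHERE {
  ?person a <http://vivoweb.org/ontology/core#FacultyMember> ;
          <http://www.w3.org/2000/01/rdf-schema#label> ?name .
} LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["person"]["value"], row["name"]["value"])
```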

Provide System Support

As with any IT project, you should expect VIVO to require ongoing support. This includes things like backups, security checks, upgrades to operating systems, etc.

Maintenance

Contribute to VIVO community

Once the dust settles, you'll want to share your experiences and best practices with the VIVO community. Follow other VIVO sites on social media, and create your own accounts to share your news. Join a VIVO task force to create or improve something. We need your help to make the VIVO community stronger!

Find New Collaborators

Engaging a research organization's community will depend on identifying new collaborators who might either contribute to or use the VIVO (or both). 

Hold User Meetings

Manage Ontology Updates

Ontologies can be updated to add or remove terms as a domain becomes better defined.  Ontology changes can be initiated from within your institution (e.g., an identifier specific to your institution) or externally (e.g., a term deprecated by an international standards organization).  A change can be adopted at the level of the VIVO ontology, or it can be made locally within your institution, also known as a “local extension”.  For an example, see Ontology Extensions -- Duke.  Local extensions can easily become duplicate ways of defining the same type of information, so it can be helpful to collaborate with the VIVO ontology community when deciding whether or not you need to create a local extension. To the extent that it is relevant to the broader VIVO community, it is important that, over time, local ontology extensions get incorporated into the VIVO core ontology.  See Proposed ontology development workflow.
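
For a sense of what a local extension looks like in RDF, here is an illustrative sketch: a locally named subclass of a VIVO-ISF class. The local namespace and the class name are invented for the example; coordinate with the ontology community before minting terms.

```python
# A local-extension sketch: a subclass in your own namespace (placeholders).
from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

VIVO = Namespace("http://vivoweb.org/ontology/core#")
LOCAL = Namespace("http://vivo.example.edu/ontology/local#")

g = Graph()
g.bind("vivo", VIVO)
g.bind("local", LOCAL)

g.add((LOCAL.ClinicalFaculty, RDF.type, OWL.Class))
g.add((LOCAL.ClinicalFaculty, RDFS.subClassOf, VIVO.FacultyMember))
g.add((LOCAL.ClinicalFaculty, RDFS.label, Literal("Clinical Faculty")))

print(g.serialize(format="turtle"))
```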

Add New Data & Sources

Data curation is an ongoing task, and incorporating new data types or new data sources will most likely be an aspect of maintaining your VIVO instance.  You may find that your initial implementation draws interest from owners of data repositories you hadn’t considered.  You may want to prioritize new data sources based on data quality and volume, as well as the number of new ontology mappings required.  As an example, see Activating the ORCID integration.

Implement System Upgrades

The VIVO team issues a new release about every year, with occasional extra releases to fix problems. For the most part, upgrading is straightforward. If there is a change in the VIVO ontology, the new release will include an automatic script to translate your existing data. Again, it is not unusual to find that the largest task is making changes in your data ingest processes. Most VIVO sites try to adopt new releases as soon as possible, to take advantage of new features or improved performance.

Develop New Features

You should also plan to develop new customizations for your VIVO installation. As your users gain experience, they will likely request new data and new displays. In the best case, these changes may be something that you can contribute to the VIVO community, so others may benefit from your work.

Major concepts in VIVO to get you started

We suggest that anyone heavily involved with implementing the project be familiar with these terms.   Click here for the full glossary.

Duraspace - Since 2009, the DuraSpace organization has sustained and improved open technologies that are tested and durable. Working with global communities of practice, DuraSpace is actively involved in projects that use DuraSpace technologies for access, management, and preservation of digital content. DuraSpace collaborates with open source software projects, academics, technologists, curators, and related commercial partners who share an interest in preserving digital scholarship and culture, creating innovative, interoperable technologies and open standards and protocols. source

Open-source - Why would anyone want to give away the software program that they have sweated blood and tears over? And how do they give it away? Moreover, what happens after the software has been released to all and sundry? Who looks after it and produces new and improved versions? To answer these questions we must consider open source as a software development methodology, and in the context of community building.  Open source is developed by a number of people who may have no connection to one another apart from their interest in the open source project. Consequently, the software development methodologies adopted are not the same as those found in closed source development projects.  Since open source is developed by a group of individuals with a shared interest in the project, this community of users and programmers is key to the advancement of any open source project. source

OWL - The Web Ontology Language (OWL) is a semantic markup language for publishing and sharing ontologies on the World Wide Web. Where earlier knowledge representation languages have been used to develop tools and ontologies for specific user communities (particularly in the sciences and in company-specific e-commerce applications), they were not defined to be compatible with the architecture of the World Wide Web in general, and the Semantic Web in particular. source

Protégé - a free, open source ontology editor developed by the Stanford Center for Biomedical Informatics Research at the Stanford University School of Medicine. WebProtégé, its web-based version, provides the following features:

  • Support for editing OWL 2 ontologies
  • A default simple editing interface, which provides access to commonly used OWL constructs
  • Full change tracking and revision history
  • Collaboration tools such as sharing and permissions, threaded notes and discussions, watches, and email notifications
  • Customizable user interface
  • Customizable Web forms for application/domain specific editing
  • Support for editing OBO ontologies
  • Multiple formats for upload and download of ontologies (supported formats: RDF/XML, Turtle, OWL/XML, OBO, and others)
    source

Resource Description Framework (RDF) --The RDF language is a part of the W3C's Semantic Web Activity. W3C's "Semantic Web Vision" is a future where web information has exact meaning, can be understood and processed by computers, and computers can integrate information from the web.  RDF was designed to provide a common way to describe information so it can be read and understood by computer applications. source

Semantic Reasoner - A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with. The inference rules are commonly specified by means of an ontology language, and often a description logic language.  A reasoner is a key component for working with OWL ontologies. In fact, virtually all querying of an OWL ontology (and its imports closure) should be done using a reasoner. This is because knowledge in an ontology might not be explicit and a reasoner is required to deduce implicit knowledge so that the correct query results are obtained. source1 source2
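
A tiny illustration of the idea, using the owlrl package for rdflib (an assumption for illustration, not the machinery VIVO itself uses): the subclass axiom is asserted, and membership in foaf:Person is inferred rather than stated.

```python
# Inference in miniature: owlrl expands the graph with RDFS entailments.
import owlrl
from rdflib import Graph, Namespace, RDF, RDFS

VIVO = Namespace("http://vivoweb.org/ontology/core#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
EX = Namespace("http://vivo.example.edu/individual/")  # placeholder namespace

g = Graph()
g.add((VIVO.FacultyMember, RDFS.subClassOf, FOAF.Person))
g.add((EX.jane, RDF.type, VIVO.FacultyMember))

owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print((EX.jane, RDF.type, FOAF.Person) in g)  # True: inferred, not asserted
```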

Semantic Web -- The Semantic Web, Web 3.0, the Linked Data Web, the Web of Data…whatever you call it, the Semantic Web represents the next major evolution in connecting information. It enables data to be linked from a source to any other source and to be understood by computers so that they can perform increasingly sophisticated tasks on our behalf. source

Solr -- Apache Solr is an open source search platform built upon a Java library called Lucene.  Solr is a popular search platform for Web sites because it can index and search multiple sites and return recommendations for related content based on the search query’s taxonomy. Solr is also a popular search platform for enterprise search because it can be used to index and search documents. source

SPARQL -- SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation, and SPARQL 1.1 followed in March 2013. source

Triple - A Triple is the minimal amount of information expressible in the Semantic Web. It is composed of three elements: 1) a subject, a URI (e.g., a "web address") that represents something; 2) a predicate, another URI that represents a certain property of the subject; and 3) an object, a URI or a literal (a string) that is related to the subject through the predicate. source
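
For example, the single triple below states a label for a subject; the URIs are illustrative, and rdflib is used to match the other sketches on this page.

```python
# One triple, spelled out part by part; URIs are illustrative.
from rdflib import Graph, Literal, URIRef

g = Graph()
g.add((
    URIRef("http://vivo.example.edu/individual/n1234"),    # subject
    URIRef("http://www.w3.org/2000/01/rdf-schema#label"),  # predicate
    Literal("Smith, Jane"),                                # object (a literal)
))
print(g.serialize(format="nt"))
```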

Triplestore - Triplestores are Database Management Systems (DBMS) for data modeled using RDF. Unlike Relational Database Management Systems (RDBMS), which store data in relations (or tables) and are queried using SQL, triplestores store RDF triples and are queried using SPARQL.  A key feature of many triplestores is the ability to do inference.  In VIVO, the triplestore is the location for storing VIVO data. The default VIVO installation calls for a MySQL database to hold the information in VIVO, but there are alternative storage options, both established and under exploration. source
