Theme: Search Engine Optimization for researcher profiles


Growth Hacking 101 for Research Networking Sites
UCSF Profiles receives 107,000 visits a month, and researcher profile pages are regularly linked to from sources like the New York Times, the BBC, CNN, and NPR.
76% of that traffic (81,000 visits/month) comes from a single source: Google.
Anirvan Chatterjee from CTSI at UCSF will share the 5-part strategy that turned UCSF’s RNS into a heavily-used resource:
    1. using web analytics to measure baselines
    2. putting researcher profile pages front and center
    3. ensuring your site’s getting indexed by search engines
    4. thinking like Google—tweaking URLs and metadata to attract search engine users
    5. getting inbound links to establish reputation
While there’s no silver bullet techno-fix to growing traffic, some of these local best practices have been embedded into the core Profiles RNS product out of the box. We’ll hear about how that’s worked — and which parts of the traffic equation can’t be automated.


  • have grown traffic roughly 20-fold since first launch -- have learned what to do and what not to do. No magic -- all the pieces work together

  • launched in October 2010 with lots of publicity -- started at 5,000 hits a month but has grown to 100K hits/month

    • 37% CA

    • 30% other USA

  • 83% from search engines

  • Growth Hacking 101

    • measure baselines with analytics -- “Web Analytics 2.0” by Kaushik

    • put people pages front and center

    • register with Google Webmaster Tools

    • segment on-campus vs. off-campus traffic

      • 15% new visitors last month overall, but is 72% of on-campus users while only 20% of outside users

    • ignore your home page -- users don’t land there -- 2.1% of Profiles users start on the home page, and 98% totally avoid the home page

      • and only 21% of 2+ minute users start on the home page

      • growth of home page views is pretty static while the rest of the traffic is growing
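A minimal sketch of the on-campus vs. off-campus segmentation and the home-page landing share discussed above. It assumes session records have been exported from an analytics tool as (visitor IP, landing path) pairs. The campus network range, the sample sessions, and the function names are all hypothetical.

```python
import ipaddress
from collections import Counter

# Hypothetical campus network range (a documentation-only block); substitute real ranges.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_on_campus(ip: str) -> bool:
    """Return True if the visitor IP falls inside any campus network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def home_page_landing_share(sessions):
    """Return {segment: fraction of sessions whose landing page is '/'}."""
    totals, home = Counter(), Counter()
    for ip, landing_path in sessions:
        segment = "on-campus" if is_on_campus(ip) else "off-campus"
        totals[segment] += 1
        if landing_path == "/":
            home[segment] += 1
    return {seg: home[seg] / totals[seg] for seg in totals}


# Made-up sample sessions: (visitor IP, landing path).
sessions = [
    ("192.0.2.10", "/"),
    ("192.0.2.11", "/jane.doe"),
    ("198.51.100.7", "/jane.doe"),
    ("198.51.100.8", "/john.smith"),
    ("198.51.100.9", "/john.smith"),
    ("203.0.113.5", "/"),
]
shares = home_page_landing_share(sessions)
```

Comparing the two shares over time would show whether home-page traffic is flat while profile-page landings grow, as reported in the talk.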

    • people type a name into Google and want to go to the person’s profile

    • so have cleaned up those profile pages

      • more inviting, more data fields, bigger photos, etc -- look at good examples

    • get indexed -- make sure search engines can see it

      • have a dynamic site map of all the pages on your site -- check with Google webmaster tools

      • make sure you’re not blocking via robots.txt
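The two indexing checks above can be sketched with the Python standard library: generating a sitemap that lists every profile page, and testing whether a robots.txt file blocks any of them. The host name, the page paths, and the sample robots.txt are hypothetical, and this is not how Profiles RNS itself builds its sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical pages, and a robots.txt that accidentally blocks a profile page.
pages = [
    "https://profiles.example.edu/jane.doe",
    "https://profiles.example.edu/search",
]
robots = "User-agent: *\nDisallow: /jane.doe\n"

sitemap_xml = build_sitemap(pages)
blocked = blocked_urls(robots, pages)
```

Running a check like `blocked_urls` against the live robots.txt for a sample of profile URLs is a cheap way to catch an accidental block before a search engine does.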

    • tweak URLs and page snippets -- search results show title, URL, and snippet

      • customize the URL to make it more appealing -- e.g.,

      • make the snippet more readable -- not random pieces of text

      • make the <title> on profile pages so it’s short and globally unique

      • make the <meta name="description"> something like “Jane Doe’s profile, publications, research topics, and co-authors”

      • can also add in a line of professional metadata

      • make pretty URLs -- see a lot of researchers putting that URL in their email signature -- feels more personal

      • and the pretty URL should be the “real” URL, not just a redirect
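As an illustration of the title and snippet advice above, here is a minimal sketch that builds the `<title>` and `<meta name="description">` tags for one profile page. The title pattern, the function name, and the sample headline are assumptions for the example, not the tags UCSF Profiles actually emits.

```python
from html import escape


def profile_head(name: str, institution: str, headline: str = "") -> str:
    """Build the <title> and <meta name="description"> tags for one profile page."""
    # Short title; it is globally unique as long as the name is unique within the site.
    title = f"{name} | {institution} Profiles"
    description = f"{name}'s profile, publications, research topics, and co-authors"
    if headline:
        # Optional line of professional metadata, e.g. job title and department.
        description = f"{description}. {headline}"
    return (
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description, quote=True)}">'
    )


# Made-up sample person and headline.
head = profile_head("Jane Doe", "UCSF", "Professor, School of Pharmacy")
```

Escaping the values matters because names and headlines can contain apostrophes, ampersands, or quotes that would otherwise break the tag.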

    • Get inbound links -- self-reinforcing

      • doing a lot with campus news department so every news story mentioning a researcher includes a link to the profile

      • and have the link in the directory

      • get links to Profiles on departmental pages

      • work with departments to give them a feed -- or just link by saying “view on UCSF Profiles”

      • include links in narrative bios

    • create APIs for people to use

      • document them on a developer-centric website

      • online discussion group

      • outreach to campus groups

      • ask for attribution via a link back to Profiles -- you save them time and money and they give you links

      • over 2 dozen sites now using Profiles data and linking back

    • media mentions also link to individual profiles (instead of what is often a pretty crappy lab page)



Do some people/departments see as competition for their web space?

  • Profiles had some buy-in but no top-down direction to use it -- work closely with departments

  • sometimes the Profiles page ranks higher than the department page, but few of them measure

  • helps to be giving them data, and data that is clean

  • E.g., UCSF School of Pharmacy partnered with the Profiles team, re-used the data, and worked out editing in Profiles -- don’t have to push it on them


Pretty URLs?

  • VIVO has the URI vs. the display URL -- don’t want to use our URIs as links on the display pages, since it then appears to have 2 URLs for the same thing

  • The logistics -- how to assign a pretty URL to two people with the same first and last name

    • could have preferred URL

  • Eric -- still have the numeric URIs in Profiles, same as VIVO

    • do content negotiation to display the HTML page with the pretty URL

    • the links within pages are linked by URI, not URL

    • Profiles had a pretty URL /websites/profiles/name -- better to shorten

    • have a strategy to avoid name collisions that does a good job -- if 2 Eric Meeks, one of whom has middle initial

      • have shared that algorithm -- pretty easy to get uniqueness without having to deal with

      • but is worth doing
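The notes do not record the algorithm UCSF shared, so the following is only a hypothetical sketch of one way to assign unique pretty URLs: try first.last, fall back to first.middle-initial.last, and use a numeric suffix as a last resort. The sample people and all names in the code are made up.

```python
import re


def slugify(*parts: str) -> str:
    """Lowercase the name parts, drop non-alphanumerics, and join with dots."""
    cleaned = (re.sub(r"[^a-z0-9]", "", p.lower()) for p in parts)
    return ".".join(c for c in cleaned if c)


def assign_pretty_urls(people):
    """Map person id -> unique slug, in the order the people are given."""
    taken, result = set(), {}
    for person_id, first, middle, last in people:
        candidates = [slugify(first, last)]
        if middle:
            candidates.append(slugify(first, middle[0], last))
        slug = next((c for c in candidates if c not in taken), None)
        n = 2
        while slug is None:
            # Last resort: numeric suffix on the plain first.last form.
            attempt = f"{candidates[0]}.{n}"
            if attempt not in taken:
                slug = attempt
            n += 1
        taken.add(slug)
        result[person_id] = slug
    return result


# Made-up people: (id, first name, middle name or "", last name).
people = [
    (101, "Jane", "", "Doe"),
    (102, "Jane", "Q", "Doe"),
    (103, "Jane", "", "Doe"),
]
urls = assign_pretty_urls(people)
```

Because the result depends on processing order, a real system would also need to persist each assigned slug so that an existing pretty URL never changes when a new namesake arrives.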

    • BU has taken the UCSF code, modified it, and put it into practice

    • Jim -- do you retire the pretty URL?

    • Eric -- the same case can be made for persistent URLs as well as persistent URIs

    • Anirvan -- a lot of people have built this capability into

  • having 2 URLs pointing to the same content

    • if look at the accept headers -- if wanting XML or JSON, gets the data; otherwise direct to the pretty URL

    • if use “link rel=canonical” to the pretty URL on the ugly URI “page” -- in the header -- do view source and do a search for canonical

      • if a robot has scraped a URL with a bunch of query arguments
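The Accept-header behavior described above can be sketched as a small decision function, together with the canonical link tag. This is a simplified illustration, not the Profiles or VIVO implementation: it ignores q-values and only looks at the order of media types, and the 303 status code and the lists of media types are assumptions.

```python
HTML_TYPES = {"text/html", "application/xhtml+xml"}
DATA_TYPES = {"application/rdf+xml", "application/xml", "text/xml", "application/json"}


def negotiate(accept_header: str, pretty_path: str):
    """Decide how to answer a request for the numeric URI.

    Returns (status, headers): 200 with a data content type for data clients,
    otherwise a 303 redirect to the pretty URL.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in DATA_TYPES:
            return 200, {"Content-Type": media_type}
        if media_type in HTML_TYPES:
            break  # a browser: HTML is listed before any data type
    return 303, {"Location": pretty_path}


def canonical_link(pretty_url: str) -> str:
    """Tag for the <head> of any alternate URL that shows the same page."""
    return f'<link rel="canonical" href="{pretty_url}">'


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
data_response = negotiate("application/json", "/jane.doe")
browser_response = negotiate(browser, "/jane.doe")
tag = canonical_link("https://profiles.example.edu/jane.doe")
```

The canonical tag covers the case in the last bullet: a crawler that reaches the page through a URL with extra query arguments is told which URL is the real one.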

  • do you do this for grants, equipment, etc.?

    • No -- people come to Profiles overwhelmingly for people


Patrick -- Scholars@Duke has encountered some resistance to competition -- stronger in the humanities than in medical world, perhaps?

  • Eric -- we’re biomedical and the Profiles system does a good job for people. Resistance got smaller especially when offered background

  • But there had been other systems at other schools that used the word “profiles”, so it is more confusing

  • To be a successful research networking tool have to be a good website, and isn’t easy to do

  • Duke has taken an aggressive approach with the widgets, but don’t underestimate how important it is to make a successful website to be a successful tool


Jim -- did this presentation at an IFest and we haven’t done a lot yet -- thanks for helping to raise our vision up


Anirvan -- worked in eCommerce and was excellent training


Jim -- Nate Prewitt implemented microformats at the Hackathon


Anirvan -- will the descriptions change? A short, one-line item

  • Will be happy to share the slides via SlideShare


Paul -- do you have before and after stats on the use of tags?


Theme: Inferencing and Reinferencing in VIVO

Guest presenter: Brian Lowe

  • Introduction

    • What is inferencing?

      • Brian -- will start at a basic level

      • A broad topic that can be used for a lot of things

      • how we use it in VIVO is very simple

      • basically, if you make a certain assertion, an inference is the ability to then say that other statements are true

      • e.g., if you assert that something is an Academic Department, a reasoner can infer that it is also a foaf:Organization

      • not very exciting, but very useful -- when rendering a page of all organizations, don’t have to cycle through all the different type assertions to figure out which types map up to foaf:Organization, but can just look for the statements of rdf:type foaf:Organization that have been added by the inferencing process

      • these extra triples get added into a separate graph in the Jena triplestore, called the inference graph -- this is called being “materialized” into the graph

      • allows you to get what you’re looking for more quickly

      • in the biomedical world you may be doing much more advanced reasoning, such as working off of associations with genes to determine what diseases may be implicated by certain symptoms in organisms

      • Similar to how Solr is used… inferencing makes it faster to access information... whenever triples are added through VIVO interfaces (user interface editing, API ingest) the reasoner is used to infer additional triples

        • If you only have a single RDF type triple, you would need more complex queries to browse or search in VIVO -- inferred RDF type triples simplify these queries

        • Solr is fundamentally doing this in a very different way, however -- it’s flattening the information in the graph into “documents”
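To make the idea of materialized type inferences concrete, here is a minimal pure-Python sketch. Triples are plain tuples, the class hierarchy is a small illustrative dictionary rather than the real VIVO ontology, and the function names are hypothetical. It is not how VIVO or Jena store graphs.

```python
# Illustrative class hierarchy: subclass -> direct superclasses.
SUBCLASS_OF = {
    "vivo:AcademicDepartment": ["vivo:Department"],
    "vivo:Department": ["foaf:Organization"],
    "foaf:Organization": ["foaf:Agent"],
}


def superclasses(cls):
    """Return every ancestor of cls in SUBCLASS_OF (transitive closure)."""
    found, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer_types(assertions):
    """Return the rdf:type triples implied by, but not present in, the assertions."""
    inferred = set()
    for s, p, o in assertions:
        if p == "rdf:type":
            inferred |= {(s, "rdf:type", sup) for sup in superclasses(o)}
    return inferred - assertions


# One asserted triple; the inferred ones go into a separate "inference graph".
assertions = {("ex:dept1", "rdf:type", "vivo:AcademicDepartment")}
inference_graph = infer_types(assertions)

# Listing all organizations is now a simple lookup across both graphs.
organizations = {
    s
    for s, p, o in assertions | inference_graph
    if p == "rdf:type" and o == "foaf:Organization"
}
```

Keeping the inferred triples in their own set mirrors the separation between asserted and inferred graphs: the inferences can be discarded and recomputed without touching what was asserted.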

      • A separate piece of reasoning in VIVO related to the ontology

        • Used in part to determine what properties should appear on a page for a foaf:Person rather than a bibo:Document

        • The code that decides what properties to display on the page is dependent on the TBox reasoning -- the entire ontology as loaded into memory

        • It’s sort of arbitrary -- in OWL it’s possible to make an assertion about anything, including statements that are inconsistent with the ontology

        • But OWL and RDF are not like a schema language

      • One chunk of code that uses the Pellet reasoner

        • dates back to the first fully semantic version of VIVO that had an in-memory ontology model - all queries ran against the in-memory store but needed to be able to persist that on disk (in a database, via Jena)

        • had to move away from this when memory became a problem

        • was logical to employ a reasoner that would work in memory

          • a listener would send the triples added to or removed from the model over to the Pellet reasoner and its knowledge base

          • as those inferences were added or removed via Pellet, those were synchronized with the database store in the background

            • when editing, you would see your direct assertions first, and then see additional statements added by Pellet in a few seconds

          • we limited what we sent Pellet to do -- e.g., not data property reasoning

        • When we moved inferencing of all the data out of memory, we retained the reasoning of the ontology itself

          • what is a subclass of other classes

          • because OWL is description logic, set up to define a class and what properties it has, the reasoner can then figure out where that class fits in the class tree

          • Pellet does that reasoning using all the axioms, many of which are now added in other parts of the ISF -- that piece is still necessary

          • But the triples about all the data can’t be sent to Pellet because there isn’t enough memory for that

          • We created the VIVO simple reasoner as a limited way to get in all the triples that VIVO depends on to accurately render pages -- looks at the types on an individual, for example rdf:type vivo:AcademicDepartment, and adds in the extra types such as foaf:Organization, and that incremental reasoning is done right away so that the page displays correctly right away

            • don’t ever see the statement in its uninferred state

            • just the statements that VIVO depends on get inferred right away

          • Pellet does real OWL reasoning on the ontology itself, while the simple reasoner does only the inferencing necessary for VIVO to display correctly -- a hybrid system

      • Kind of different from the way some other applications use inferencing -- e.g. in a medical application, it may use inferencing to look for candidates of genes that might be implicated in some disease.

    • What is a reasoner and how is this used in VIVO?

      • 2007 when running everything in memory -- fast but size was limited

      • Pellet is a complete OWL reasoner

      • Pellet still classifies the ontology (the triples in the T-Box) -- not really taken advantage of in the original VIVO ontology, but other OWL ontologies, including some parts of ISF, are not designed in the same way

      • when moved away from everything in memory… no longer sending all individuals (people, orgs, pubs, etc) to Pellet, just the ontology management

      • newer VIVO SimpleReasoner (class name?) for a hybrid solution -- a limited way of getting in particular triples that VIVO depends on to accurately render pages

    • What are the different types of inferencing?

    • What is reinferencing?

      • When you start a reinferencing, Pellet is not doing anything, no changes in T-box  -- just the Simple Reasoner

      • It’s just like rebuilding your search index -- the idea of a listener infrastructure is that hopefully you only need a full re-indexing or re-inferencing if your database got corrupted somehow

        • but with the Harvester, a lot of changes were introduced straight into the database, without VIVO being able to listen, so this became something that had to be done much more frequently than we originally thought

      • the simple reasoner has never been optimized to reason on the entire database --

        • does new inserts instantly

        • but when statements are deleted, it kicks off a small batch job to accomplish the deletion

        • the effort to date has been to optimize the experience of interactive editing rather than support large batch operations

        • redoing the entire thing for the entire database looks to the simple reasoner as if the entire database has been redone from scratch

          • so the simple reasoner adds these triples into a temporary graph and at the end tries to reconcile that against the main application graph

          • it does a lot of scut work that it wouldn’t need to do if the complete reinferencing were redesigned to do that entire replacement
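The reconciliation step described above amounts to a set difference between the freshly recomputed inferences and the inferences already stored. The sketch below shows only that idea, with triples as tuples and made-up sample data. It is not the SimpleReasoner code.

```python
def reconcile(existing_inferences: set, recomputed_inferences: set):
    """Return (to_add, to_remove) so that only changed triples are written."""
    to_add = recomputed_inferences - existing_inferences
    to_remove = existing_inferences - recomputed_inferences
    return to_add, to_remove


# Made-up example: one inference is stale, one is new, one is unchanged.
existing = {
    ("ex:dept1", "rdf:type", "foaf:Organization"),
    ("ex:dept2", "rdf:type", "foaf:Organization"),
}
recomputed = {
    ("ex:dept1", "rdf:type", "foaf:Organization"),
    ("ex:lab1", "rdf:type", "foaf:Organization"),
}
to_add, to_remove = reconcile(existing, recomputed)
```

When the base assertions have changed little since the last run, both result sets are small, so far fewer writes reach the triple store than a full copy would need.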

        • the reinference of your database does not use Pellet

          • we assume your database is frozen at that point and is responding to what’s there -- Pellet is just at a temporary steady state with its knowledge of which classes are subclasses of which other classes and which properties are sub-properties -- that just sits in the TBox model and is not changed

        • Patrick -- if you had changed the ontology since the last re-inferencing, would Pellet know about it?

          • Brian -- Pellet would reflect the changes to the ontology in the model the simple reasoner uses, but if you change the ontology in the middle of reinferencing the data, would not be sure of the results

          • Patrick -- we put in our own extra triples so don’t need the simple reasoner

          • Brian -- would be helpful to be able to configure more granularly

            • e.g., you can now switch on or off whether it pays attention to sameAs statements

            • could potentially turn off inverse property reasoning if you want to insert your own inverse statements

            • that would give people more flexibility

        • Brian -- has opened some issues for 1.8 and hopes to take care of them in the next couple weeks

          • doesn’t look too bad to make some headway

          • two basic approaches

            • 1 get rid of the temporary rebuild graph that the simple reasoner infers the current state in

            • if your base assertions haven’t changed very much it will only have to make the writes for what has changed rather than doing everything and having to copy it over

            • and 2, just batching up the inserts should have a relatively big effect

              • because it was set up to slot in new triples one-by-one, should be a relatively simple task to put statements together into chunks that should yield a significant improvement

              • on his dev machine was getting a fixed time of 25 ms to do an insert of one triple

              • now in batch each extra triple added into the batch only adds ⅓ or ½ millisecond
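A rough worked example of what those timings imply, assuming a simple linear cost model: each write costs a fixed 25 ms plus about half a millisecond for every additional triple in the same batch. The model, the 100,000-triple workload, and the batch size of 1,000 are assumptions made for illustration, not measurements from the call.

```python
from itertools import islice

FIXED_MS = 25.0      # reported cost of inserting one triple on its own
PER_EXTRA_MS = 0.5   # upper end of the reported 1/3 to 1/2 ms per extra triple


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def estimated_ms(n_triples: int, batch_size: int) -> float:
    """Estimate total insert time under the assumed linear per-batch cost model."""
    total = 0.0
    for chunk in batches(range(n_triples), batch_size):
        total += FIXED_MS + (len(chunk) - 1) * PER_EXTRA_MS
    return total


one_by_one = estimated_ms(100_000, 1)    # 100,000 separate inserts
batched = estimated_ms(100_000, 1_000)   # 100 batches of 1,000 triples
```

Under these assumptions the one-by-one run takes about 2,500 seconds and the batched run about 52 seconds, which is consistent with the expectation that batching alone should have a relatively big effect.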

            • will be pursuing those two options and thinks that will help a lot with 1.8

            • still more that could be done

        • Jim -- are inferencing and reinferencing the same thing?

          • if I’m using the SPARQL update API and am adding 300K triples, that is inferencing -- not a full re-inferencing

          • The use of the temporary model only applies to re-inferencing, so we won’t get advantage from removing it

          • will the batching yield benefit with the API?

        • Brian -- if the batching does prove much more efficient we should try it in other circumstances than reinferencing, such as the API

          • but start there

          • Jim -- so you have a path forward to improving the full re-inferencing, but not to the incremental addition of triples, which can still be large

        • Could the batching be done in the RDF API, or is it more upstream?

          • Brian -- not immediately obvious how that would be done in the RDF service, where it’s more complex

          • Jim -- not too difficult to handle with “add RDF”

          • but with the SPARQL API we’ve ceded a lot of the parsing to Jena, since it has to interpret the SPARQL

      • This “Simple Reasoner” we are talking about -- is it implemented here? edu.cornell.mannlib.vitro.webapp.reasoner.SimpleReasoner

      • when you reinference, does it start from scratch? (clear the kb-inf?)

    • Are there different types of reinferencing? No -- seems to be a single form that relies on SimpleReasoner

    • When should you do inferencing and/or reinferencing?

    • How do you start it? from the Site Admin menu

    • Where are the inferences stored? kb-inf graph?

    • Curious why mysql from the beginning rather than a dedicated triple store?

    • Are there any special considerations for how a third party triple store could handle reinferencing?

      • Jon: Probably yes, but we don’t have a lot of experience

      • Brian: I’ve used Sesame with materialized triples generated from reinferencing.

      • With OWLIM, it does a similar kind of materializing of inferred statements. For the most part, it works nicely.

      • One gotcha is that Vitro uses something called mostSpecificType but Sesame uses a different property for that concept -- so have to make that configurable

      • and other triple stores may not do the most specific type piece

      • you might need to have VIVO handle that bit of reasoning but have the triple store do 90% of the class subsumption reasoning
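The "most specific type" piece that VIVO might have to keep doing itself can be sketched as follows: given all of an individual's types, keep only those that are not a superclass of another type in the set. The hierarchy and class names are illustrative, and the exact semantics Vitro applies to mostSpecificType are assumed here rather than taken from its source code.

```python
# Illustrative class hierarchy: subclass -> direct superclasses.
SUBCLASS_OF = {
    "vivo:AcademicDepartment": ["vivo:Department"],
    "vivo:Department": ["foaf:Organization"],
    "vivo:FundingOrganization": ["foaf:Organization"],
    "foaf:Organization": ["foaf:Agent"],
}


def ancestors(cls, subclass_of):
    """Return every superclass of cls, direct or indirect."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def most_specific_types(types, subclass_of=SUBCLASS_OF):
    """Return the types that are not an ancestor of any other type in the set."""
    types = set(types)
    covered = set()
    for t in types:
        covered |= ancestors(t, subclass_of)
    return types - covered


# All types of one individual, as a triple store might return them after inferencing.
all_types = {
    "vivo:AcademicDepartment",
    "vivo:Department",
    "vivo:FundingOrganization",
    "foaf:Organization",
    "foaf:Agent",
}
specific = most_specific_types(all_types)
```

An individual can have more than one most specific type when its types sit on different branches of the hierarchy, as in this example.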

      • might get data repeated on the page inadvertently -- if in the past have thrown a more capable reasoner

        • super-properties inferred as well -- e.g., hasPart when you are using sub-properties like hasSubOrganization

        • can be addressed through more sophisticated application logic, as is done with the “faux” properties in VIVO as a first step in an application ontology

      • we need to experiment with the capabilities of a number of external triple stores to see what advantages or disadvantages they offer

      • and we will need to modify VIVO to have its inferencing much more configurable

    • what is a faux property?

      • they are properties never defined in your ontology or in your database, but a contextual configuration for a real property

      • e.g., the “bearer of” property links a person to a role, so we can break that out in VIVO with a context-specific label such as “has investigator role” or “has leadership role” depending on the object involved

      • a little complex

      • Jim -- a faux property is not represented by a triple in the triple store? Yes, but it’s stored differently in the triple store than you might expect looking at the application

      • Jon -- there is a triple there, but under a very generic property

      • Jim -- the triple looks to be there for display purposes, and we don’t want to see both it and the more generic one

      • Stephan - ‘faux’ properties are just a poor implementation of the qualified relation pattern 

        • (Jon, later) -- qualified relations look to be very much like the vivo:Relationship class, which is not the same thing as a faux property, which is about contextual labeling of a direct object property between two entities

      • Brian - Faux properties have nothing to do with how the data is actually modeled or the patterns used in the ontology.  It’s just a way of configuring the VIVO application to apply a different label (or other settings such as rank position, editability, etc.) to a predicate in a certain context.
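Brian's description suggests a faux property can be pictured as a lookup keyed on the real predicate plus its context, returning display settings. The sketch below is only that mental model: the dictionary layout, the prefixed names such as `obo:bearerOf`, and the rank values are hypothetical and do not reflect how VIVO stores its configuration.

```python
# Hypothetical faux-property configuration: (predicate, object class) -> display settings.
FAUX_PROPERTIES = {
    ("obo:bearerOf", "vivo:InvestigatorRole"): {"label": "has investigator role", "rank": 10},
    ("obo:bearerOf", "vivo:LeaderRole"): {"label": "has leadership role", "rank": 20},
}

# Labels of the real properties, used when no context-specific entry applies.
DEFAULT_LABELS = {"obo:bearerOf": "bearer of"}


def display_settings(predicate, object_types):
    """Return the display settings for a statement, given the object's types."""
    for object_type in object_types:
        settings = FAUX_PROPERTIES.get((predicate, object_type))
        if settings:
            return settings
    return {"label": DEFAULT_LABELS.get(predicate, predicate), "rank": 0}


leader = display_settings("obo:bearerOf", ["vivo:LeaderRole"])
generic = display_settings("obo:bearerOf", ["vivo:SomeOtherRole"])
```

The stored triple uses the same generic predicate in both cases; only the label chosen at display time differs, which is the distinction drawn in the discussion above.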

    • Does all of this apply to both Vitro and VIVO, or is there inferencing/reinferencing specific source code/logic for VIVO (separate from anything specified in the VIVO-ISF ontology itself)?

      • not seeing any direct references to SimpleReasoner in VIVO source code project, so inferencing/reinferencing seems to be fully handled by core Vitro code

      • Could we bring up GitHub and do a quick drive-by of some of the directories or classes involved here?

    • Patrick -- in our Ruby ingest, we map faux properties to a human-readable string

  • Bugs and fixes

    • Performance

    • Brian Lowe: "I spent some time yesterday investigating the benefit of doing the inferred triple inserts in larger batches rather than one-by-one, and at least on my machine this looks like it will offer a very significant improvement in speed.  I opened a few issues for myself for 1.8.  As part of the batching change, I’ll also modify it so it uses the RDFService directly for getting access to the triple store, instead of going through the additional legacy Jena model layer.  This should avoid waits for model locks."

  • What are some efficiencies to be had?


