Info: The quirk of "local"

Notably, the above query uses a "local" prefix. In VIVO, most CONSTRUCT statements are written to generate models that match the underlying source ontologies, so that the SELECT queries executed on those models could just as easily be run directly against the source triple store.

If you know that you will always be executing against a temporary model built by a CONSTRUCT (and since the CONSTRUCT approach is usually the best-performing option, you might as well commit to it), there is no need to replicate the original model in the reduced set. Instead, you can collapse certain graph paths into more descriptive "invented" predicates. This reduces the size of the model, reduces the complexity of the following SELECT, and makes that SELECT more specific.
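As a rough illustration of the idea (not VIVO's actual CONSTRUCT), the Python sketch below models triples as plain tuples and does by hand what such a CONSTRUCT would do: it walks the person, Authorship, document, Authorship, person path and emits one invented `local:coAuthor` triple per co-author, plus that co-author's label. The `local:coAuthor` predicate, the `ex:` identifiers and the helper function names are hypothetical; the class and property names mirror the VIVO authorship pattern (`core:relatedBy`, `core:relates`, `core:Authorship`).

```python
# Sketch: collapse a multi-hop authorship path into one invented predicate.
# Triples are (subject, predicate, object) tuples. "local:coAuthor" and the
# "ex:" identifiers are hypothetical, chosen for illustration only.

RDF_TYPE = "rdf:type"


def build_source_graph():
    """Two people (Alice, Bob) linked to one document via Authorship nodes."""
    return {
        ("ex:alice", RDF_TYPE, "foaf:Person"),
        ("ex:bob", RDF_TYPE, "foaf:Person"),
        ("ex:alice", "rdfs:label", "Alice"),
        ("ex:bob", "rdfs:label", "Bob"),
        ("ex:doc1", RDF_TYPE, "obo:IAO_0000030"),
        ("ex:auth1", RDF_TYPE, "core:Authorship"),
        ("ex:auth2", RDF_TYPE, "core:Authorship"),
        ("ex:alice", "core:relatedBy", "ex:auth1"),
        ("ex:auth1", "core:relates", "ex:alice"),
        ("ex:auth1", "core:relates", "ex:doc1"),
        ("ex:doc1", "core:relatedBy", "ex:auth1"),
        ("ex:bob", "core:relatedBy", "ex:auth2"),
        ("ex:auth2", "core:relates", "ex:bob"),
        ("ex:auth2", "core:relates", "ex:doc1"),
        ("ex:doc1", "core:relatedBy", "ex:auth2"),
    }


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def has_type(graph, node, rdf_class):
    return (node, RDF_TYPE, rdf_class) in graph


def construct_coauthor_model(graph, author):
    """Hand-rolled equivalent of a CONSTRUCT that emits invented predicates."""
    model = set()
    for authorship in objects(graph, author, "core:relatedBy"):
        if not has_type(graph, authorship, "core:Authorship"):
            continue
        for document in objects(graph, authorship, "core:relates"):
            if not has_type(graph, document, "obo:IAO_0000030"):
                continue
            for co_authorship in objects(graph, document, "core:relatedBy"):
                if not has_type(graph, co_authorship, "core:Authorship"):
                    continue
                for person in objects(graph, co_authorship, "core:relates"):
                    # Assumption: the author is not listed as their own co-author.
                    if person == author or not has_type(graph, person, "foaf:Person"):
                        continue
                    model.add((author, "local:coAuthor", person))
                    for label in objects(graph, person, "rdfs:label"):
                        model.add((person, "rdfs:label", label))
    return model


def select_coauthor_labels(model, author):
    """The follow-up SELECT: one invented predicate instead of a join chain."""
    coauthors = objects(model, author, "local:coAuthor")
    return sorted(label for c in coauthors for label in objects(model, c, "rdfs:label"))
```

For Alice, the reduced model holds two triples instead of the fifteen in the source graph, and the follow-up SELECT becomes a single lookup on `local:coAuthor` rather than a chain of joins.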

 

Going Deeper

Most of the optimisation work that went into the VIVO 1.8.1 release depended on the YourKit profiler (http://yourkit.com/): attaching it to a running server and gathering statistics on where the code spends a long time executing.

One of the nice features of YourKit is that it can profile SQL statements as well as Java methods. It doesn't tell you what is happening inside the SQL engine itself (that would require an SQL profiler in the database server), but it does tell you which queries are being executed and how long each one takes.

Note: YourKit SQL Probes

In order to profile SQL statements, YourKit uses a technology it calls "probes". By default, when you integrate YourKit with an enterprise server (e.g. Tomcat), the startup line that loads the agent will have the probes disabled: "probe_disable=*". If you remove this option from the line, the agent will use the default probe settings, which include profiling SQL statements when CPU profiling is enabled.

Initially, it was the profiling of SQL statements that alerted us to potential problems caused by the RDFService being hidden behind a Dataset proxy: what was a single SPARQL query in the code appeared to cause hundreds of SQL queries to be executed in the back end.

That explosion of queries was caused by the execution of SPARQL against the proxy dataset being translated into a series of find() calls, each of which was then executed as a separate (individually very efficient) SELECT query on the RDFService.
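The effect can be reproduced with a simplified model. The Python sketch below is not Jena's query engine; it assumes a naive nested-loop evaluator in which each triple pattern is answered by calling `find()` once for every partial solution found so far, and it counts every `find()` as one back-end SELECT. `CountingStore`, `evaluate_bgp`, `build_store` and the `ex:` data are hypothetical.

```python
# Simplified model of SPARQL evaluated through a proxy: every triple pattern
# is answered with find() calls, and each call is one back-end SELECT.


class CountingStore:
    """Stands in for the RDFService; counts how often find() is called."""

    def __init__(self, triples):
        self.triples = set(triples)
        self.calls = 0

    def find(self, s, p, o):
        """None acts as a wildcard, as in a graph find(s, p, o)."""
        self.calls += 1
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


def is_var(term):
    return term.startswith("?")


def resolve(term, binding):
    """Bound value for a variable (None if unbound), or the constant itself."""
    return binding.get(term) if is_var(term) else term


def evaluate_bgp(store, patterns):
    """Nested-loop join: one find() per pattern per partial solution."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            s, p, o = (resolve(t, binding) for t in pattern)
            for triple in store.find(s, p, o):
                extended, consistent = dict(binding), True
                for term, value in zip(pattern, triple):
                    if is_var(term) and extended.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


def build_store(n_documents):
    """One author linked to n documents through Authorship nodes."""
    triples = []
    for i in range(n_documents):
        authorship, document = f"ex:authorship{i}", f"ex:doc{i}"
        triples += [
            ("ex:author", "core:relatedBy", authorship),
            (authorship, "core:relates", "ex:author"),
            (authorship, "core:relates", document),
            (document, "rdf:type", "obo:IAO_0000030"),
        ]
    return CountingStore(triples)


PATTERNS = [
    ("ex:author", "core:relatedBy", "?authorship"),
    ("?authorship", "core:relates", "?document"),
    ("?document", "rdf:type", "obo:IAO_0000030"),
]
```

With 100 documents, this three-pattern query costs 301 `find()` calls (1 + 100 + 200, since each Authorship node relates both the author and the document), and the count grows with the data. Pushing the whole pattern down to the SDB back end replaces all of them with one SQL statement.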

By taking the proxy out of the equation and executing the SPARQL directly against the SDB back end, the above CONSTRUCT was executed as a single (albeit large!) SQL statement:

 SELECT                                   -- V_1=?authorLabel V_2=?coAuthorshipNode V_3=?document V_4=?authorshipNode V_5=?coAuthorPerson V_6=?coAuthorPersonLabel
  R_1.lex AS V_1_lex, R_1.datatype AS V_1_datatype, R_1.lang AS V_1_lang, R_1.type AS V_1_type, 
  R_2.lex AS V_2_lex, R_2.datatype AS V_2_datatype, R_2.lang AS V_2_lang, R_2.type AS V_2_type, 
  R_3.lex AS V_3_lex, R_3.datatype AS V_3_datatype, R_3.lang AS V_3_lang, R_3.type AS V_3_type, 
  R_4.lex AS V_4_lex, R_4.datatype AS V_4_datatype, R_4.lang AS V_4_lang, R_4.type AS V_4_type, 
  R_5.lex AS V_5_lex, R_5.datatype AS V_5_datatype, R_5.lang AS V_5_lang, R_5.type AS V_5_type, 
  R_6.lex AS V_6_lex, R_6.datatype AS V_6_datatype, R_6.lang AS V_6_lang, R_6.type AS V_6_type
FROM
    ( SELECT DISTINCT                    -- ?coAuthorPerson:(Q_9.o=>S_1.X_1) ?document:(Q_5.o=>S_1.X_2) ?coAuthorshipNode:(Q_7.o=>S_1.X_3) ?authorLabel:(Q_2.o=>S_1.X_4) ?coAuthorPersonLabel:(Q_11.o=>S_1.X_5) ?authorshipNode:(Q_3.o=>S_1.X_6)
        Q_9.o AS X_1, 
        Q_5.o AS X_2, 
        Q_7.o AS X_3, 
        Q_2.o AS X_4, 
        Q_11.o AS X_5, 
        Q_3.o AS X_6
      FROM
          Quads AS Q_1                   -- <urn:x-arq:DefaultGraphNode> <http://localhost/individual/author> rdfsyn:type foaf:Person
        INNER JOIN
          Quads AS Q_2                   -- <urn:x-arq:DefaultGraphNode> <http://localhost/individual/author> rdfs:label ?authorLabel
        ON ( Q_1.s = -4883577620769971978 -- Const: <http://localhost/individual/author>
         AND Q_1.p = -6430697865200335348 -- Const: rdfsyn:type
         AND Q_1.o = -1118181488561280847 -- Const: foaf:Person
         AND Q_2.s = -4883577620769971978 -- Const: <http://localhost/individual/author>
         AND Q_2.p = 6454844767405606854 -- Const: rdfs:label
         )
        INNER JOIN
          Quads AS Q_3                   -- <urn:x-arq:DefaultGraphNode> <http://localhost/individual/author> core:relatedBy ?authorshipNode
        ON ( Q_3.s = -4883577620769971978 -- Const: <http://localhost/individual/author>
         AND Q_3.p = 7813032771907687750 -- Const: core:relatedBy
         )
        INNER JOIN
          Quads AS Q_4                   -- <urn:x-arq:DefaultGraphNode> ?authorshipNode rdfsyn:type core:Authorship
        ON ( Q_4.p = -6430697865200335348 -- Const: rdfsyn:type
         AND Q_4.o = -7985466041922445122 -- Const: core:Authorship
         AND Q_3.o = Q_4.s               -- Join var: ?authorshipNode
         )
        INNER JOIN
          Quads AS Q_5                   -- <urn:x-arq:DefaultGraphNode> ?authorshipNode core:relates ?document
        ON ( Q_5.p = -3633326295402292183 -- Const: core:relates
         AND Q_3.o = Q_5.s               -- Join var: ?authorshipNode
         )
        INNER JOIN
          Quads AS Q_6                   -- <urn:x-arq:DefaultGraphNode> ?document rdfsyn:type <http://purl.obolibrary.org/obo/IAO_0000030>
        ON ( Q_6.p = -6430697865200335348 -- Const: rdfsyn:type
         AND Q_6.o = 1885280957395725387 -- Const: <http://purl.obolibrary.org/obo/IAO_0000030>
         AND Q_5.o = Q_6.s               -- Join var: ?document
         )
        INNER JOIN
          Quads AS Q_7                   -- <urn:x-arq:DefaultGraphNode> ?document core:relatedBy ?coAuthorshipNode
        ON ( Q_7.p = 7813032771907687750 -- Const: core:relatedBy
         AND Q_5.o = Q_7.s               -- Join var: ?document
         )
        INNER JOIN
          Quads AS Q_8                   -- <urn:x-arq:DefaultGraphNode> ?coAuthorshipNode rdfsyn:type core:Authorship
        ON ( Q_8.p = -6430697865200335348 -- Const: rdfsyn:type
         AND Q_8.o = -7985466041922445122 -- Const: core:Authorship
         AND Q_7.o = Q_8.s               -- Join var: ?coAuthorshipNode
         )
        INNER JOIN
          Quads AS Q_9                   -- <urn:x-arq:DefaultGraphNode> ?coAuthorshipNode core:relates ?coAuthorPerson
        ON ( Q_9.p = -3633326295402292183 -- Const: core:relates
         AND Q_7.o = Q_9.s               -- Join var: ?coAuthorshipNode
         )
        INNER JOIN
          Quads AS Q_10                  -- <urn:x-arq:DefaultGraphNode> ?coAuthorPerson rdfsyn:type foaf:Person
        ON ( Q_10.p = -6430697865200335348 -- Const: rdfsyn:type
         AND Q_10.o = -1118181488561280847 -- Const: foaf:Person
         AND Q_9.o = Q_10.s              -- Join var: ?coAuthorPerson
         )
        INNER JOIN
          Quads AS Q_11                  -- <urn:x-arq:DefaultGraphNode> ?coAuthorPerson rdfs:label ?coAuthorPersonLabel
        ON ( Q_11.p = 6454844767405606854 -- Const: rdfs:label
         AND Q_9.o = Q_11.s              -- Join var: ?coAuthorPerson
         )
    ) AS S_1                             -- ?coAuthorPerson:(Q_9.o=>S_1.X_1) ?document:(Q_5.o=>S_1.X_2) ?coAuthorshipNode:(Q_7.o=>S_1.X_3) ?authorLabel:(Q_2.o=>S_1.X_4) ?coAuthorPersonLabel:(Q_11.o=>S_1.X_5) ?authorshipNode:(Q_3.o=>S_1.X_6)
  LEFT OUTER JOIN
    Nodes AS R_1                         -- Var: ?authorLabel
  ON ( S_1.X_4 = R_1.hash )
  LEFT OUTER JOIN
    Nodes AS R_2                         -- Var: ?coAuthorshipNode
  ON ( S_1.X_3 = R_2.hash )
  LEFT OUTER JOIN
    Nodes AS R_3                         -- Var: ?document
  ON ( S_1.X_2 = R_3.hash )
  LEFT OUTER JOIN
    Nodes AS R_4                         -- Var: ?authorshipNode
  ON ( S_1.X_6 = R_4.hash )
  LEFT OUTER JOIN
    Nodes AS R_5                         -- Var: ?coAuthorPerson
  ON ( S_1.X_1 = R_5.hash )
  LEFT OUTER JOIN
    Nodes AS R_6                         -- Var: ?coAuthorPersonLabel
  ON ( S_1.X_5 = R_6.hash )

Note: the hash values in the query above are not necessarily the actual hashes of the predicates and objects shown, but the overall structure of the statement is correct.
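To see why the generated SQL has this shape, the sketch below rebuilds a much smaller version of it against an in-memory SQLite database using Python's standard `sqlite3` module. It assumes a cut-down imitation of the `Quads` and `Nodes` tables seen above: every RDF term is stored once in `Nodes` under a 64-bit hash, and `Quads` holds only hashes. The hash function (truncated MD5), the reduced column set and the `ex:` data are illustrative assumptions, not SDB's exact layout. Each triple pattern becomes a self-join on `Quads` with constant hashes, and only the final result is turned back into lexical values with a `LEFT OUTER JOIN` on `Nodes`.

```python
import hashlib
import sqlite3


def node_hash(lexical):
    """Hypothetical signed 64-bit hash of a term; SDB's own scheme may differ."""
    digest = hashlib.md5(lexical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_db(triples):
    """Store each term once in Nodes and each triple as hashes in Quads."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Nodes (hash INTEGER PRIMARY KEY, lex TEXT)")
    db.execute("CREATE TABLE Quads (g INTEGER, s INTEGER, p INTEGER, o INTEGER)")
    default_graph = node_hash("urn:x-arq:DefaultGraphNode")
    for triple in triples:
        for term in triple:
            db.execute("INSERT OR IGNORE INTO Nodes VALUES (?, ?)",
                       (node_hash(term), term))
        db.execute("INSERT INTO Quads VALUES (?, ?, ?, ?)",
                   (default_graph, *(node_hash(t) for t in triple)))
    return db


# Same shape as the SDB output: hashed self-joins inside, Nodes lookup outside.
QUERY = """
SELECT R_1.lex
FROM (SELECT DISTINCT Q_2.o AS X_1
      FROM Quads AS Q_1
      INNER JOIN Quads AS Q_2
        ON (Q_1.s = :author AND Q_1.p = :related_by
            AND Q_2.p = :relates AND Q_1.o = Q_2.s)
      INNER JOIN Quads AS Q_3
        ON (Q_3.p = :rdf_type AND Q_3.o = :document_class
            AND Q_2.o = Q_3.s)) AS S_1
LEFT OUTER JOIN Nodes AS R_1 ON (S_1.X_1 = R_1.hash)
ORDER BY R_1.lex
"""


def find_documents(db, author):
    """Documents linked to an author, fetched with one SQL statement."""
    params = {
        "author": node_hash(author),
        "related_by": node_hash("core:relatedBy"),
        "relates": node_hash("core:relates"),
        "rdf_type": node_hash("rdf:type"),
        "document_class": node_hash("obo:IAO_0000030"),
    }
    return [row[0] for row in db.execute(QUERY, params).fetchall()]


SAMPLE = [
    ("ex:author", "core:relatedBy", "ex:auth1"),
    ("ex:auth1", "core:relates", "ex:author"),
    ("ex:auth1", "core:relates", "ex:doc1"),
    ("ex:doc1", "rdf:type", "obo:IAO_0000030"),
    ("ex:author", "core:relatedBy", "ex:auth2"),
    ("ex:auth2", "core:relates", "ex:author"),
    ("ex:auth2", "core:relates", "ex:doc2"),
    ("ex:doc2", "rdf:type", "obo:IAO_0000030"),
]
```

Because the joins operate only on integer hashes, the database never compares strings until the very last step, which is why the real statement above can stay a single query no matter how many triple patterns it contains.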

the