...

Batch  Duration (Modeshape4)  Duration (Modeshape5)  Improvement
1      0:19:19                0:13:52                28.2%
2      0:27:03                0:23:19                13.8%
3      0:39:16                0:33:41                14.2%
4      0:52:13                0:43:43                16.3%
5      1:06:22                0:56:36                14.7%
6      1:23:29                1:10:46                15.2%
7      1:41:26                1:26:30                14.7%
8      2:02:22                1:43:08                15.7%
9      3:17:40                2:37:31                20.3%
10     3:47:48                3:10:14                16.5%
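The Improvement column appears to be the relative reduction in duration, (Modeshape4 - Modeshape5) / Modeshape4. The following is a minimal Python sketch for reproducing it, assuming the durations are in H:MM:SS format. The function names are illustrative only and are not part of the test scripts.

```python
def to_seconds(duration: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(before: str, after: str) -> float:
    """Relative reduction in duration, as a percentage rounded to one decimal."""
    old, new = to_seconds(before), to_seconds(after)
    return round(100.0 * (old - new) / old, 1)


# Batch 1 from the table: Modeshape4 vs. Modeshape5
print(improvement("0:19:19", "0:13:52"))
```

Running the same function over the other rows is a quick way to check the table for transcription errors.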

Retrieving Objects With Many Links to Repository Objects

Objects with a large number of links to repository objects are much slower to retrieve than objects with a large number of literal or URI properties. For example, an object with 10,000 properties whose values are literals or non-repository URIs can be retrieved in 200 milliseconds, but an object with 10,000 properties whose values are repository objects takes 7-36 seconds, depending on the settings, storage backend, etc.

There are also significant differences between LevelDB and PostgreSQL/MySQL backends, with LevelDB being much faster: 7-10 seconds as opposed to 30+ seconds for the object with 10,000 links to repository objects.

Version/Branch         LevelDB (s)  MySQL (s)  PostgreSQL (s)
4.5.0                  8            n/a        n/a
4.5.1                  10           43         36
master (a58f5a05)      7            32         29
modeshape5 (c177adc8)  n/a          89         30

See test scripts.

Testing initially focused on:

  • using properties explicitly set on the object, as compared to IndirectContainers
  • debugging the RDF-generation code that produces the IndirectContainer triples
  • running under Tomcat instead of Jetty

However, none of these appears to significantly affect performance. The problem therefore seems to be the process of looking up which node a proxy points to and converting the node reference to a URI. The process is:

  • List the children of a direct container and load each node.
  • Load the node the proxyFor property points to.
  • Convert the member node to a URI.

Each of these steps is reasonably fast (~1 ms). But as the number of members grows, even 3 ms per member (three steps at about 1 ms each) adds up. For example, a collection with 10,000 members would take 30 seconds.
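The three steps and the resulting cost can be illustrated with a toy in-memory model. This is only a sketch in Python, not the repository's actual code: the node layout, the paths, the base URI, and the flat 1 ms per step are all assumptions made for illustration.

```python
BASE_URI = "http://example.org/rest"  # hypothetical repository base URI


def build_collection(member_count):
    """Build toy proxy nodes and member nodes, keyed by path."""
    proxies = {
        f"/col/members/p{i}": {"proxyFor": f"/obj/{i}"} for i in range(member_count)
    }
    members = {f"/obj/{i}": {} for i in range(member_count)}
    return proxies, members


def member_uris(proxies, members):
    """Resolve every proxy to its member URI, counting the per-member steps."""
    steps = 0
    uris = []
    for proxy_path in proxies:
        proxy = proxies[proxy_path]        # step 1: load the child (proxy) node
        steps += 1
        target = proxy["proxyFor"]
        _member = members[target]          # step 2: load the node proxyFor points to
        steps += 1
        uris.append(BASE_URI + target)     # step 3: convert the member node to a URI
        steps += 1
    return uris, steps


def estimated_seconds(steps, ms_per_step=1.0):
    """Estimated wall-clock time, assuming a fixed cost per step."""
    return steps * ms_per_step / 1000.0


proxies, members = build_collection(10_000)
uris, steps = member_uris(proxies, members)
print(len(uris), steps, estimated_seconds(steps))
```

With 10,000 members the model performs 30,000 steps, which at an assumed 1 ms each matches the 30-second estimate above. The cost is linear in the number of members, so reducing either the number of steps per member or the cost per step scales the total directly.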

Some possible options for improving performance include:

  • Caching nodes: this can improve the time to look up the member node and convert it to a URI.
  • Using properties explicitly set on the collection object instead of proxies: this can eliminate the extra node lookup for loading the proxy node.
  • Using Modeshape's internal query functionality: in theory this could be more efficient than iterating over the proxies.  However, it appears that Modeshape uses the database as a document store, and so winds up loading all of the members anyway, with performance very similar to just iterating over all the children.
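As a rough illustration of the caching option, the sketch below memoizes a member lookup with Python's functools.lru_cache. The load_from_backend function, the paths, and the base URI are hypothetical stand-ins rather than Modeshape or repository APIs. The sketch only shows the general effect: a cache avoids repeated backend loads, but the first retrieval of each member still pays the full cost.

```python
from functools import lru_cache

BASE_URI = "http://example.org/rest"  # hypothetical repository base URI
BACKEND_CALLS = 0  # counts simulated loads from the storage backend


def load_from_backend(path):
    """Stand-in for an expensive node load from the storage backend."""
    global BACKEND_CALLS
    BACKEND_CALLS += 1
    return path


@lru_cache(maxsize=50_000)
def member_uri(path):
    """Load the member node and convert it to a URI, caching the result."""
    node = load_from_backend(path)
    return BASE_URI + node


paths = [f"/obj/{i}" for i in range(1000)]
first = [member_uri(p) for p in paths]   # cold: every lookup reaches the backend
second = [member_uri(p) for p in paths]  # warm: every lookup is served from the cache
print(BACKEND_CALLS, member_uri.cache_info().hits)
```

In this model the second pass performs no backend loads at all. Whether a real node cache would help as much depends on how often the same members are requested and on whether the cache is large enough to hold them.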