...

Batch | Duration (Modeshape4, h:mm:ss) | Duration (Modeshape5, h:mm:ss) | Improvement
1 | 0:19:19 | 0:13:52 | 28.2%
2 | 0:27:03 | 0:23:19 | 13.8%
3 | 0:39:16 | 0:33:41 | 14.2%
4 | 0:52:13 | 0:43:43 | 16.3%
5 | 1:06:22 | 0:56:36 | 14.7%
6 | 1:23:29 | 1:10:46 | 15.2%
7 | 1:41:26 | 1:26:30 | 14.7%
8 | 2:02:22 | 1:43:08 | 15.7%
9 | 3:17:40 | 2:37:31 | 20.3%
10 | 3:47:48 | 3:10:14 | 16.5%
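The Improvement column is consistent with the relative reduction in duration, (Modeshape4 - Modeshape5) / Modeshape4. The following minimal sketch recomputes it from the h:mm:ss values; the helper names to_seconds and improvement are illustrative, not part of any test script.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(old: str, new: str) -> float:
    """Relative reduction in duration, as a percentage of the old duration."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    return 100.0 * (old_s - new_s) / old_s


# Batches 1 and 10 from the table (Modeshape4 duration, Modeshape5 duration).
print(f"{improvement('0:19:19', '0:13:52'):.1f}%")  # 28.2%
print(f"{improvement('3:47:48', '3:10:14'):.1f}%")  # 16.5%
```

Measuring the improvement relative to the Modeshape4 duration means a value of 28.2% reads as "Modeshape5 took 28.2% less time".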

Retrieving Objects With Many Links to Repository Objects

Compared to objects with a large number of literal or URI properties, objects with a large number of links to repository objects are much slower to retrieve. For example, an object with 10,000 properties whose values (RDF objects) are literals or non-repository URIs can be retrieved in 200 milliseconds, but an object with 10,000 properties whose values are repository objects takes 7-36 seconds, depending on the settings, storage backend, etc.
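As a rough, back-of-the-envelope estimate (assuming the rest of the object costs about the same 200 milliseconds in both cases, which is an assumption rather than a measurement), the extra time per repository link works out to:

```latex
\frac{7\,\mathrm{s} - 0.2\,\mathrm{s}}{10{,}000} = 0.68\,\mathrm{ms}
\qquad\text{to}\qquad
\frac{36\,\mathrm{s} - 0.2\,\mathrm{s}}{10{,}000} = 3.58\,\mathrm{ms}
```

That is, each link to a repository object appears to add on the order of a few milliseconds.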

...

Version/Branch | LevelDB | MySQL | PostgreSQL
4.5.0 | 8 | n/a | n/a
4.5.1 | 10 TODO | 43 TODO | 36
master (a58f5a05) | 7 | 32 | 29
modeshape5 (c177adc8) | n/a | 89 | 30
acoburn:fcrepo-1957 (4bf3ecab) | 9 | 36 | 30

See test scripts.

Testing initially focused on:

...

However, those do not appear to significantly impact performance. So the process of looking up which node a proxy points to and converting the node reference to a URI seems like the most likely culprit (hence looking at acoburn:fcrepo-1957). The process is:

  • List the children of a direct container and load each node.
  • Load the node the proxyFor property points to.
  • Convert the member node to a URI.

Each of these steps is reasonably fast (~1 msec). But as the number of members grows, even 3 msec per member (three steps at ~1 msec each) adds up: a collection with 10,000 members would take 10,000 × 3 msec = 30 seconds.
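The three steps can be sketched as a loop over the container's children. This is a minimal sketch under stated assumptions: Node, NodeStore, the /coll/members layout, and the base URI are hypothetical in-memory stand-ins, not the fcrepo or Modeshape API. It counts node loads and applies the same linear cost model as the text (three ~1 msec steps per member).

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a repository node (not the fcrepo/Modeshape API).
@dataclass
class Node:
    path: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # paths of child nodes


class NodeStore:
    """Hypothetical in-memory node store that counts how many nodes are loaded."""

    def __init__(self, nodes):
        self._nodes = {node.path: node for node in nodes}
        self.loads = 0

    def load(self, path):
        self.loads += 1
        return self._nodes[path]


def to_uri(node, base="http://localhost:8080/rest"):  # base URI is an assumption
    return base + node.path


def list_member_uris(store, container_path):
    container = store.load(container_path)
    uris = []
    for child_path in container.children:
        proxy = store.load(child_path)                     # 1. load each child (proxy) node
        member = store.load(proxy.properties["proxyFor"])  # 2. load the proxyFor target
        uris.append(to_uri(member))                        # 3. convert the member to a URI
    return uris


def estimated_seconds(members, steps_per_member=3, msec_per_step=1.0):
    """Linear cost model: every member pays for every step."""
    return members * steps_per_member * msec_per_step / 1000


# A collection with 10,000 proxies, each pointing at a distinct member node.
n = 10_000
members = [Node(f"/members/{i}") for i in range(n)]
proxies = [Node(f"/coll/members/p{i}", {"proxyFor": f"/members/{i}"}) for i in range(n)]
container = Node("/coll/members", children=[p.path for p in proxies])
store = NodeStore(members + proxies + [container])

uris = list_member_uris(store, "/coll/members")
print(len(uris), store.loads)  # 10000 20001
print(estimated_seconds(n))    # 30.0
```

The load count grows as 2N + 1, so no single step is slow; the cost is linear in the number of members, which is why the total only becomes noticeable for large collections.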

Some possible options for improving performance include:

  • Caching nodes: this can improve the time to look up the member node and convert it to a URI.
  • Using properties explicitly set on the collection object instead of proxies: this can eliminate the extra node lookup for loading the proxy node.
  • Using Modeshape's internal query functionality: in theory this could be more efficient than iterating over the proxies.  However, it appears that Modeshape uses the database as a document store, and so winds up loading all of the members anyway, with performance very similar to just iterating over all the children.
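The first two options can be compared by counting node loads. This is a minimal sketch, not the fcrepo implementation: CountingStore, the path layout, the base URI, and the hasMember property on the collection are all hypothetical. The cache is a plain dict keyed by member path; a real node cache would also need eviction and invalidation when members change.

```python
BASE = "http://localhost:8080/rest"  # hypothetical base URI
N = 1_000


class CountingStore:
    """Hypothetical node store: maps path -> properties and counts loads."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.loads = 0

    def load(self, path):
        self.loads += 1
        return self.nodes[path]


def build_nodes(n):
    nodes = {f"/members/{i}": {} for i in range(n)}
    for i in range(n):
        nodes[f"/coll/members/p{i}"] = {"proxyFor": f"/members/{i}"}
    # Hypothetical "hasMember" property listing members directly on the collection.
    nodes["/coll"] = {"hasMember": [f"/members/{i}" for i in range(n)]}
    return nodes


def via_proxies(store, proxy_paths):
    """Baseline: load each proxy, then the member node it points to."""
    uris = []
    for p in proxy_paths:
        member_path = store.load(p)["proxyFor"]
        store.load(member_path)
        uris.append(BASE + member_path)
    return uris


def via_proxies_cached(store, proxy_paths, cache):
    """Caching nodes: a warm cache skips the member load and URI conversion."""
    uris = []
    for p in proxy_paths:
        member_path = store.load(p)["proxyFor"]
        if member_path not in cache:
            store.load(member_path)
            cache[member_path] = BASE + member_path
        uris.append(cache[member_path])
    return uris


def via_collection_property(store, collection_path):
    """Properties on the collection: no proxy nodes are loaded at all."""
    uris = []
    for member_path in store.load(collection_path)["hasMember"]:
        store.load(member_path)
        uris.append(BASE + member_path)
    return uris


def count_loads(fn):
    store = CountingStore(build_nodes(N))
    fn(store)
    return store.loads


proxy_paths = [f"/coll/members/p{i}" for i in range(N)]
warm_cache = {}
via_proxies_cached(CountingStore(build_nodes(N)), proxy_paths, warm_cache)  # fill the cache

print(count_loads(lambda s: via_proxies(s, proxy_paths)))                     # 2000
print(count_loads(lambda s: via_proxies_cached(s, proxy_paths, warm_cache)))  # 1000
print(count_loads(lambda s: via_collection_property(s, "/coll")))             # 1001
```

Both options roughly halve the node loads in this model, but through different means: the cache only helps once it is warm, while listing members on the collection removes the proxy lookups on every request.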