Below are the results of performance tests of Fedora-based applications using real-world data.

Plum Ingest

This test ingested a large book of 1,000 TIFF images of 100 MB each, once with the Fedora 4.5.1 release (based on Modeshape 4) and once with the experimental Modeshape 5 branch. In both cases, Fedora was configured to use the PostgreSQL database object store. Durations are reported as H:MM:SS for batches of 100 images loaded using Princeton's Hydra head, Plum.

Batch  Duration (Modeshape4)  Duration (Modeshape5)  Improvement
1      0:19:19                0:13:52                28.2%
2      0:27:03                0:23:19                13.8%
3      0:39:16                0:33:41                14.2%
4      0:52:13                0:43:43                16.3%
5      1:06:22                0:56:36                14.7%
6      1:23:29                1:10:46                15.2%
7      1:41:26                1:26:30                14.7%
8      2:02:22                1:43:08                15.7%
9      3:17:40                2:37:31                20.3%
10     3:47:48                3:10:14                16.5%
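The Improvement figures are consistent with the relative reduction in duration from Modeshape 4 to Modeshape 5. The following is a minimal sketch of that arithmetic, assuming the percentage is computed as (old - new) / old; the function names are illustrative and not part of any test script.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(before: str, after: str) -> float:
    """Relative reduction in duration, as a percentage rounded to one decimal."""
    b = to_seconds(before)
    a = to_seconds(after)
    return round(100.0 * (b - a) / b, 1)


# Batch 1 from the table: Modeshape 4 took 0:19:19, Modeshape 5 took 0:13:52.
batch_1 = improvement("0:19:19", "0:13:52")
# Batch 10 from the table: 3:47:48 versus 3:10:14.
batch_10 = improvement("3:47:48", "3:10:14")
```

Under this assumption, batch 1 works out to 28.2% and batch 10 to 16.5%, matching the reported values.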

Retrieving Objects With Many Links to Repository Objects

Objects with a large number of links to repository objects are much slower to retrieve than objects with a large number of literal or URI properties. For example, an object with 10,000 properties whose values are literals or non-repository URIs can be retrieved in 200 milliseconds, but an object with 10,000 properties whose values are repository objects takes 7-36 seconds, depending on the settings, storage backend, etc.

There are also significant differences between LevelDB and PostgreSQL/MySQL backends, with LevelDB being much faster: 7-10 seconds as opposed to 30+ seconds for the object with 10,000 links to repository objects.

Retrieval time in seconds for the object with 10,000 links to repository objects:

Version/Branch         LevelDB  MySQL  PostgreSQL
4.5.0                  8        n/a    n/a
4.5.1                  10       43     36
master (a58f5a05)      7        32     29
modeshape5 (c177adc8)  n/a      89     30

See test scripts.
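As a rough illustration of how retrieval times like these can be measured, here is a minimal sketch that times repeated GET requests for a single resource. It is not the test script referenced above. The resource URL in the usage comment is hypothetical, and the helper names `fetch_rdf` and `time_call` are introduced here only for illustration.

```python
import time
import urllib.request
from typing import Callable, List


def fetch_rdf(url: str) -> bytes:
    """Retrieve a resource's RDF over HTTP, asking for a Turtle serialization."""
    request = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def time_call(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Call fn `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (assumes a repository at this hypothetical URL):
# url = "http://localhost:8080/rest/object-with-many-links"
# print(time_call(lambda: fetch_rdf(url), repeats=3))
```

Repeating the request several times helps separate steady-state retrieval cost from one-off effects such as cache warm-up.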

Testing initially focused on:

However, those do not appear to significantly impact performance. So the process of looking up which node a proxy points to and converting the node reference to a URI seems to be the problem. The process is:

Each of these steps is reasonably fast (~1 msec). But the cost is paid once per member, so as the number of members grows, even 3 msec per member adds up. For example, a collection with 10,000 members would take 30 seconds.
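The linear growth described above can be sketched as a simple cost model. This is only an estimate that assumes a fixed per-member cost and ignores any fixed overhead; the function name and the 3 msec default are taken from the example figures, not from measurements of individual steps.

```python
def estimated_retrieval_seconds(members: int, msec_per_member: float = 3.0) -> float:
    """Estimate retrieval time assuming a constant lookup cost per member."""
    return members * msec_per_member / 1000.0


# 10,000 members at 3 msec each gives the 30 seconds quoted in the text.
ten_thousand = estimated_retrieval_seconds(10_000)

# The same model shows how the cost scales with collection size.
estimates = {n: estimated_retrieval_seconds(n) for n in (100, 1_000, 10_000)}
```

At 100 members the estimate is 0.3 seconds, which is why the per-member cost only becomes noticeable for large collections.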

Some possible options for improving performance include: