Performance tests with benchtool were run on two machines:
- Desktop: 2010 Mac Pro (dual 2.26GHz Quad-core Xeon, 6GB RAM, 3GB/s SATA disk)
- Laptop: 2013 MacBook Air (1.3GHz Core i5, 8GB RAM, 10GB/s SSD disk)
- Connected via 802.11n wifi
Depending on which machine acted as the client and which as the server, I got dramatically different results:
Client | Server | F3 time (ms) | F3 throughput (mb/s) | F4 single time (ms) | F4 single throughput (mb/s) | F4 minimal time (ms) | F4 minimal throughput (mb/s) | Comparison |
---|---|---|---|---|---|---|---|---|
desktop | laptop | 1,103,646 | 4.64 | 1,218,615 | 4.20 | 554,228 | 9.24 | F4 minimal 2x faster |
laptop | desktop | 1,245,344 | 4.11 | 2,963,019 | 1.73 | 635,166 | 8.06 | F4 minimal 2x faster |
laptop | laptop | 64,250 | 78.69 | 693,911 | 7.37 | 63,870 | 80.16 | approx. equal |
desktop | desktop | 215,586 | 23.75 | 2,510,928 | 2.04 | 102,774 | 49.82 | F4 minimal 2x faster |
The hardware used for these tests is not the kind of server hardware I would normally expect repository software to run on. But the differences suggest that some aspect of my setup (network latency, slower disk, etc.) affects F4 much more than F3.
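The "Comparison" column can be rechecked directly from the times in the table. The following is a minimal sketch, not part of benchtool: the names `TIMES_MS`, `SPEEDUPS`, and `speedup` are made up for illustration, and the numbers are copied from the F3 and F4 minimal time columns above.

```python
# Times in milliseconds, copied from the table above.
# Keys are (client, server); values are (F3 time, F4 minimal time).
TIMES_MS = {
    ("desktop", "laptop"): (1_103_646, 554_228),
    ("laptop", "desktop"): (1_245_344, 635_166),
    ("laptop", "laptop"): (64_250, 63_870),
    ("desktop", "desktop"): (215_586, 102_774),
}


def speedup(f3_ms: int, f4_ms: int) -> float:
    """Return how many times faster F4 minimal was than F3 (>1 means F4 was faster)."""
    return f3_ms / f4_ms


# Hypothetical helper table: one F3 / F4-minimal ratio per client/server pairing.
SPEEDUPS = {pair: speedup(*times) for pair, times in TIMES_MS.items()}

for (client, server), ratio in SPEEDUPS.items():
    print(f"{client:>7} -> {server:<7}  F3 / F4 minimal = {ratio:.2f}x")
```

The ratios come out close to 2 for every pairing except laptop-to-laptop, where they are close to 1, which is what the "Comparison" column reports.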
Ingesting 25,000 Objects (100KB datastreams)
On the laptop detailed above, I ingested 25,000 objects using benchtool, with one 100KB datastream per object. Below are the times (in milliseconds) to process each batch of 1,000 objects. Fedora 3 processed successive batches at constant speed, but each Fedora 4 batch took longer than the previous one. Switching to leveldb storage produced the same trend, but from a slightly slower starting point.
batch | Fedora 3 | Fedora 4 (minimal) | Fedora 4 (minimal/leveldb) |
---|---|---|---|
1 | 49847 | 33795 | 40453 |
2 | 36740 | 36815 | 49240 |
3 | 34369 | 39375 | 62152 |
4 | 33280 | 47404 | 71717 |
5 | 32756 | 53092 | 81081 |
6 | 32433 | 61366 | 89874 |
7 | 32441 | 66524 | 97485 |
8 | 31839 | 73329 | 106713 |
9 | 32016 | 79913 | 115740 |
10 | 32038 | 84424 | 123734 |
11 | 31628 | 92909 | 133757 |
12 | 31356 | 96724 | 138626 |
13 | 31478 | 105221 | 150474 |
14 | 31166 | 114825 | 163861 |
15 | 30993 | 119416 | 170712 |
16 | 30922 | 127985 | 180907 |
17 | 30700 | 130925 | 189496 |
18 | 30920 | 140059 | 201232 |
19 | 30706 | 143905 | 205339 |
20 | 30820 | 153584 | 219809 |
21 | 30635 | 156370 | 229460 |
22 | 30943 | 166349 | 238547 |
23 | 30814 | 174377 | 249617 |
24 | 31340 | 183079 | 257109 |
25 | 31603 | 208978 | 281443 |
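One way to put a number on "each batch took longer than the previous one" is to fit a straight line to each column and look at its slope. This is only a sketch under the assumption that the growth is roughly linear over these 25 batches; the function `slope` and the list names are illustrative, and the data is copied from the table above.

```python
# Batch times in milliseconds, copied from the table above (batches 1..25).
FEDORA_3 = [49847, 36740, 34369, 33280, 32756, 32433, 32441, 31839, 32016,
            32038, 31628, 31356, 31478, 31166, 30993, 30922, 30700, 30920,
            30706, 30820, 30635, 30943, 30814, 31340, 31603]
FEDORA_4_MINIMAL = [33795, 36815, 39375, 47404, 53092, 61366, 66524, 73329,
                    79913, 84424, 92909, 96724, 105221, 114825, 119416,
                    127985, 130925, 140059, 143905, 153584, 156370, 166349,
                    174377, 183079, 208978]
FEDORA_4_LEVELDB = [40453, 49240, 62152, 71717, 81081, 89874, 97485, 106713,
                    115740, 123734, 133757, 138626, 150474, 163861, 170712,
                    180907, 189496, 201232, 205339, 219809, 229460, 238547,
                    249617, 257109, 281443]


def slope(values):
    """Least-squares slope of values against batch numbers 1..n (ms per batch)."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


for name, column in [("Fedora 3", FEDORA_3),
                     ("Fedora 4 (minimal)", FEDORA_4_MINIMAL),
                     ("Fedora 4 (minimal/leveldb)", FEDORA_4_LEVELDB)]:
    print(f"{name}: {slope(column):.0f} ms added per batch")
```

A slope near zero (or slightly negative, as the first Fedora 3 batch is the slowest) means constant speed; a large positive slope means each batch of 1,000 objects costs that many milliseconds more than the one before it.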
Ingesting 10,000 Objects (50MB datastreams)
On the laptop detailed above (Platform: melendor.local, Repository: Minimal), I ingested 10,000 objects using benchtool with one 50MB datastream per object, for a total of about 500GB of data (Procedure: 10,000 x 50MB). Below are the times (in milliseconds) to process each batch of 100 objects. Both Fedora 3 and Fedora 4 processed successive batches at roughly constant speed, but Fedora 4 was approximately 1.8x faster (averaging 2:47 per batch compared to Fedora 3's 4:59 per batch).
batch | Fedora 3 | Fedora 4 (minimal) |
---|---|---|
1 | 290788 | 155127 |
2 | 293678 | 154484 |
3 | 295111 | 156252 |
4 | 295962 | 157926 |
5 | 302069 | 159000 |
6 | 298721 | 160365 |
7 | 300685 | 160853 |
8 | 300740 | 165863 |
9 | 299747 | 163468 |
10 | 301556 | 164443 |
11 | 303861 | 163171 |
12 | 300091 | 164029 |
13 | 298787 | 162513 |
14 | 298724 | 162098 |
15 | 301085 | 161380 |
16 | 296167 | 161457 |
17 | 294318 | 161638 |
18 | 290227 | 162317 |
19 | 288966 | 160999 |
20 | 308677 | 161790 |
21 | 303017 | 162929 |
22 | 297760 | 163207 |
23 | 294605 | 162424 |
24 | 295788 | 162091 |
25 | 295981 | 163107 |
26 | 296948 | 165043 |
27 | 297577 | 162752 |
28 | 295650 | 163267 |
29 | 290712 | 165607 |
30 | 295921 | 164738 |
31 | 297859 | 163131 |
32 | 300048 | 163308 |
33 | 292107 | 163824 |
34 | 299284 | 165421 |
35 | 300503 | 164128 |
36 | 293813 | 164559 |
37 | 298295 | 165218 |
38 | 295376 | 164738 |
39 | 296687 | 164848 |
40 | 300273 | 165371 |
41 | 294990 | 165495 |
42 | 300647 | 166607 |
43 | 298399 | 165596 |
44 | 301814 | 166976 |
45 | 298124 | 167070 |
46 | 296355 | 165732 |
47 | 299041 | 166921 |
48 | 298724 | 165260 |
49 | 294858 | 165933 |
50 | 299575 | 176832 |
51 | 300505 | 166455 |
52 | 295658 | 166773 |
53 | 301021 | 166932 |
54 | 296317 | 168370 |
55 | 295020 | 166833 |
56 | 303502 | 167670 |
57 | 300035 | 168204 |
58 | 304416 | 167630 |
59 | 299448 | 167861 |
60 | 296159 | 167476 |
61 | 302893 | 169372 |
62 | 298650 | 169443 |
63 | 296943 | 168846 |
64 | 295261 | 169570 |
65 | 302757 | 169021 |
66 | 300400 | 169550 |
67 | 298693 | 167803 |
68 | 300473 | 168840 |
69 | 296038 | 169339 |
70 | 297755 | 169388 |
71 | 302784 | 176599 |
72 | 298056 | 168773 |
73 | 299224 | 171291 |
74 | 297199 | 169741 |
75 | 301671 | 168778 |
76 | 299044 | 169379 |
77 | 302522 | 173463 |
78 | 295791 | 169912 |
79 | 296927 | 169597 |
80 | 306097 | 170421 |
81 | 302493 | 171432 |
82 | 301077 | 169981 |
83 | 303602 | 170674 |
84 | 303631 | 171172 |
85 | 298102 | 172985 |
86 | 301552 | 172529 |
87 | 304410 | 174062 |
88 | 305995 | 171477 |
89 | 302005 | 171312 |
90 | 299469 | 172294 |
91 | 301209 | 174919 |
92 | 306794 | 179721 |
93 | 298726 | 175800 |
94 | 297843 | 172196 |
95 | 303512 | 174223 |
96 | 300477 | 172386 |
97 | 302825 | 173491 |
98 | 302478 | 176587 |
99 | 296834 | 174085 |
100 | 299554 | 173593 |
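The summary figures for this test (about 500GB in total, roughly 1.8x faster) follow from simple arithmetic, sketched below. The helper `to_seconds` and the variable names are illustrative only. The per-batch throughput is a derived estimate that assumes decimal units (1GB = 1000MB) and counts only datastream payload, ignoring any request overhead.

```python
def to_seconds(m_ss: str) -> int:
    """Convert a 'minutes:seconds' string such as '2:47' to whole seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + int(seconds)


OBJECTS = 10_000        # objects ingested
DATASTREAM_MB = 50      # one 50MB datastream per object
BATCH_SIZE = 100        # objects per reported batch

# Total payload, assuming decimal units (1GB = 1000MB).
total_gb = OBJECTS * DATASTREAM_MB / 1000

# Average batch times quoted in the text.
f3_s = to_seconds("4:59")
f4_s = to_seconds("2:47")
ratio = f3_s / f4_s

# Estimated payload throughput per batch (an approximation, see assumptions above).
batch_mb = BATCH_SIZE * DATASTREAM_MB
f3_mb_per_s = batch_mb / f3_s
f4_mb_per_s = batch_mb / f4_s

print(f"total data: {total_gb:.0f}GB")
print(f"Fedora 4 speedup over Fedora 3: {ratio:.2f}x")
print(f"Fedora 3: {f3_mb_per_s:.1f}MB/s, Fedora 4: {f4_mb_per_s:.1f}MB/s")
```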
5 Comments
Andrew Woods
Is the implication that F4 bottlenecks are masked by network-latency in cases #1 and #2?
It would be interesting to put YourKit on either of tests #3 or #4 in the F4 case to see where the majority of time is taken.
Unknown User (escowles@ucsd.edu)
Yes, I think the effect of network latency is overwhelming the base performance differences. F3 is ~20x faster with both the client and server on the laptop compared to the client on the desktop and the server on the laptop. And it's ~6x faster with both client and server on the desktop. But F4 doesn't show the same level of improvements (it's only about 2x faster with both client and server on the laptop, and only marginally faster with both on the desktop).
These tests were run with a stock fcrepo4 with no System properties set to alter the Modeshape/Infinispan defaults. I'm rerunning the tests now using the minimal config and the times are much more competitive with F3 – I'll update the table shortly.
Andrew Woods
Using "minimal" config is much more encouraging. Thanks for making that picture clear (at least for your hardware).
We should get Osman Din and frank asseg to try as well.
Andrew Woods
Unknown User (escowles@ucsd.edu), in looking more closely at the "minimal" config, it is using the "storage" type of "file". In terms of performance, I believe we see better numbers with a "storage" type of "cache" using LevelDB (as defined in "single"). Could you run the "minimal" test again with the "storage" element defined as in "single"... which would effectively be using "single" without the "query" element.
Andrew Woods
These are very interesting results, Unknown User (escowles@ucsd.edu). Would it be possible to create a small table at the beginning of the page that summarizes and highlights the findings?