...

Comparing D and E (with and without indexing), turning indexing off should improve performance. Since it does not, my guess is that an I/O bottleneck is hit even earlier (possibly replication over the network), so that indexing does not slow down the ingest process at all.

Node network I/O performance

The physical hosts have a 1 Gb/s network connection, but I measured only ~10 MB/s when pushing a single file from one VM to another over the network. This is probably because multiple VMs share the I/O channel of one physical host.

Node HDD performance

ubuntu@ubuntu:/data$ sync; time sudo bash -c "(dd if=/dev/zero of=bf bs=8k count=500000; sync)"

...

real 2m34.033s
user 0m0.060s
sys 0m5.590s
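
The timing above can be turned into an approximate sustained write rate. The following is a minimal sketch, assuming dd's bs=8k means 8192-byte blocks and that the "real" time (which includes the final sync) is the relevant duration; throughput_mb_s is a hypothetical helper name, not part of dd or any benchmark tool.

```shell
#!/usr/bin/env bash
# Sketch: derive sequential write throughput from the dd run above.
# throughput_mb_s is a hypothetical helper, introduced for illustration.
throughput_mb_s() {
  local block_bytes=$1 count=$2 seconds=$3
  # awk does the floating-point division; result is in decimal MB/s
  awk -v b="$block_bytes" -v c="$count" -v s="$seconds" \
    'BEGIN { printf "%.1f\n", (b * c) / s / 1000000 }'
}

# bs=8k is 8192 bytes; real 2m34.033s is 154.033 seconds
throughput_mb_s 8192 500000 154.033   # prints 26.6
```

Under these assumptions the node wrote about 4.1 GB at roughly 26.6 MB/s, including the time needed to flush to disk.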

Load balancing

Load balancing is done with an Apache server that has mod_jk enabled and a workers.properties file in which the individual nodes are configured as mod_jk workers. This results in a simple round-robin load balancing mechanism.

The workers.properties file is currently generated by a shell script:

https://github.com/futures/scc-cluster-install/blob/master/fedora-node.sh#L44

Example:

To balance between 7 nodes the workers.properties file could look like this: 

https://gist.github.com/fasseg/7138008
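
For orientation, here is a minimal sketch of how such a file could be generated in shell. It is not the actual fedora-node.sh script or the linked gist: the function name gen_workers, the worker names node1..nodeN, the host name prefix fedora-node and the AJP port 8009 are all assumptions made for illustration. Only the directive names (worker.list, type, host, port, lbfactor, balance_workers) are standard mod_jk settings.

```shell
#!/usr/bin/env bash
# Sketch only: emit a mod_jk workers.properties for N nodes.
# gen_workers, the nodeN worker names, the host prefix and port 8009
# are assumptions for illustration, not taken from fedora-node.sh.
gen_workers() {
  local n=$1 host_prefix=${2:-fedora-node} port=${3:-8009}
  local i names=""
  # Only the load balancer worker is exposed to Apache
  echo "worker.list=loadbalancer"
  for ((i = 1; i <= n; i++)); do
    echo "worker.node${i}.type=ajp13"
    echo "worker.node${i}.host=${host_prefix}${i}"
    echo "worker.node${i}.port=${port}"
    # Equal lbfactor gives every node the same share of requests
    echo "worker.node${i}.lbfactor=1"
    # Build the comma-separated member list: node1,node2,...
    names+="${names:+,}node${i}"
  done
  echo "worker.loadbalancer.type=lb"
  echo "worker.loadbalancer.balance_workers=${names}"
}

# Print a configuration for 7 nodes to stdout
gen_workers 7
```

Redirecting the output, for example gen_workers 7 > workers.properties, would produce a file that the JkWorkersFile directive can point to; the host names would need to be replaced with the real node addresses.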

Results

Test Utility

BenchTool: https://github.com/futures/benchtool

...