08 July, 2014

balancing tests between 4 workers by external observation

We have a bunch of functional tests that are run by our buildbot every time someone makes a git push.

These tests were taking a long time. Obvious answer: run them in parallel.

This brought up some problems, the biggest of which was how to split the tests between the four test runners. Naively chopping the test list into four pieces turned out to be pretty imbalanced: one test runner was taking 4 minutes, another was taking 14 minutes.

Initially this sounded like something a task farm would be good for. But I didn't want to get into digging round in the Groovy test-running code, and making the test runners talk back to the buildbot to collect tasks.

So I took a less tightly coupled approach with a simpler interface: a balance program picks some tests for each of the four test runners, runs the tests, measures the wall-clock time of the whole run for each runner, and then iteratively updates its knowledge so that it will hopefully pick a more balanced distribution next time round.

I had a quick play with genetic algorithms but that didn't seem to be going anywhere. Next I implemented this model:

Assume there is a startup cost k, and for each test i, a time t_i that the test takes to run. These cannot directly be measured by the balance program.
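Under this model, the predicted wall-clock time for a runner is just the startup cost plus the sum of the estimated times of its tests. As a sketch (the function name is mine, not from the original code):

```python
def predicted_time(k, t, tests):
    """Predicted wall-clock time for one runner: startup cost k
    plus the estimated time t[i] of each test i assigned to it."""
    return k + sum(t[i] for i in tests)
```

The balance program never sees k or t_i directly; it only ever observes the total, which is why the estimates have to be refined by feedback.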

Keep an estimate of k and t_i in a state file.

Make a distribution of tests over the four runners based on the estimates.
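The post doesn't say how that distribution is computed; one simple way to do it, sketched here as an assumption rather than the actual implementation, is greedy longest-processing-time assignment: take the tests in decreasing order of estimated time, and give each one to whichever runner currently has the lightest estimated load.

```python
import heapq

def distribute(t, n_runners=4):
    """Greedy LPT assignment: longest estimated test first,
    each test goes to the runner with the smallest load so far."""
    heap = [(0.0, r) for r in range(n_runners)]  # (estimated load, runner id)
    heapq.heapify(heap)
    assignment = {r: [] for r in range(n_runners)}
    for test in sorted(t, key=t.get, reverse=True):
        load, r = heapq.heappop(heap)
        assignment[r].append(test)
        heapq.heappush(heap, (load + t[test], r))
    return assignment
```

This isn't optimal bin packing, but it only needs to be roughly right: the feedback loop corrects for bad estimates anyway.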

When each runner finishes, if it took longer than the estimate, nudge up the estimates of k and of the tests that were on that runner; similarly nudge them down if the run took less time than estimated.
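The post doesn't give the exact update rule, so this is a hypothetical minimal version: a multiplicative nudge, where the step size `rate` is an assumed tuning parameter.

```python
def nudge(k, t, tests, predicted, actual, rate=0.1):
    """If the actual run time exceeded the prediction, scale up the
    startup-cost estimate k and the estimates of the tests that were
    on that runner; scale them down if the run finished early."""
    factor = 1 + rate if actual > predicted else 1 - rate
    k *= factor
    for i in tests:
        t[i] *= factor
    return k, t
```

Because only the tests that shared a runner get nudged together, repeated rounds with different groupings gradually tease apart which individual tests are expensive.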

Run this lots of times.

After a while this rearranges the tests so that each runner takes about 10 minutes (compared to the 4 to 14 minute spread of the naive distribution).

So we've saved a few minutes on the tests and are hopefully in a position where, as we get more tests, we can scale up the number of runners and still preserve reasonable balance.

This also copes with converging to a new balance when tests are added or removed, or when a test's run time changes (either because the test itself changed, or because the behaviour under test changed).

(The other problem was that loads of our tests turned out to be secretly dependent on each other, and failed when run in a different order. This occasionally caused problems with the naive distribution, but was much more of a problem with the reordering that happens under this balancing approach.)

Source code is at https://github.com/benclifford/weightbal
