15 January, 2012

server availability like uptime

I wondered if I could get a measure of server availability as a single number, automatically (for calculating things like how tragically few nines of uptime my own servers have)

So, I wrote a tool called long-uptime which you use like this:

The first time you run the code, initialise the counter. You can specify your estimate, or let it default to 0:

$ long-uptime --init
and then every minute in a cronjob run this:
$ long-uptime
0.8974271427587808
which means that the site has 89.7% uptime.

It computes an exponentially weighted average with a decay constant (which is a bit like a half life) of a month. This is how unix load averages (the last three values that come out of the uptime command) are calculated, though with much shorter decay constants of 1, 5, and 15 minutes.

When the machine is up (that is, you are running long-uptime in a cron job), then the average moves towards 1. When the machine is down (that is, you are not running long-uptime), then the average moves towards 0. Or rather, the first time you run long-uptime after a break, it realises you haven't run it during the downtime and recomputes the average as if it had been accumulating 0 scores.

Download the code:

$ wget http://www.hawaga.org.uk/tmp/long-uptime-0.1.tar.gz
$ tar xzvf long-uptime-0.1.tar.gz
$ cabal install
$ long-uptime --init

No comments:

Post a Comment