21 May, 2011

ruby cloudwatch -> mrtg interface

I use mrtg to gather historical data on some of my servers. One of those servers lives in Amazon's Elastic Compute Cloud (EC2) and so is also monitored by Amazon CloudWatch.

Can I get cloudwatch data into mrtg?

mrtg has a fairly straightforward interface for plugging in arbitrary unix executables to collect data, so my first attempt was to use the main Java-based cloudwatch client to get data. that attempt started up one jvm for each metric collected, which massively overloaded my ec2 microinstance, keeping the load average around 3. pretty lame.

Amazon also provides a ruby interface. I had never programmed in ruby before, but its often interesting to learn a new language.

Here's what I ended up with.

First the config block for mrtg, which calls out to the ruby-mrtg-cloudwatch3 program that I wrote:

Target[cloudwatch_network]: `/home/mrtg/ruby-mrtg-cloudwatch/ruby-mrtg-cloudwatch3 NetworkIn NetworkOut AWS
/EC2 InstanceId=i-26bcaf51`
Title[cloudwatch_network]: Network traffic according to cloudwatch
options[cloudwatch_network]: growright,absolute,logscaleMaxBytes[cloudwatch_network]: 100000000

This gives a graph of network traffic according to cloudwatch. I can compare that alongside the network traffic graph for eth0 gathered from the local interface statistics. They should roughly match up, and they do (well hopefully they still do by the time you read this - these are live images):

According to the on-host network interface:


According to cloudwatch:


Now the actual ruby code:

#!/usr/bin/ruby1.8

require 'rubygems'
require 'AWS'

The two cloudwatch metric names, one that measures output data, one that measures input data, are give on the command line:
metrico=ARGV[0]
metrici=ARGV[1]

My code has hardcoded access keys at the moment which is a bit shitty:
ACCESS_KEY_ID='foo'
SECRET_ACCESS_KEY='bar'


Using the above credentials, a new cloudwatch object is made, @cw.

@cw = AWS::Cloudwatch::Base.new(:access_key_id => ACCESS_KEY_ID, :secret_access_key => SECRET_ACCESS_KEY, :server => "eu-west-1.monitoring.amazonaws.com" )

Each of the two metrics will be probed with the probe function. This uses a state file based on the metric name to get only readings which have not already been seen by this script. The two metrics use separate state files because cloudwatch doesn't give an atomic read for multiple metrics at once. The state file stores the time of the last seen reading. If there is no state file, we have to invent a time. There is a subtlety here: data does not appear in cloudwatch until around 5 minutes after its time stamp, so using the current time as an initial value results in not seeing any results. Instead, I go back about 15 minutes the first time, which will seems to be far enough back to get something.

def probe(metric)

  et = Time.now()

  statusfn="cloudwatch-"+ARGV[3]+"-"+metric+".status"
  if FileTest.exist?(statusfn) then

    f = File.new(statusfn, "r")
    tstring = f.gets
    ts = Time.parse(tstring)
    f.close
  else
    ts = et - 900 # needs to be more than 5 mins because otherwise we never get any data.
end

  res = @cw.get_metric_statistics(:measure_name => metric,  :statistics => 'Average,Sum', :namespace => ARGV[2], :period => 300, :start_time => ts, :end_time => et, :dimensions=> ARGV[3])


Now we're going to look at the rows that come back. Usually only one row will come back, if we're running this at about the same rate that cloudwatch is adding readings, but sometheres there will be more, or fewer.

In the case of network traffic, I want to return the sum of all readings for this metric. In other cases, such as disk usage, I would want to return the mean. This distinction is the same as default vs gauge measurements in MRTG.

samples = 0
  sum = 0
  avgsum = 0

  datapoints = res["GetMetricStatisticsResult"]["Datapoints"]

  lt = ts
  if datapoints.nil? then
   # nop
  else
    rows = datapoints["member"]

    rows.each { |r|
      nlt = Time.parse(r["Timestamp"])
      if(nlt < ts) then
        # nop - time was before requested start
      else
        samples += Float(r["Samples"])
        avgsum += Float(r["Average"])
        sum += Float(r["Sum"])
        nlt += 1
        if(nlt > lt) then
          lt = nlt
        end
      end
    } 

Now we can write out the new state file:

f=File.new(statusfn, "w")
    f.puts(lt)
    f.close
  end
  return sum
end


and finally output the MRTG format information:

sumo=probe(metrico)
sumi=probe(metrici)

# output mrtg format
puts sumo
puts sumi
puts 0
puts "cloudwatch: "+metrico+" and "+metrici


The end.

No comments:

Post a Comment