Friday, November 21, 2008

how to remove spikes from rrd graphs (cacti, mrtg, zenoss, ganglia)

If you have been using open source tools for performance monitoring of your IT infrastructure, combine harvester or space station, you have most likely come across rrdtool. This Swiss Army knife is used to create pretty graphs for most monitoring tools. Sometimes a graph gets out of whack because an unplanned event causes a spike that pushes everything else out of proportion. In our case it was a crash of one of our storage head units that rendered the annual IOPS graph almost unreadable.

Fortunately there is a little tool called "removespikes" on the rrdtool website that can take care of this little problem:

http://oss.oetiker.ch/rrdtool/pub/contrib/

Download and unpack the latest removespikes-xxxxxxx-mkn.tar.gz from the website, copy the script removespikes.pl into your rrdtool graph directory and execute it like this:

./removespikes.pl -d -l 0.1 netappa0_ops.rrd

The -l parameter defines how aggressively spikes are chopped off. Start with 0.1. The default is 0.6.

In my case it seemed to do the right thing and chopped off two peaks, at 2008-01-25 and 2008-03-03.
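
Since the script works on the RRD file itself, it is probably wise to keep a copy before experimenting with different -l values. A minimal sketch, assuming removespikes.pl sits in the current directory and changes the file in place; the function name despike is made up for illustration:

```shell
# Hypothetical helper: back up an RRD file, then run removespikes.pl on it.
# Assumes removespikes.pl is in the current directory.
despike() {
    rrd="$1"
    limit="${2:-0.1}"    # fall back to the cautious starting value of 0.1
    # Keep an untouched copy so a too-aggressive run can be undone.
    cp -p "$rrd" "$rrd.bak" || return 1
    ./removespikes.pl -d -l "$limit" "$rrd"
}
```

Call it as "despike netappa0_ops.rrd 0.1". If the result looks wrong, copy netappa0_ops.rrd.bak back over the original and try another limit.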

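Using the "rrdtool tune" command is another option for repairing graphs. It does not touch the x or y axis; it sets the maximum value a data source will accept, so anything above that is stored as unknown from then on. I think removespikes is better for cleaning up data that is already in the file, but I have not seen the results of "rrdtool tune" yet.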
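
For reference, here is an untested sketch of the tune route. As far as I understand the rrdtool documentation, tune --maximum only limits what a data source accepts from then on, so old spikes have to be removed with a dump followed by a restore with range checking (-r). The function name clamp_rrd, the data source name ds0 and the cap of 50000 are assumptions for illustration; "rrdtool info" lists the real data source names in your file.

```shell
# Hypothetical helper: cap a data source and re-check stored values against the cap.
# Usage: clamp_rrd <file.rrd> <data source name> <maximum>
clamp_rrd() {
    rrd="$1"; ds="$2"; max="$3"
    # Keep an untouched copy before changing anything.
    cp -p "$rrd" "$rrd.bak" || return 1
    # Set the largest value the data source will accept.
    rrdtool tune "$rrd" --maximum "$ds:$max" || return 1
    # Dump to XML and restore with -r so stored values are checked
    # against the new limit.
    rrdtool dump "$rrd" > "$rrd.xml" || return 1
    rrdtool restore -r "$rrd.xml" "$rrd.new" || return 1
    mv "$rrd.new" "$rrd"
}
```

A call would look like "clamp_rrd netappa0_ops.rrd ds0 50000". Unlike removespikes, this needs you to pick a sensible maximum yourself.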

dipe