A script for detecting and reporting high iowait times - and runaway user processes (usr processes)

I discovered my site getting increasingly sluggish. One thing led to another and the search problem was somewhat fixed, but the site still slowed down at times. The top command-line utility (Linux) told me that the problem was high iowait times, which means the CPU is sitting idle waiting on disk access because too many things are demanding it at once; this is generally caused by lack of memory and/or bad programming. I thought it would be great to find out when the site is sluggish because of this problem (it helps both in troubleshooting and in preventing loss of availability, since it really can slow the server down to a crawl) and asked for a script. Eventually a good programmer named Jonathan Felder wrote one for me, and having paid the bill, I am sharing it with you.

You may well have to change some of these variables, and you will need to install the readily available iostat program (which you should have anyway!). Though commonly used, iostat is not usually in the default Linux install; it is part of the sysstat package, for which there are plenty of rpm and deb files.
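If you want to see the number this page is about before installing anything else, the following is a minimal sketch of pulling the %iowait figure out of iostat-style text. It finds the column by its header name instead of by position, since the layout differs between sysstat versions. The function names and the sample text (including its values) are made up for illustration; only the `avg-cpu:` header layout is taken from real iostat output.

```shell
#!/bin/bash

# Print the %iowait value from `iostat -c`-style text on stdin.
# The column is located by its header name rather than a fixed position.
iowait_from_iostat() {
    awk '
        $1 == "avg-cpu:" {
            # Header line: remember which column holds %iowait.
            # The data line has no "avg-cpu:" label, so shift by one.
            col = 0
            for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        # First non-empty line after the header is the data line.
        want && col > 0 && NF > 0 { print $col; exit }
    '
}

# Sample text in the shape iostat prints (host name and values invented).
sample_output() {
    cat <<'EOF'
Linux 5.15.0 (example-host)   01/01/2024   _x86_64_   (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.10    0.00    1.20   22.50    0.00   73.20
EOF
}

sample_output | iowait_from_iostat   # prints 22.50
```

On a real server you would pipe the program itself into the function, as in `iostat -c | iowait_from_iostat`. Keep in mind that the first report iostat prints is an average since boot, not the current load.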

There are some updates and notes after the main script, including a modification to handle high user loads, added in November 2007.

I hope you find it useful. Just copy the text, paste it into a text file named iowait.pl, update the variables, make the file executable (chmod +x iowait.pl), and run it from the command line as ./iowait.pl --test; it should send you an e-mail. If it does not, fix the variables (and check that the Mail::Sender Perl module is installed); if it does, run it as ./iowait.pl and it will sit in the background for you. I'll try to post a "low idle" script here soon.

#!/usr/bin/perl
use Mail::Sender;
use POSIX 'setsid';
use strict;
no warnings 'uninitialized' ;
our($EMAILADDRESS, $IOWAITTHRESHOLD, $TIMETHRESHOLD, $EMAILTIMER, $SUBJECT, $SMTPSERVER, $FROM, $IOSTAT);
$EMAILADDRESS = "yourmail\@yourdomain.com"; # Email address you want the alerts to go to
# The backslash before the @ is necessary
$IOWAITTHRESHOLD = 20; # The percentage required to trigger alert
$TIMETHRESHOLD = 60*2; # Length of time in seconds for a high iowait to trigger alert
$EMAILTIMER = 60*30; # Length of time in seconds between alert emails
$SUBJECT = "Iowait alert!"; # Subject of alert message
$SMTPSERVER = "localhost";
#$SMTPSERVER = "mail.yourserver.com"; # SMTP server used to send alerts
$IOSTAT = "/usr/bin/iostat"; # Location of iostat program on your server
$FROM = "you\@yourserver.com"; # Who the alert email is from, the backslash is necessary

main();
exit 0;

sub main
{
# if the --test argument is given, run iostat once and email the result
if ($ARGV[0] eq "--test")
{
my($body);

open(README, "$IOSTAT -c |") or die "Can't run iostat: $!\n";
while (<README>)
{
$body .= $_;
}
close(README);

sendemail($body);
}
else
{
my ($iowaittimer, $etimer, $output, @outputarray, $body);

$iowaittimer = 0;
$etimer = 0;

# daemonize process
$SIG{CHLD} = 'IGNORE';
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
setsid;
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
$SIG{CHLD} = 'DEFAULT';

# run iostat -c 1, specifies output every second
open(README, "$IOSTAT -c 1 |") or die "Can't run iostat: $!\n";

while (<README>)
{
$output = $_;

# we're looking for the output that has digits
if ($output =~ /^\s+\d/)
{
# get rid of the pesky spaces
chomp($output);
$output =~ s/^\s+//g;
$output =~ s/\s+/ /g;

# grab the values
@outputarray = split(" ", $output);

# the fourth item in the list is iowait (arrays begin at 0)
# if the iowait% is higher than our threshold increment our timer, otherwise clear it
if ($outputarray[3] >= $IOWAITTHRESHOLD)
{
$iowaittimer++;
}
else
{
$iowaittimer = 0;
}

# always decrement the email timer
$etimer--;

# sends email if our iowait has been high for too long, and we haven't sent an email recently
if ($iowaittimer >= $TIMETHRESHOLD && $etimer <= 0)
{
$body = "Iowait threshold exceeded!\n\nIowait percentage is currently: $outputarray[3]!\n\n";
sendemail($body);
$etimer = $EMAILTIMER;
}
}
}

close(README);
}
}

sub sendemail
{
my($sender, $body);
$body = $_[0];
$sender = new Mail::Sender {smtp => "$SMTPSERVER", from => "$FROM"};
$sender->Open({to => $EMAILADDRESS, subject => $SUBJECT});
$sender->SendLineEnc("$body\n");
$sender->Close();
}

If that doesn't work for you, or if you want something else, try this:

rssohan; Created with sysstat version 5.0.6 and bash version 2.05b.0(1)-release. Don't forget to set the variables as needed.

#!/bin/bash

IOSTAT_CMD="/usr/bin/iostat"

#this is the position of the iowait value among the whitespace-separated
#words of the iostat output (counting from 0)
IOSTAT_VAL_NO=13

#the frequency at which to wake up and check the iowait value in secs
CHECK_FREQ=10

#the threshold which is considered dangerous (in whole numbers)
IOWAIT_THRESHOLD=40

#the period for which the iowait value must stay above the threshold for email to be sent
#in seconds
MAX_OVERLOAD_PER=5

#the person(s) to send mail to
MAIL_ADDR="rss"

function send_warning_mail()
{
HOSTNAME="`hostname -f`"
DATE_STRING="`date '+%d/%m/%y [%H:%M]'`"
mail -s "${HOSTNAME} [${DATE_STRING} ] -- iowait load avg exceeded" <<EOF \
${MAIL_ADDR}
Automated Message from ${HOSTNAME} on ${DATE_STRING}
The IOWAIT threshold exceeded $1 (last value $2) for $3 seconds.
Please fix, k, thx.

EOF
}

function get_iowait_val()
{

TMP=`$IOSTAT_CMD`

i=0
for val in $TMP
do
[ $i == $IOSTAT_VAL_NO ] && break
i=$((i+1))
done

##empty val value
[ "$val" == "" ] && {
echo "error getting iostat val, check the column number is correct"
exit 1
}

##check we're a number
###start by checking we have a "."
period_index=`expr index $val '.'`
[ $period_index -eq 0 ] && {
echo "error parsing period from iostat val, check the column number is correct"
exit 1
}

##get up until the period
val=`expr substr $val 1 $((period_index-1))`
iowait_val=$((val))

}

ovrld_st=0

while :

do
get_iowait_val

if [ $iowait_val -gt $IOWAIT_THRESHOLD ] ; then
date=`date +'%s'`

#update the over_threshold_period value
#if it's 0, set it to the current date (start mark) else
#just calc the delta
if [ $ovrld_st -eq 0 ]; then

ovrld_st=$((date))
else

[ $((date-ovrld_st)) -gt $MAX_OVERLOAD_PER ] && {
send_warning_mail $IOWAIT_THRESHOLD $iowait_val $MAX_OVERLOAD_PER
ovrld_st=0
}
fi

else
ovrld_st=0
fi

sleep $CHECK_FREQ
done
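The timing logic in the loop above (set a start mark on the first high reading, alert once the reading has stayed high for longer than the allowed period, then reset) can be pulled out into a small function that is easy to try without iostat or mail. This is only a sketch of that logic; the function name `overload_step` and the sample readings are invented for illustration.

```shell
#!/bin/bash

# Decide what to do for one iowait sample.
# Arguments: value threshold now start max_period
#   value      - integer iowait percentage for this sample
#   threshold  - alert threshold (integer)
#   now        - current time in epoch seconds
#   start      - epoch second the overload began, or 0 if not overloaded
#   max_period - seconds the overload must last before alerting
# Prints "<new_start> <action>", where action is "alert" or "wait".
overload_step() {
    local value=$1 threshold=$2 now=$3 start=$4 max_period=$5

    if [ "$value" -le "$threshold" ]; then
        echo "0 wait"            # not above threshold: clear the start mark
    elif [ "$start" -eq 0 ]; then
        echo "$now wait"         # first high sample: set the start mark
    elif [ $((now - start)) -gt "$max_period" ]; then
        echo "0 alert"           # high for long enough: alert and reset
    else
        echo "$start wait"       # still high, but not for long enough yet
    fi
}

# Replay a few invented samples taken 10 seconds apart.
start=0
now=1000
for value in 10 55 60 70 20; do
    result=$(overload_step "$value" 40 "$now" "$start" 5)
    start=${result% *}           # text before the space: new start mark
    action=${result#* }          # text after the space: alert or wait
    echo "t=$now iowait=$value -> $action"
    now=$((now + 10))
done
```

With these samples, only the third reading (the second consecutive high one) produces an alert. Note that with CHECK_FREQ=10 and MAX_OVERLOAD_PER=5, as in the script, any two consecutive high readings are enough to trigger mail.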

Getting notification of high user loads

For me, the real fix was changing from a server with normal-speed PATA drives to one with a faster SCSI drive. Since doing that, iowait has not been a problem, but every now and then, MySQL goes into a spin, and then I need to measure the amount of CPU time taken up by usr processes. Ideally at some point I'll just set something up to measure idle time but if you want to switch the first script to measure usr %, just change

if ($outputarray[3] >= $IOWAITTHRESHOLD)

to

if ($outputarray[0] >= $IOWAITTHRESHOLD)

Then go down to the e-mail message itself, and make similar changes ([3] becomes [0] and iowait becomes USR). You should probably also change the variables up front - you really don't want to be notified if the usr percentage is high for, say, ten seconds. That just means you're doing your nightly cron jobs! I'd use something like this:

$IOWAITTHRESHOLD = 70; # The percentage required to trigger alert
$TIMETHRESHOLD = 60*8; # Length of time in seconds for a high iowait to trigger alert
$EMAILTIMER = 60*120; # Length of time in seconds between alert emails
$SUBJECT = "CPU busy alert!"; # Subject of alert message
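To get a feel for how these settings behave before changing the live script, here is a rough shell re-creation of the Perl script's counting logic: one sample per second, a counter that resets on any dip below the threshold, and a cooldown between e-mails. It only counts how many alerts would be sent and sends nothing; the function names are hypothetical and the sample values are invented.

```shell
#!/bin/bash

# Count how many alerts a stream of one-per-second samples would raise.
# Reads one integer percentage per line on stdin.
# Arguments: threshold sustain_seconds cooldown_seconds
count_alerts() {
    awk -v threshold="$1" -v sustain="$2" -v cooldown="$3" '
        {
            # Consecutive seconds at or above the threshold; any dip resets it.
            high = ($1 >= threshold) ? high + 1 : 0
            # Seconds left before another alert may be sent.
            quiet--
            if (high >= sustain && quiet <= 0) {
                alerts++
                quiet = cooldown
            }
        }
        END { print alerts + 0 }
    '
}

# Emit N lines containing the same value (helper for the demo).
repeat_value() {
    awk -v v="$1" -v n="$2" 'BEGIN { for (i = 0; i < n; i++) print v }'
}

# A 10-second burst at 95% usr does not alert with the 8-minute requirement.
repeat_value 95 10 | count_alerts 70 480 7200     # prints 0

# Twenty minutes at 95% raises one alert; the cooldown suppresses repeats.
repeat_value 95 1200 | count_alerts 70 480 7200   # prints 1
```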

Adding a second e-mail contact

If you want to add a second e-mail contact — say, your cellphone’s messaging system — you can change this line:

our($EMAILADDRESS, $IOWAITTHRESHOLD, $TIMETHRESHOLD, $EMAILTIMER, $SUBJECT, $SMTPSERVER, $FROM, $IOSTAT);

to:

our($E2, $EMAILADDRESS, $IOWAITTHRESHOLD, $TIMETHRESHOLD, $EMAILTIMER, $SUBJECT, $SMTPSERVER, $FROM, $IOSTAT);
$E2 = "yourphone\@tmomail.com";

and then go to the bottom of the script, where you find sub sendemail, and change it from its current form to:

sub sendemail
{
my($sender, $body);
$body = $_[0];
$sender = new Mail::Sender {smtp => "$SMTPSERVER", from => "$FROM"};
$sender->Open({to => $EMAILADDRESS, subject => $SUBJECT});
$sender->SendLineEnc("$body\n");
$sender->Close();

$body = $_[0];
$sender = new Mail::Sender {smtp => "$SMTPSERVER", from => "$FROM"};
$sender->Open({to => $E2, subject => $SUBJECT});
$sender->SendLineEnc("$body\n");
$sender->Close();
}
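If you are using the bash script instead, the same idea (one message per contact) might look like the sketch below. It is not part of either original script: the function name `send_to_all` and the `MAILER` variable are invented here, and it assumes a mail command that takes `-s SUBJECT ADDRESS` and reads the body from standard input, which is how the bash script above already calls `mail`.

```shell
#!/bin/bash

# Command used to deliver one message. It must accept: -s SUBJECT ADDRESS
# and read the message body on stdin (assumption: a `mail`-style command).
MAILER=${MAILER:-mail}

# Send the same body to every address listed after the body.
# Usage: send_to_all SUBJECT BODY ADDRESS...
send_to_all() {
    subject=$1
    body=$2
    shift 2
    for addr in "$@"; do
        printf '%s\n' "$body" | "$MAILER" -s "$subject" "$addr"
    done
}
```

A call would then look like `send_to_all "Iowait alert!" "Iowait threshold exceeded" you@yourdomain.com yourphone@tmomail.com`, and adding a third contact is just one more address on the end.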