Linux: CPU load monitoring…
Some of you have servers, some of you work in companies that have servers, and others just have home servers. Watching a server 24/7 is a hard job, mainly because you have to be there all the time. The CPU load is one of the most important things to watch: it tells you how the applications are behaving, which one is eating your CPU, and so on. There are a lot of monitoring tools out there, but these tools also tend to eat some CPU themselves.
Not long ago, I came across a small script designed to send a mail when the CPU load got too high, based on a "max load" variable. The script is simple, but it had a big problem: the check whether the current CPU load was bigger than the "max load allowed" was done by comparing two strings. That is really wrong, because a string comparison does not look at numeric values at all. It compares the strings character by character, and the first character that differs decides. If the "max load allowed" is 50 and the current load is 9, for example, the string "9" sorts after "50" (because "9" comes after "5"), so the script concludes that the current load is bigger than the "max load allowed". It also fails the other way around: "100" sorts before "50", so a real overload goes unnoticed.
So you have X=50 and Y=9 and you compare them as strings, like this: if [ "$X" \< "$Y" ]; then… This test is true, so Y is treated as bigger than X. (Inside [ ], the < has to be escaped as \<; unescaped, the shell takes it as an input redirection.)
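To see the difference, here is a minimal sketch that compares the same two example values (50 as the max load, 9 as the current load) once as strings and once as integers. The function names string_greater and number_greater are made up for this illustration.

```shell
#!/bin/sh
# Sketch: the same two values compared as strings and as integers.
# X plays the "max load allowed" role, Y the current load (example values).

# Prints "yes" if $1 sorts after $2 character by character (string test).
string_greater() {
    # ">" must be escaped inside [ ], or the shell treats it as an output redirection
    if [ "$1" \> "$2" ]; then echo yes; else echo no; fi
}

# Prints "yes" if the integer $1 is numerically greater than the integer $2.
number_greater() {
    if [ "$1" -gt "$2" ]; then echo yes; else echo no; fi
}

X=50
Y=9
echo "as strings, $Y > $X? $(string_greater "$Y" "$X")"   # yes: "9" sorts after "50"
echo "as numbers, $Y > $X? $(number_greater "$Y" "$X")"   # no
```

Only the numeric test (-gt) gives the answer you actually want for a load threshold.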
Here is a test:
[root@viperhost /root]# export X="50"
[root@viperhost /root]# export Y="48.01"
[root@viperhost /root]# echo "length of X: ${#X}, length of Y: ${#Y}"
length of X: 2, length of Y: 5
[root@viperhost /root]#
As you can see from the test, to the shell these values are just strings of characters: "48.01" is simply five characters long, and nothing in a string comparison treats it as a number. So in a script where you want to compare numbers, comparing them as strings will give you wrong results.
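A related catch: the shell's -gt only accepts integers, so a value like 48.01 is rejected with an error. The fixed script below gets around this by rounding the load to a whole number first. Another option, sketched here on the assumption that awk is available (the script already uses it), is to let awk do a floating-point comparison; num_gt is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare possibly fractional numbers (such as 48.01) numerically.
# test's -gt only accepts integers, so the comparison is done in awk,
# which converts both arguments to floating-point numbers.

# Exit status 0 (true) if $1 is numerically greater than $2.
num_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

MAX_LOAD=50     # example threshold
CUR_LOAD=48.01  # example reading

if num_gt "$CUR_LOAD" "$MAX_LOAD"; then
    echo "$CUR_LOAD is above $MAX_LOAD"
else
    echo "$CUR_LOAD is not above $MAX_LOAD"
fi
```

Rounding (as the script does) is simpler and good enough for a threshold alert; the awk version keeps the decimals if you need them.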
Anyway, that was one part of the script; the rest is below. I fixed the script so that it compares the two values as numbers and not as strings, as the original did (the load is rounded to a whole number so the shell's integer test -gt can be used).
The code:
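#!/bin/sh
# Mail an alert when the average CPU usage goes above LOAD percent.
# Needs sar (from the sysstat package) and a working "mail" command.

# Use the C locale so sar prints "Average:" and uses "." as decimal point.
LC_ALL=C
export LC_ALL

AWK=$(which awk)
SAR=$(which sar)
PS=$(which ps)
SORT=$(which sort)
HEAD=$(which head)
MAILER=$(which mail)
HOSTNAME=$(hostname -f)
LOAD=49                            # alert threshold, in % of total CPU time
mailto="somemail@domain.org"

# CPU load = 100 - %idle (last column) of the "Average: all" row, rounded
# to a whole number so it can be compared with the integer test -gt.
CPU_LOAD=$($SAR -P ALL 1 2 | $AWK '$1 == "Average:" && $2 == "all" { printf "%.0f\n", 100 - $NF }')

# Give up if sar did not return a number (e.g. sysstat is not installed).
case $CPU_LOAD in
    ''|*[!0-9]*) echo "could not read the CPU load from sar" >&2; exit 1 ;;
esac

if [ "$CPU_LOAD" -gt "$LOAD" ]; then
    # Top 4 processes by CPU usage: user, %CPU, PID, command name.
    PROC=$($PS -eo user,pcpu,pid,comm --no-headers | $SORT -k2,2nr | $HEAD -n 4)
    MAILFILE=$(mktemp) || exit 1
    {
        echo "Please check your processes on ${HOSTNAME}, the CPU load is ${CPU_LOAD}%"
        echo ""
        printf 'USER\tCPU\tPID\tPROC\n'
        printf '%s\n\n' "$PROC"
    } > "$MAILFILE"
    # -r sets the sender address; check mail(1), as support varies by implementation.
    $MAILER -s "CPU Load is ${CPU_LOAD}% on ${HOSTNAME}" -r sysmon@viperhost.net "$mailto" < "$MAILFILE"
    rm -f "$MAILFILE"
fi
exit 0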
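The CPU_LOAD line is the core of the script: from sar's "Average:" row for all CPUs it takes the last column (%idle), subtracts it from 100 and rounds the result. Here is a minimal sketch of just that step, fed with a made-up sample instead of real sar output (real output has one extra row per CPU, and the exact columns can differ between sysstat versions); cpu_busy_from_sar is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the CPU_LOAD step: read sar output on stdin, pick the
# "Average:" row for "all" CPUs and print 100 - %idle (last column),
# rounded to a whole number.
cpu_busy_from_sar() {
    awk '$1 == "Average:" && $2 == "all" { printf "%.0f\n", 100 - $NF; exit }'
}

# Made-up sample in the shape of "sar -P ALL 1 2" averages (not real data).
SAMPLE='Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     40.20      0.00     10.75      0.75      0.00     48.30
Average:          0     80.00      0.00     15.00      0.00      0.00      5.00'

# 100 - 48.30 = 51.7, printed as 52
printf '%s\n' "$SAMPLE" | cpu_busy_from_sar
```

Testing the parsing on a saved sample like this is easier than waiting for a real load spike.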
Now you can copy this script into any file you like, make it executable (chmod +x) and add it to crontab, for example to run every 10 minutes. Do not forget that you need sysstat installed (it provides sar) for the script to work, and a working mail setup so that the notification can actually be sent when the CPU load goes over the "LOAD" value, which in the script is 49. The load here is a percentage of total CPU time averaged over all CPUs (100 minus sar's %idle), so you can set "LOAD" to whatever you want or need, depending on your system and, in some cases, on how many CPUs you have.
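Before adding the cron entry, it may be worth checking that the tools are actually there. A minimal sketch, assuming the commands to check are the ones the script calls (check_commands is a hypothetical helper): it uses the POSIX command -v to report anything missing. It only checks that the commands exist, not that mail delivery works.

```shell
#!/bin/sh
# Sketch: report which of the given commands are not installed.
# Prints one "missing: NAME" line per missing command; returns 1 if any is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

if check_commands sar mail ps awk sort head; then
    echo "all required commands found"
else
    echo "install the missing packages first (sar comes with sysstat)"
fi
```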
The code:
*/10 * * * * ( /path/to/the/script/scriptname ) >/dev/null 2>&1
Good luck guys, let me know how it works or if it really helps…
This entry was posted by robert on January 14, 2009 at 09:31, and is filed under Linux.