
Learning more about Nagios for server monitoring

This week’s TechMail, “Learning more about Nagios for server monitoring”, reviews the book Learning NAGIOS 3.0 published by Packt Publishing. It’s a pretty decent book for anyone interested in learning more about Nagios; even though I’ve been using Nagios for some time, it taught me a few new tricks. It’s not necessarily a good cover-to-cover read, but it’s a darn fine reference manual (although reading it front to back would suit someone who really wants to get up to speed on the full power of Nagios quickly). Read the TechMail for the full review.

hddtemp wrapper for Nagios

I was bored tonight so I wrote a wrapper for hddtemp for Nagios monitoring. I have a bit of a quirky setup for Nagios where I run the local system checks on remote systems via netcat, ipsvd, and a script to handle the query. This allows me to monitor remote drive space, current users, total processes, and current load. Using hddtemp, I can now monitor the temperature of the drives in those machines (which also gives me an idea of how hot/cold the server room itself is).
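For anyone who hasn't seen this kind of setup, here's a rough sketch of what a query handler could look like. This is not the script I actually run; the function names (`run_check`, `handle_query`), the check names, and the commands behind them are all made up for illustration. It assumes the TCP server (ipsvd's tcpsvd, in this kind of setup) attaches the connection to the handler's stdin and stdout, so the handler only needs to read one word and print an answer.

```shell
#!/bin/sh
# Hypothetical query handler; every name and check here is illustrative.

# Map a check name to a local command. Only whitelisted names are run,
# so a remote peer can never pass in an arbitrary command line.
run_check() {
    case "${1}" in
        load)  uptime ;;
        users) who | wc -l ;;
        procs) ps ax | wc -l ;;
        disk)  df -k / ;;
        *)
            echo "UNKNOWN - unsupported check: ${1}"
            return 3
            ;;
    esac
}

# Read a single request line from stdin (the network connection when
# started by a TCP server) and answer it on stdout.
handle_query() {
    name=""
    read -r name
    # Strip a carriage return in case the client sent CRLF.
    name=$(printf '%s' "${name}" | tr -d '\r')
    run_check "${name}"
}
```

To try it locally without a network, pipe a check name in: `echo load | handle_query`. Anything not in the whitelist gets an UNKNOWN line and a return value of 3.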

This may need some tweaking to work with other Nagios setups, but shouldn’t be too hard to adapt. One of these days I’ll do a writeup on my Nagios configuration. Anyways, the wrapper script is as follows. It could probably be optimized a bit more, but it works well enough. WordPress doesn’t handle the indents very well, so keep that in mind.

#!/bin/sh

usage() {
    # Both thresholds and at least one drive are required.
    echo "Usage: ${0} -w <warn> -c <crit> <drive> [drive ...]"
}

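# -h/--help: print usage and exit successfully.
# Plain "=" is used throughout because "==" is not portable in /bin/sh.
if [ "${1}" = "-h" ] || [ "${1}" = "--help" ]; then
    usage
    exit 0
fi
# Warning threshold: must be the first option and must have a value.
if [ "${1}" = "-w" ] && [ -n "${2}" ]; then
    warn="${2}"
    shift 2
else
    usage
    exit 3   # 3 = UNKNOWN in the Nagios plugin convention
fi
# Critical threshold: must follow the warning threshold.
if [ "${1}" = "-c" ] && [ -n "${2}" ]; then
    crit="${2}"
    shift 2
else
    usage
    exit 3
fi
# Everything left on the command line is a drive to check.
drives="$*"
if [ -z "${drives}" ]; then
    usage
    exit 3
fi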

status=0
smsg=""
htemp=0

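for drive in ${drives}; do
    # hddtemp prints "<device>: <model>: <temperature>"; the reading is
    # the third colon-separated field.
    stats=$(/usr/local/sbin/hddtemp "${drive}")
    # Keep only the digits so a trailing unit or degree sign cannot break
    # the integer comparisons below.
    temp=$(echo "${stats}" | cut -d ':' -f 3 | cut -d ' ' -f 2 | tr -cd '0-9')
    dev=$(echo "${drive}" | cut -d '/' -f 3)

    # No numeric reading (for example, a drive without a sensor): report
    # it and move on without changing the status.
    if [ -z "${temp}" ]; then
        smsg="${smsg}${dev}=unknown; "
        continue
    fi

    # Critical outranks warning, and a status raised by an earlier drive
    # is never lowered.
    if [ "${temp}" -ge "${crit}" ]; then
        status=2
    elif [ "${temp}" -ge "${warn}" ] && [ "${status}" != "2" ]; then
        status=1
    fi

    # Track the hottest drive for the performance data.
    if [ "${temp}" -gt "${htemp}" ]; then
        htemp="${temp}"
    fi

    smsg="${smsg}${dev}=${temp}C; "
done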

case "${status}" in
    2)
        wmsg="CRITICAL"
        ;;
    1)
        wmsg="WARN"
        ;;
    0)
        wmsg="OK"
        ;;
esac

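# Status text first, then "|" and the performance data (hottest drive).
echo "HDDTEMP ${wmsg} - ${smsg}|hddtemp=${htemp};${warn};${crit};0"
# A stock Nagios setup takes the service state from the exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
exit "${status}"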

The output in Nagios’ status view looks like:

HDDTEMP OK - hda=22C: sda=24C: sdb=24C:
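The status view only shows the part of the line before the `|`. Nagios treats everything after the first `|` as performance data, which is where the script puts the hottest reading and the two thresholds. Here's a small sketch of that split; the function name `split_plugin_output` and the `text`/`perf` variables are my own invention for illustration, not anything Nagios provides.

```shell
#!/bin/sh
# split_plugin_output LINE
# Hypothetical helper: splits one line of plugin output at the first "|".
# Sets two variables:
#   text - the human-readable status shown in the status view
#   perf - the performance data (empty when there is no "|")
split_plugin_output() {
    line="${1}"
    case "${line}" in
        *'|'*)
            text="${line%%|*}"   # drop everything from the first "|" on
            perf="${line#*|}"    # drop everything up to the first "|"
            ;;
        *)
            text="${line}"
            perf=""
            ;;
    esac
}
```

Feeding it the line the script echoes for the drives above (with `-w 30 -c 35`) leaves the `hddtemp=24;30;35;0` part in `perf`: the hottest reading, the warning threshold, the critical threshold, and a minimum of 0.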

The script is called as “hddtemp-mon -w 30 -c 35 /dev/hda /dev/sda /dev/sdb”.
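If you want to adapt this to a stock Nagios setup, the threshold logic is the part worth pulling out on its own. Here's a minimal sketch, assuming the standard plugin state codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `classify_temp` is made up for illustration and isn't part of the wrapper itself.

```shell
#!/bin/sh
# classify_temp TEMP WARN CRIT
# Hypothetical helper: prints the Nagios state code for one reading.
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN (non-numeric input)
classify_temp() {
    # Reject anything that is not a plain non-negative integer, so the
    # numeric comparisons below can never fail on garbage input.
    for value in "${1}" "${2}" "${3}"; do
        case "${value}" in
            ''|*[!0-9]*)
                echo 3
                return 0
                ;;
        esac
    done

    # Check the critical threshold first so it always wins.
    if [ "${1}" -ge "${3}" ]; then
        echo 2
    elif [ "${1}" -ge "${2}" ]; then
        echo 1
    else
        echo 0
    fi
}
```

With the thresholds from the example call, `classify_temp 22 30 35` prints 0, `classify_temp 32 30 35` prints 1, and `classify_temp 35 30 35` prints 2. A reading that isn't a clean integer, such as one with the unit still attached, comes back as 3 instead of tripping up the comparison.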