Friday, October 20, 2017

nagios passive check + custom script at remote host

This is the second post about custom scripts.
In my new environment I have limited access, yet I still want to pass some server information back to Nagios for monitoring and alerting, without compromising security.

Nagios server || Firewall ||  production server

The only port open between these two servers is the SSH port, so I need to use it to send data from my server back to the Nagios server.

This post collects my notes on the custom scripts that check disk space, memory and CPU load, and whether a service is running or stopped.

Setup

I assume the Nagios server is already set up and running well.
If not, see the URL below for how to set up Nagios:
http://gab-tech.blogspot.my/2012/08/setup-nagios.html

Although that post is fairly old, the setup should be the same.
For this post, I am using Nagios Core 4.2.4.

Now create the entries on the Nagios server.
You can edit your current config or create a new one.
I create a new config file for each project group so it is easier to manage.

# vim Hybris.cfg

define hostgroup{
        hostgroup_name  HYBRIS-DEV
        alias           HYBRIS-DEV
        members         HYBRIS-APP-D01
        }

define host{
        use                     linux-server
        host_name               HYBRIS-APP-D01
        alias                   HYBRIS-APP-D01
        address                 HYBRIS-APP-D01.gab.com
        notification_interval   0
        }

define service{
        use                             local-service
        host_name                       HYBRIS-APP-D01
        service_description             /home
        check_command                   check_log
        notifications_enabled           1
        notification_interval           0
        passive_checks_enabled          1
        }



Dummy script

In the Nagios setup above, the check_command points to check_log.
There is no plugin called check_log; it is just a dummy script to satisfy Nagios, because Nagios reports an error if check_command is not set.
My real custom scripts live on a different server.

Open and edit this file:
# vim command.cfg

put this into it

# 'fake command' command definition
define command{
        command_name    check_log
        command_line    /bin/bash /usr/local/nagios/script/check_passive
        }

Then create a script directory under the Nagios folder (/usr/local/nagios/script),
create check_passive in it with 770 permissions,
and put this into it:

#!/bin/sh
echo "please disable active check and use passive"
exit 1

Restart the Nagios server:
# /etc/init.d/nagios restart

Before restarting, you can run the Nagios configtest to check the configuration for errors:
# /etc/init.d/nagios configtest

Manual push result to Nagios from remote host

According to the Nagios documentation, a result can be pushed into Nagios by writing a line in this format to the external command file (nagios.cmd):

[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>

Example of mine:
echo "[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-P01;/home;1;test output" >> nagios.cmd

host_name = GAB-APP-P01
svc_description = /home
return_code = 1
plugin_output = test output
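
The format above can be wrapped in a small helper so that every check builds the line the same way. This is only a sketch: the function name format_passive_result is made up for illustration and is not part of Nagios.

```bash
#!/bin/bash
# Hypothetical helper (not part of Nagios): build one external-command line.
# Arguments: host_name, svc_description, return_code, plugin_output
format_passive_result() {
    local host=$1 service=$2 code=$3 output=$4
    # return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)
    case $code in
        0|1|2|3) ;;
        *) echo "invalid return code: $code" >&2; return 1 ;;
    esac
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$host" "$service" "$code" "$output"
}
```

For example, format_passive_result GAB-APP-P01 /home 1 "test output" >> nagios.cmd would append the same kind of line as the echo above.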


Using this command, we can manually push the result from our custom script back to Nagios.
To check that it works, run this command from the remote host:

ssh -t nagios@nagios_server_IP "
     cd /usr/local/nagios/var/rw
     echo '[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-D01;/home;1;test' >> nagios.cmd"
You should then see the result on your Nagios server.


Setup Remote host script

Because I don't want all the checks in one script, I split them into a few scripts:
1. check storage script
2. check cpu script
3. check memory script
4. check service running script

Then, to avoid duplicating the code that pushes data back to the Nagios server, I put that part in a separate script:

5. push data to nagios server script

NOTE: For security
I am not going to use root to push data back to Nagios, so I created a user called nagios.
Then I generated an SSH key pair for nagios with ssh-keygen and put the public key on the Nagios server, so every push to the Nagios server can skip password authentication.

For how to set up the SSH key, please refer to the link below:
http://gab-tech.blogspot.my/2011/03/incremental-backup.html


Here are the example scripts I use on the remote host.
PS: in the nagios user's home directory, I created a script directory and store all my scripts there.

5. Push data to nagios server script

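Replace NAGIOS_SERVER_IP and the host name (GAB-APP-D01 here) to suit your servers. The host name must match a host_name defined in your Nagios config.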
---------- nagios.sh ----------
#!/bin/bash

ssh -t nagios@NAGIOS_SERVER_IP "
     cd /usr/local/nagios/var/rw
     echo '[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-D01;$1;$2;$3' >> nagios.cmd"
--------- END ----------
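
A variation on nagios.sh, offered as a sketch rather than a drop-in replacement: the line is built locally and sent over SSH on standard input, so the plugin output is never re-parsed by the remote shell's quoting. NAGIOS_SERVER_IP and the host name are the same placeholders as above; the variable names NAGIOS_SSH, NAGIOS_HOSTNAME and NAGIOS_CMD_FILE and the function push_result are names chosen for this example only.

```bash
#!/bin/bash
# Sketch only: the variable and function names below are assumptions for this example.
NAGIOS_SSH=${NAGIOS_SSH:-nagios@NAGIOS_SERVER_IP}      # placeholder, as above
NAGIOS_HOSTNAME=${NAGIOS_HOSTNAME:-GAB-APP-D01}        # must match host_name on the Nagios server
NAGIOS_CMD_FILE=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

# push_result <svc_description> <return_code> <plugin_output>
push_result() {
    local line
    line=$(printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s' \
        "$(date +%s)" "$NAGIOS_HOSTNAME" "$1" "$2" "$3")
    # The remote side only appends whatever arrives on stdin to the command file.
    printf '%s\n' "$line" | ssh "$NAGIOS_SSH" "cat >> '$NAGIOS_CMD_FILE'"
}

# When called with arguments, behave like nagios.sh: service, status, output.
if [ "$#" -ge 3 ]; then
    push_result "$1" "$2" "$3"
fi
```

Called the same way as the original, for example /bin/bash nagios.sh /home 1 "85%", it also keeps a plugin output that contains spaces in one piece, as long as the caller quotes it.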

1. Check Storage Script

To avoid running df -h again for every check,
I set a cron job that records the df -h output to a file.

# record the df -h output to df-result every 5 minutes
*/5 * * * * df -h > /home/nagios/script/df-result

---------- check_storage.sh ----------
#!/bin/bash

# all script located here
cd /home/nagios/script

# wait 30 seconds before checking so we don't read df-result while the cron job is still writing it
sleep 30s

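# The awk column that holds Use% depends on how df printed the line:
# $5 normally, $4 when a long device name made df wrap the entry onto
# a second line. Check df-result on your server and adjust per mount.
store1="/"
result1=$(grep -w "/" df-result | awk '{print $4}')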
status1=$(bash status.sh $result1)
/bin/bash nagios.sh $store1 $status1 $result1

store2="/boot"
result2=$(grep -w "/boot" df-result | awk '{print $5}')
status2=$(bash status.sh $result2)
/bin/bash nagios.sh $store2 $status2 $result2

store3="/home"
result3=$(grep -w "/home" df-result | awk '{print $4}')
status3=$(bash status.sh $result3)
/bin/bash nagios.sh $store3 $status3 $result3
---------- END ----------
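
The column that holds Use% shifts when df wraps a long device name onto two lines, which is presumably why the script above reads $4 for some mounts and $5 for others. One possible alternative, assuming the cron job is changed to write df -P (POSIX format, one line per filesystem) instead of df -h, is to look the mount point up by exact match on the last column. The function name usage_for_mount is invented for this sketch.

```bash
#!/bin/bash
# usage_for_mount <mount_point> [snapshot_file]
# Prints the Capacity column (e.g. 42%) for an exact mount-point match.
# Assumes the snapshot was written by `df -P` and that mount points
# contain no spaces. Returns non-zero if the mount point is not listed.
usage_for_mount() {
    local mount=$1 file=${2:-/home/nagios/script/df-result}
    awk -v m="$mount" '
        NR > 1 && $NF == m { print $(NF-1); found = 1 }
        END { exit !found }
    ' "$file"
}
```

With that in place, result3=$(usage_for_mount /home) would replace the grep and awk pair, and a mount such as /home/data would no longer be picked up by the /home lookup.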

2. check cpu script

---------- cpu_load.sh ----------
#!/bin/bash

sar=$(sar 1 1 | tail -n 1 | awk '{print $8}')

load=`echo "100.00-$sar" | bc`

if [[ $load == .* ]]
   then load=$(echo "0$load")
fi

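# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
if (( $(echo "$load < 80" | bc -l) )); then
        status=0
elif (( $(echo "$load > 90" | bc -l) )); then
        status=2
elif (( $(echo "$load >= 80" | bc -l) )); then
        # >= so that a load of exactly 80 is WARNING instead of falling through to UNKNOWN
        status=1
else
        status=3
fi

load="${load}%"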

cd /home/nagios/script
/bin/bash nagios.sh cpu $status $load
---------- END ----------
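
The arithmetic in cpu_load.sh can also be done in a single awk pass, which avoids the leading-zero fix-up that bc needs. This is a sketch under two assumptions: sar prints an English-locale "Average:" line, and %idle is the last column of that line (as in the default CPU report from sar 1 1). The function name cpu_load_from_sar is hypothetical.

```bash
#!/bin/bash
# cpu_load_from_sar: reads `sar 1 1` output on stdin and prints "<load> <status>".
# Assumes an English-locale "Average:" line whose last field is %idle.
# Thresholds follow cpu_load.sh: above 90 is CRITICAL, 80 to 90 is WARNING.
cpu_load_from_sar() {
    awk '
        $1 == "Average:" {
            load = 100 - $NF
            if      (load > 90)  status = 2   # CRITICAL
            else if (load >= 80) status = 1   # WARNING
            else                 status = 0   # OK
            printf "%.2f %d\n", load, status
            found = 1
        }
        END { exit !found }
    '
}
```

In a script, the two values could be read with something like read load status <<< "$(sar 1 1 | cpu_load_from_sar)" and then passed to nagios.sh as before.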

3. check memory script

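This memory check relies on the "-/+ buffers/cache" line in the output of free, which RedHat/CentOS 6 prints. Newer releases such as CentOS 7 no longer print that line, so the script needs adjusting there.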
---------- memory_V6.sh ----------
#!/bin/bash

total=$(free -m | grep "Mem:" | awk '{print $2}')
used=$(free -m | grep "buffers/cache" | awk '{print $3}')

#echo $total
#echo $used

# integer percentage of memory used, excluding buffers/cache
percentage100=$(( used * 100 ))
percentage=$(( percentage100 / total ))

result="${percentage}%"

#echo $result
cd /home/nagios/script
status=$(bash status.sh $percentage)
/bin/bash nagios.sh memory $status $result
---------- END ----------
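
Both check_storage.sh and memory_V6.sh call status.sh, which is not listed in this post. Below is a minimal sketch of what such a script could look like, assuming it takes a whole-number percentage (with or without a trailing %) and uses the same 80/90 thresholds as cpu_load.sh. It is a stand-in, not the original script, and the function name status_from_percent is made up.

```bash
#!/bin/bash
# status.sh (hypothetical stand-in): map a usage percentage to a Nagios return code.
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
status_from_percent() {
    local value=${1%\%}                  # drop a trailing % if present
    case $value in
        ''|*[!0-9]*) echo 3; return ;;   # UNKNOWN: empty or not a whole number
    esac
    if   [ "$value" -gt 90 ]; then echo 2   # CRITICAL
    elif [ "$value" -ge 80 ]; then echo 1   # WARNING
    else                           echo 0   # OK
    fi
}

# When called as `bash status.sh 85%`, print the code for the first argument.
if [ "$#" -gt 0 ]; then
    status_from_percent "$1"
fi
```

Printing 3 for anything that is not a number means a failed grep or an empty df-result shows up in Nagios as UNKNOWN rather than as a false OK.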


4. check service running script

---------- hybris_service.sh ----------
#!/bin/bash

sleep 15s

# nagios.sh lives with the other scripts in the nagios user's script directory
cd /home/nagios/script

HYBRUNNING=`ps auxwww | grep hybris | grep "jmxremote" | grep -v grep | wc -l`

if [ ${HYBRUNNING} -ne 0 ]; then
   result=running
   status=0
else
   result=stop
   status=2
fi

/bin/bash nagios.sh Hybris-service $status $result

---------- END ---------


CRONJOB

Set cron jobs to run the storage and service checks every 5 minutes, and the memory and CPU checks every minute:

*/5 * * * * /home/nagios/script/check_storage.sh > /dev/null 2>&1
*/5 * * * * /home/nagios/script/hybris_service.sh > /dev/null 2>&1
*/1 * * * * /home/nagios/script/memory_V6.sh > /dev/null 2>&1
*/1 * * * * /home/nagios/script/cpu_load.sh > /dev/null 2>&1



reference:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/passivechecks.html
https://somoit.net/nagios/nagios-using-passive-checks-without-agent