Because of my new environment, I have limited access, yet I still want to pass some server information back to Nagios for monitoring and alerting, without compromising security.
Nagios server || Firewall || production server
The only port open between these two servers is SSH, so I need to use it to send data from my server back to the Nagios server.
This post contains all my notes on my custom scripts that check disk space, memory, CPU load, and whether some services are running or stopped.
Setup
I assume the Nagios server is already set up and running properly. Otherwise, please check the URL below for how to set up Nagios:
http://gab-tech.blogspot.my/2012/08/setup-nagios.html
Although that post is quite old, the setup should be the same.
For this post, I am using Nagios Core 4.2.4.
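Now you need to create the entries on the Nagios server.
You can edit your current config or create a new config file; if you create a new one, remember to include it in nagios.cfg with a cfg_file= line (or put it in a directory listed by cfg_dir=).
I create a new config file for every project group, so it is easier to manage.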
# vim Hybris.cfg
define hostgroup{
hostgroup_name HYBRIS-DEV
alias HYBRIS-DEV
members HYBRIS-APP-D01
}
define host{
use linux-server
host_name HYBRIS-APP-D01
alias HYBRIS-APP-D01
address HYBRIS-APP-D01.gab.com
notification_interval 0
}
define service{
use local-service
host_name HYBRIS-APP-D01
service_description /home
check_command check_log
notifications_enabled 1
notification_interval 0
active_checks_enabled 0
passive_checks_enabled 1
}
Dummy script
In the Nagios config above, you can see that check_command points to check_log.
There is actually no plugin called check_log; it is just a dummy script to satisfy Nagios, because Nagios gives an error if check_command is not set.
My real custom scripts are on a different server.
Open and edit this file:
# vim command.cfg
Put this into it:
# 'fake command' command definition
define command{
command_name check_log
command_line /bin/bash /usr/local/nagios/script/check_passive
}
Then, in the Nagios folder, create a script directory
and in it create check_passive with 770 permissions (so it must be owned by the user or group Nagios runs as).
Put this into it:
#!/bin/sh
echo "please disable active check and use passive"
exit 1
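The steps above could also be scripted. This is a minimal sketch that assumes the default /usr/local/nagios prefix of a source install; install_check_passive is a hypothetical helper name, not part of Nagios. Because of the 770 mode, the file must afterwards be owned by the user or group Nagios runs as, which the sketch only notes in a comment since chown needs root.

```shell
#!/bin/bash
# Hypothetical helper: create <prefix>/script/check_passive, the dummy check command.
# The prefix defaults to /usr/local/nagios (assumed source-install location).
install_check_passive() {
    local prefix=${1:-/usr/local/nagios}
    mkdir -p "$prefix/script" || return 1
    # quoted 'EOF' keeps the script body literal (no variable expansion)
    cat > "$prefix/script/check_passive" <<'EOF'
#!/bin/sh
echo "please disable active check and use passive"
exit 1
EOF
    chmod 770 "$prefix/script/check_passive"
    # on a real server, also run as root: chown -R nagios:nagios "$prefix/script"
}
```

For example, as root: source the file, then run install_check_passive /usr/local/nagios and fix the ownership.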
Before restarting, you can run the Nagios configtest to check the configuration for errors:
# /etc/init.d/nagios configtest
Then restart the Nagios server:
# /etc/init.d/nagios restart
Manual push result to Nagios from remote host
According to the Nagios documentation, we can use this external command to push a result into Nagios:
[<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host_name>;<svc_description>;<return_code>;<plugin_output>
Example of mine:
echo "[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-P01;/home;1;test output" >> nagios.cmd
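host_name = GAB-APP-P01 (must match a host_name defined in the Nagios config)
svc_description = /home (must match that host's service_description)
return_code = 1 (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
plugin_output = test output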
With this command, we can manually push the result of our custom script back to Nagios.
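To avoid typos in the semicolon-separated format, the line can be built by a small function. nagios_cmd_line below is a hypothetical helper, a sketch rather than anything shipped with Nagios; it only prints the line, so the output can be appended to nagios.cmd locally or over SSH.

```shell
#!/bin/bash
# Hypothetical helper: print one PROCESS_SERVICE_CHECK_RESULT external command line.
# Usage: nagios_cmd_line <host_name> <svc_description> <return_code> <plugin_output> [timestamp]
nagios_cmd_line() {
    local host=$1 svc=$2 code=$3 output=$4 ts=${5:-$(date +%s)}
    # Nagios return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    case $code in
        0|1|2|3) ;;
        *) echo "invalid return code: $code" >&2; return 1 ;;
    esac
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$ts" "$host" "$svc" "$code" "$output"
}

# e.g. nagios_cmd_line HYBRIS-APP-D01 /home 1 "test output" |
#        ssh nagios@NAGIOS_SERVER_IP 'cat >> /usr/local/nagios/var/rw/nagios.cmd'
```

Piping the line into a remote cat also avoids nesting single quotes inside the double-quoted ssh command.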
To test whether it works, run this from the remote host:
ssh -t nagios@nagios_server_IP "
cd /usr/local/nagios/var/rw
echo '[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-D01;/home;1;test' >> nagios.cmd"
You should then see the result on your Nagios server.
Setup Remote host script
Because I don't want all the checks in one script, I split them into several scripts:
1. check storage script
2. check cpu script
3. check memory script
4. check service running script
Then, to avoid duplicating the code that pushes data back to the Nagios server, I moved it into a separate script used purely for sending data:
5. push data to nagios server script
NOTE: For security
I don't use root to push data back to Nagios; instead I created a user called nagios.
Then I generated an SSH key pair for the nagios user with ssh-keygen and copied the public key to the Nagios server, so every push can skip password authentication.
For how to set up SSH keys with ssh-keygen, please refer to the link below:
http://gab-tech.blogspot.my/2011/03/incremental-backup.html
Here are the example scripts I use on the remote host.
PS: in the nagios user's home directory, I created a script directory and store all my scripts there.
5. Push data to nagios server script
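Edit NAGIOS_SERVER_IP and the host name (GAB-APP-D01) to suit your server; the host name must match the host_name in the Nagios config.
---------- nagios.sh ----------
#!/bin/bash
# Usage: nagios.sh <svc_description> <return_code> <plugin_output>
# $1, $2, $3 and the date timestamp are expanded locally, before ssh runs.
# When run from cron there is no terminal, so -t only prints a harmless warning.
ssh -t nagios@NAGIOS_SERVER_IP "
cd /usr/local/nagios/var/rw
echo '[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;GAB-APP-D01;$1;$2;$3' >> nagios.cmd"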
--------- END ----------
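The storage and memory scripts below call a helper, status.sh, whose contents are not shown in this post. Here is a hypothetical sketch of what it could look like: it takes a percentage such as 85% or 85 and prints a Nagios return code. The thresholds (80% and above WARNING, above 90% CRITICAL) are an assumption borrowed from cpu_load.sh; use whatever limits you actually want.

```shell
#!/bin/bash
# status.sh (hypothetical reconstruction): map a usage percentage to a Nagios return code.
# ASSUMED thresholds, taken from cpu_load.sh: >= 80 WARNING, > 90 CRITICAL.
# Accepts "45%" or "45"; prints 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
usage_status() {
    local pct=${1%\%}               # strip a trailing % sign
    if ! [[ $pct =~ ^[0-9]+$ ]]; then
        echo 3                      # not an integer percentage: UNKNOWN
    elif (( 10#$pct > 90 )); then   # 10# avoids octal parsing of values like 08
        echo 2
    elif (( 10#$pct >= 80 )); then
        echo 1
    else
        echo 0
    fi
}

# when run as "bash status.sh 85%", print the status for the first argument
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
    usage_status "$1"
fi
```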
1. Check Storage Script
To avoid running df -h again for every check, I set a cron job that records the df -h output to a file:
# record every 5 minute to df-result
*/5 * * * * df -h > /home/nagios/script/df-result
---------- check_storage.sh ----------
#!/bin/bash
# all scripts are located here
cd /home/nagios/script
# wait 30 seconds so the cron job has finished rewriting df-result before we read it
sleep 30s
# Use% is normally column $5 of df -h, but when a long device name (e.g. LVM)
# wraps onto its own line, Use% shifts to $4; match the columns to your df-result
store1="/"
result1=$(grep -w "/" df-result | awk '{print $4}')
status1=$(bash status.sh $result1)
/bin/bash nagios.sh $store1 $status1 $result1
store2="/boot"
result2=$(grep -w "/boot" df-result | awk '{print $5}')
status2=$(bash status.sh $result2)
/bin/bash nagios.sh $store2 $status2 $result2
store3="/home"
result3=$(grep -w "/home" df-result | awk '{print $4}')
status3=$(bash status.sh $result3)
/bin/bash nagios.sh $store3 $status3 $result3
---------- END ----------
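Reading fixed column numbers from df -h is fragile, because line wrapping moves Use% between columns. A possibly more robust sketch, assuming GNU df and that the cron line is changed to write df -hP output instead (-P is the POSIX output format, which keeps each filesystem on one line), is to match the mount point column exactly. use_pct is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print Use% for an exact mount point from "df -hP" style output.
# With -P every filesystem is on one line: Use% is column 5, the mount point column 6.
use_pct() {
    local mount=$1 file=${2:-/home/nagios/script/df-result}
    # skip the header line, then compare the mount point column exactly
    awk -v m="$mount" 'NR > 1 && $6 == m { print $5 }' "$file"
}
```

In check_storage.sh this would replace each grep/awk pair, e.g. result3=$(use_pct /home), so the same column works for every mount point.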
2. check cpu script
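---------- cpu_load.sh ----------
#!/bin/bash
# needs sar from the sysstat package; field 8 of the last ("Average:") line is %idle
sar=$(sar 1 1 | tail -n 1 | awk '{print $8}')
load=`echo "100.00-$sar" | bc`
# bc drops the leading zero (e.g. ".50"), so add it back
if [[ $load == .* ]]
then load=$(echo "0$load")
fi
# below 80 OK, 80 to 90 WARNING, above 90 CRITICAL, anything else (e.g. sar failed) UNKNOWN
if (( $(echo "$load < 80" | bc -l) )); then
status=0
elif (( $(echo "$load > 90" |bc -l) )); then
status=2
elif (( $(echo "$load >= 80" | bc -l) )); then
status=1
else
status=3
fi
load=$(echo $load%)
cd /home/nagios/script
/bin/bash nagios.sh cpu $status $load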
---------- END ----------
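cpu_load.sh depends on sar. If sysstat is not installed, a similar number can be computed from /proc/stat. This is a sketch assuming Linux; cpu_busy_pct is a hypothetical helper. Like sar's %idle, it counts only the idle column as idle, so iowait shows up as load, matching 100 - %idle in cpu_load.sh.

```shell
#!/bin/bash
# cpu_busy_pct: print 100 - %idle between two "cpu" summary lines of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal [guest guest_nice]
# Only the first 8 are summed because guest time is already counted in user/nice.
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x); split(b, y)
        total = 0
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        idle = y[5] - x[5]
        if (total <= 0) { print "0.00"; exit }
        printf "%.2f\n", 100 * (total - idle) / total
    }'
}

# when run directly: sample /proc/stat one second apart, like "sar 1 1"
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
    s1=$(grep '^cpu ' /proc/stat)
    sleep 1
    s2=$(grep '^cpu ' /proc/stat)
    cpu_busy_pct "$s1" "$s2"
fi
```

Its output always has a leading zero, so the ".*" workaround would not be needed; the bc threshold checks can stay as they are.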
3. check memory script
This memory check script is written for Red Hat/CentOS 6: it reads the "-/+ buffers/cache" line of free, which newer versions of free (e.g. on RHEL/CentOS 7) no longer print.
---------- memory_V6.sh ----------
#!/bin/bash
# total memory in MB, and memory used by applications (excluding buffers/cache)
total=$(free -m | grep "Mem:" | awk '{print $2}')
used=$(free -m | grep "buffers/cache" | awk '{print $3}')
# integer percentage of memory in use
percentage=$(( $used * 100 / $total ))
result=$(echo $percentage%)
cd /home/nagios/script
status=$(bash status.sh $percentage)
/bin/bash nagios.sh memory $status $result
---------- END ----------
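On systems where free no longer prints the buffers/cache line, one option is to read /proc/meminfo directly. This sketch assumes the kernel reports MemAvailable (Linux 3.14 and later, also the RHEL/CentOS 7 kernel); mem_used_pct is a hypothetical helper, and "used" here means MemTotal minus MemAvailable, which is close to, but not identical to, the buffers/cache figure above.

```shell
#!/bin/bash
# mem_used_pct: integer % of memory in use = (MemTotal - MemAvailable) * 100 / MemTotal.
# Prints nothing if MemAvailable is missing, so status.sh would report UNKNOWN.
mem_used_pct() {
    local meminfo=${1:-/proc/meminfo}
    awk '/^MemTotal:/     { t = $2 }
         /^MemAvailable:/ { a = $2 }
         END { if (t > 0 && a != "") printf "%d\n", (t - a) * 100 / t }' "$meminfo"
}

# when run directly, print the percentage for this machine
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
    mem_used_pct
fi
```

In a copy of memory_V6.sh, the free and arithmetic lines could become percentage=$(mem_used_pct), keeping the status.sh and nagios.sh calls unchanged.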
4. check service running script
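---------- hybris_service.sh ----------
#!/bin/bash
sleep 15s
# nagios.sh lives in /home/nagios/script, like the other scripts
cd /home/nagios/script
# count Hybris processes started with JMX remote options, ignoring the grep commands themselves
HYBRUNNING=`ps auxwww | grep hybris | grep "jmxremote" | grep -v grep | wc -l`
if [ ${HYBRUNNING} -ne 0 ]; then
result=running
status=0
else
result=stop
status=2
fi
/bin/bash nagios.sh Hybris-service $status $result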
---------- END ---------
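The same process check could be reused for other services. check_proc is a hypothetical generalization of hybris_service.sh, a sketch rather than a drop-in replacement: it takes one or two fixed strings that must both appear on a process's command line and prints the status and result in the form nagios.sh expects.

```shell
#!/bin/bash
# check_proc (hypothetical): prints "0 running" (OK) or "2 stop" (CRITICAL),
# the same values hybris_service.sh passes to nagios.sh.
# Usage: check_proc <pattern> [<pattern2>]
check_proc() {
    local count
    # ps -eo args= lists every command line without a header; as in the original
    # script, "grep -v grep" drops the grep processes of this pipeline
    count=$(ps -eo args= | grep -F -- "$1" | grep -F -- "${2:-$1}" | grep -v grep | wc -l)
    if [ "$count" -ne 0 ]; then
        echo "0 running"
    else
        echo "2 stop"
    fi
}

# e.g. read -r status result <<< "$(check_proc hybris jmxremote)"
#      /bin/bash nagios.sh Hybris-service "$status" "$result"
```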
CRONJOB
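Set cron jobs in the nagios user's crontab to run these scripts: the storage and service checks every 5 minutes, and the memory and CPU checks every minute. Cron calls the scripts directly, so make them executable (chmod +x).
*/5 * * * * /home/nagios/script/check_storage.sh > /dev/null 2>&1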
*/5 * * * * /home/nagios/script/hybris_service.sh > /dev/null 2>&1
*/1 * * * * /home/nagios/script/memory_V6.sh > /dev/null 2>&1
*/1 * * * * /home/nagios/script/cpu_load.sh > /dev/null 2>&1
References:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/passivechecks.html
https://somoit.net/nagios/nagios-using-passive-checks-without-agent