I ran into a problem today. I had just activated a check_ping command for our stores nationwide, around 800 of them. I started receiving alerts for hosts being down; however, when I pinged those hosts manually they were fine.
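The check_ping syntax is: “check_ping -H <hostname> -w <wrta>,<wpl>% -c <crta>,<cpl>% -p <number of packets> -t <timeout>”. Here wrta and crta are the warning and critical round-trip average (RTA) thresholds in milliseconds, and wpl and cpl are the warning and critical packet-loss percentages.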
Well, unless you have a direct connection from the Nagios server to the store, you need to factor in objects that are out of your control. An example would be all the routers and/or frames between your location and the store. Unless you own them, there is really nothing you can do to speed them up. So let’s look at what we are really after.
What we are really after is whether the store at the other end of our ping command is able to respond. Remember, a ping only tells you that the NIC at the other end is able to respond, not that the operating system is working properly. So, you need to work with the RTA thresholds.
Since check_ping requires RTA thresholds, you might as well set them high to compensate for the latency between the Nagios server and the store. You should have other services running anyway to check the servers at the store level.
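To make the effect of a high RTA threshold concrete, here is a minimal Python sketch of how a ping result might be classified against warning and critical thresholds. It is a simplification, not the plugin's actual code: the function name ping_state, the ">=" comparisons, and every threshold value are assumptions chosen for illustration.

```python
def ping_state(rta_ms, loss_pct, wrta, wpl, crta, cpl):
    """Classify one ping result against warning/critical thresholds.

    Simplified sketch: a result is CRITICAL if either the round-trip
    average or the packet loss reaches its critical threshold, WARNING
    if either reaches its warning threshold, and OK otherwise.
    """
    if loss_pct >= cpl or rta_ms >= crta:
        return "CRITICAL"
    if loss_pct >= wpl or rta_ms >= wrta:
        return "WARNING"
    return "OK"


# Hypothetical store that answers in 850 ms with no packet loss.
# Tight thresholds (illustrative values) flag it as a problem:
tight = ping_state(850.0, 0, wrta=100.0, wpl=20, crta=500.0, cpl=60)

# High RTA thresholds (also illustrative) let the slow link pass:
loose = ping_state(850.0, 0, wrta=3000.0, wpl=80, crta=5000.0, cpl=100)

print(tight, loose)
```

With the tight thresholds the slow but healthy store is reported as a problem; with the high ones it passes, and a host that returns nothing (100% loss) is still caught.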
Now, since you have set the RTA values high, I would recommend using a packet count of 1 (“-p 1”). Odds are that a remote host that is really down will show it by not returning the packet at all, rather than by a varying RTA.
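As a rough illustration of the resulting command line, the Python sketch below assembles the check_ping arguments with a single packet and high RTA thresholds. The helper name build_check_ping, the host name, and all numeric values are hypothetical; adjust them to your own links. Note that with one packet, packet loss can only come out as 0% or 100%.

```python
import shlex


def build_check_ping(host, wrta, wpl, crta, cpl, packets=1, timeout=10):
    """Return the check_ping argument list for one host.

    Thresholds are passed as "<rta>,<pl>%" pairs, matching the
    -w and -c syntax described earlier. All defaults here are
    illustrative assumptions, not recommendations from the plugin.
    """
    return [
        "check_ping",
        "-H", host,
        "-w", f"{wrta},{wpl}%",
        "-c", f"{crta},{cpl}%",
        "-p", str(packets),
        "-t", str(timeout),
    ]


# Hypothetical store host with high RTA thresholds and one packet.
args = build_check_ping("store-0001.example.com", 3000.0, 80, 5000.0, 100)

# Print the command as it would be typed in a shell.
print(shlex.join(args))
```

Building the arguments as a list and joining them only for display keeps quoting problems out of the picture if the command is later handed to a process launcher.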
Thoughts?
Mike Kniaziewicz, MIS