Nagios: Proactive Host and Service Checks

Your monitoring with Nagios should always be evolving. You may start out with a few hosts running a PING command and monitor only a few services. Needless to say, you will not start with a monitoring service that does everything your enterprise requires. So, you need to be aware of how to contribute to the overall success of the project.

One method is through observation. Take, for instance, a host with five services being monitored. You start to receive email that all the services are now critical with an error message of NRPE: daemon timed out; however, the host is still showing an UP status. What does that mean to an administrator or to whoever is monitoring the system? Let’s examine the problem using basic troubleshooting:

  1. The network is up because we are able to ping the host.
  2. Try to log into the host. If you cannot log into the host, then the host is unresponsive and will need to be accessed through a management console or power cycled.
  3. If you are able to log into the host and the NRPE daemon was working properly before, chances are very good you will need to restart the xinetd or inetd service on the host. No need to wake the World, because the host was still performing its designated services to the enterprise.
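The three steps above can be written down as a small decision function. This is only a sketch of the reasoning, not a Nagios feature: the names `Action` and `triage_nrpe_timeout` are made up for illustration, and the two inputs are whatever you observed by hand (did ping work, did the login work).

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the outcomes described in the list above.
    CHECK_NETWORK = "host does not answer ping: look at the network or the host itself"
    CONSOLE_OR_POWER_CYCLE = "host is unresponsive: use the management console or power cycle"
    RESTART_INETD = "restart the xinetd or inetd service so NRPE answers again"


def triage_nrpe_timeout(ping_ok: bool, login_ok: bool) -> Action:
    """Pick the next step when every service on a host reports an NRPE timeout."""
    if not ping_ok:
        # Not the case in the scenario (the host shows UP); kept for completeness.
        return Action.CHECK_NETWORK
    if not login_ok:
        # Step 2: reachable on the network, but nobody can log in.
        return Action.CONSOLE_OR_POWER_CYCLE
    # Step 3: the host is fine, only the NRPE path is stuck.
    return Action.RESTART_INETD


if __name__ == "__main__":
    print(triage_nrpe_timeout(ping_ok=True, login_ok=True).value)
```

In the scenario from the text, ping works and the login works, so the function lands on the third outcome and nobody needs to be woken up.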

Another way to be proactive is to check the commands on the host. Say you receive a request to monitor a process on a server and you are not sure what Nagios can do to help with that task. You can run the commands locally with their various flags to see the results. All commands are located under the /libexec directory, and since they only request information, you do not have to be root to run them.
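When you try a command by hand, the result Nagios cares about is the exit code: by plugin convention 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Here is a minimal sketch of a wrapper that runs a command and reports the status in words. The helper name `run_plugin` is an assumption for illustration, and you supply the full command line yourself.

```python
import subprocess

# Conventional Nagios plugin exit codes.
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=30):
    """Run a plugin command line and return (status, first line of its output).

    `argv` is the command and its flags as a list of strings.
    Exit codes outside 0-3 are reported as UNKNOWN.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    status = STATUS_BY_EXIT_CODE.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return status, first_line
```

For the process request above, you might call it with the path to a process-checking plugin such as `check_procs` and the flags you are experimenting with. The exact path and flags depend on your installation, so check the plugin's help output first.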

The third and most important method for being proactive with Nagios is to listen and relay monitoring concerns to the Nagios Administrators who can rectify the problem. The reason for not making changes on your own is that Nagios can start to experience problems when too many people modify the configuration files.

Say you have two people who are working together to improve service and host monitoring within Nagios, and they have modified two out of three files. Well, a third person just wants to make a quick modification to a command, so they jump on the Nagios server, make the configuration change and reload Nagios. The third person is going to receive an error message because Nagios is no longer configured properly: the third file from the group’s work is not yet complete.

Now, the worst-case scenario could occur at this point, where the third person starts to change Nagios files to fix the error they believe they created. Well, we can see how this can start to balloon. Always remember that the Nagios System is in production and modifications can cause undocumented features to appear in Nagios if too many people are performing the work at one time.
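One habit that can keep a half-finished change away from the running system is to run the configuration check before every reload, and to stop if the check fails. Here is a minimal sketch, assuming you pass in both commands yourself. The helper name `verify_then_reload` is made up for illustration. Nagios has a verify mode (`nagios -v` followed by the path to the main configuration file), while the reload command depends on how your server is set up.

```python
import subprocess


def verify_then_reload(verify_cmd, reload_cmd):
    """Run the configuration check and reload only if the check exits 0.

    Both arguments are command lines given as lists of strings.
    Returns True if the reload ran and succeeded, False otherwise.
    """
    check = subprocess.run(verify_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Leave the running Nagios untouched and show what the check reported.
        print(check.stdout or check.stderr)
        return False
    reload_result = subprocess.run(reload_cmd)
    return reload_result.returncode == 0
```

If the check fails because of someone else's unfinished work, the right move is still the one described above: stop and tell the people making the change, rather than editing more files yourself.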

Changes should only be made during working hours, after the appropriate email has been sent to all concerned parties. Don’t be the one who wakes an Administrator in the middle of the night because you saw a change that would be really nice to have in Nagios.

Enjoy and the key word here is Accountability,
Mike Kniaziewicz, MIS
