Nagios: service and host escalations made simple

Suppose you are asked to escalate a down host or service, either to another support tier or by opening an incident in your Change Management System. What do you do? Nagios provides escalation definitions (hostescalation and serviceescalation) for exactly this purpose: escalating host and/or service problems.

If you already have Nagios host and service monitoring and notifications set up, you can be up and running in two steps:

  1. Add any new contacts and/or contact groups.
  2. Add the escalation configuration.

I will explain how I added an escalation that notifies a Change Management System (CMS) on the first notification. Once the Change Request is sent, no further problem tickets need to be created, because the technician working the problem should do two things: acknowledge the problem in Nagios and work the problem ticket.

  1. Add any new contacts for the escalation. I had to add a contact for the Change Management System, because our CMS can open an incident from an incoming email.
  2. Define them in name-of-your-contact-file.cfg:

    define contactgroup{
        contactgroup_name               name_of_your_cms_group
        alias                           Name of Your CMS
        members                         name_of_your_cms
        }
    define contact{
        contact_name                    name_of_your_cms
        alias                           Name of Your CMS
        contact_groups                  name_of_your_cms_group
        host_notifications_enabled      1
        service_notifications_enabled   1
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c    ; open a problem ticket only when the service is CRITICAL
        host_notification_options       d    ; open a problem ticket only when the host is DOWN
        service_notification_commands   notify-linux-service-by-email
        host_notification_commands      notify-linux-host-by-email
        email                           email_address@your_domain.com
        can_submit_commands             1
        }
  3. Add the escalation.
  4. Define it in Name_of_your_escalation.cfg:

    define hostescalation{
        hostgroup_name          name_of_your_hostgroup
        first_notification      1
        last_notification       1
        notification_interval   5
        contact_groups          name_of_your_cms_group
        escalation_period       timeperiod_to_notify_create_incident   ; e.g. 24x7
        escalation_options      d    ; create a problem ticket only when the host is DOWN
        }
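Nagios only reads object files that its main configuration file references. A sketch of the nagios.cfg lines, assuming a source install with object files under /usr/local/nagios/etc/objects (the paths are an assumption; adjust them to your layout). If the files already sit in a directory named by a cfg_dir line, nothing needs to be added:

```
# nagios.cfg: load the new contact and escalation files (paths are assumptions)
cfg_file=/usr/local/nagios/etc/objects/name-of-your-contact-file.cfg
cfg_file=/usr/local/nagios/etc/objects/Name_of_your_escalation.cfg
```

Running nagios -v against nagios.cfg before reloading checks that the new objects parse and that every contact group they reference exists.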

Now, remember how an escalation works. Nagios reads the escalation definition when it is reloaded. Escalations are matched by notification number: this one covers only the first notification (first_notification 1 through last_notification 1), so Nagios notifies your Change Management System once (1). Notifications that the escalation does not cover fall back to the notification settings in your host or service definition (or its template), so Nagios keeps notifying exactly as those settings describe.
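To make the hand-off concrete, here is a sketch of the kind of host template the escalation falls back to. The template name name_of_your_host_template and the technicians' contact group tech_email are hypothetical; the hosts in name_of_your_hostgroup are assumed to use this template. Notification 1 goes to the CMS through the escalation; notification 2 onwards is sent according to these settings.

```
; Hypothetical host template: its notification settings apply to every
; notification that no escalation covers (here, notification 2 onwards).
define host{
    name                    name_of_your_host_template
    contact_groups          tech_email
    notification_period     24x7
    notification_interval   30
    notification_options    d,u,r
    register                0       ; template only, not a real host
    }
```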

Like everything in life, there is a catch. While an escalation applies to a notification, Nagios notifies only the contact groups listed in the escalation and ignores the ones in the host or service definition. For example, if your host or service definition uses a tech_email contact group, tech_email will not receive the first notification unless you list it next to the CMS contact group in the escalation; once you do, both groups are notified once (1), through the escalation, when a host is down or a service is critical.
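A sketch of the host escalation with tech_email listed next to the CMS group, so that both receive the first notification. Only the contact_groups line differs from the escalation defined earlier; 24x7 stands in for your own escalation period.

```
define hostescalation{
    hostgroup_name          name_of_your_hostgroup
    first_notification      1
    last_notification       1
    notification_interval   5
    contact_groups          name_of_your_cms_group,tech_email   ; both groups get notification 1
    escalation_period       24x7
    escalation_options      d
    }
```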

That is all there is to creating a single-purpose escalation. Now reload Nagios and the escalation will take effect. For service escalations, use a serviceescalation definition instead; it takes similar attributes plus service_description to identify the service. Experiment with your escalations until you have the right combination of attributes.
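A minimal serviceescalation sketch that mirrors the host escalation. The host and service names are hypothetical placeholders, and escalation_options c limits the escalation to CRITICAL states, matching the c in the CMS contact's service_notification_options.

```
; Service counterpart of the host escalation; host and service names are placeholders.
define serviceescalation{
    host_name               name_of_your_host
    service_description     name_of_your_service
    first_notification      1
    last_notification       1
    notification_interval   5
    contact_groups          name_of_your_cms_group
    escalation_period       24x7
    escalation_options      c       ; open a ticket only when the service is CRITICAL
    }
```

Replacing host_name with hostgroup_name applies the same escalation to that service on every host in the group.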

I recommend adding your own email address to both groups and reloading Nagios. Then open the Notifications page in the Nagios web interface, which shows which contacts were notified. What I do is add a bogus host definition, add its host group to the escalation file, and then reload Nagios.
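A sketch of such a test setup. Everything here is hypothetical: the escalation_test host group, the bogus-host name, and the use of the generic-host template and check-host-alive command, which assume the sample templates.cfg and commands.cfg shipped with Nagios Core. The address 192.0.2.1 is from the TEST-NET-1 documentation range, so the host should not answer and should go DOWN.

```
; Hypothetical test host group; add it to hostgroup_name in the escalation.
define hostgroup{
    hostgroup_name          escalation_test
    alias                   Escalation Test
    }

; Bogus host that should never answer, so it goes DOWN quickly.
define host{
    use                     generic-host        ; sample template from templates.cfg
    host_name               bogus-host
    alias                   Bogus host for escalation testing
    address                 192.0.2.1           ; TEST-NET-1 documentation address
    hostgroups              escalation_test
    check_command           check-host-alive
    check_period            24x7
    check_interval          1
    max_check_attempts      1                   ; hard DOWN on the first failed check
    contact_groups          tech_email
    notification_interval   5
    }
```

With this in place, the escalation's hostgroup_name line would read name_of_your_hostgroup,escalation_test.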

Have fun and leave a comment if you have any questions or other suggestions for host and service escalations in Nagios.
