Nagios: Example of how to make an oncall rotation

Nagios is great for monitoring systems; however, monitoring only helps if notifications reach the correct groups or contacts. The most basic notification schedule to set up is the one for your on-call personnel. These are the people who keep the enterprise running when others are enjoying their off hours. So, you will want to be VERY careful about how you set this up!

You will first need to identify the services and the contact group that will receive the notifications after hours. Remember, check_host_alive is a service and a very easy one to begin with for our example, because check_host_alive will generally report to the Systems Administration team after hours. So, we have identified the service, check_host_alive, and the contact group, Systems Administration.

Many organizations already have an on-call rotation for the Systems Administrators. This means you want to make a list of the on-call rotation as it currently exists. Once you have the on-call list, along with the days and hours each person is to receive notifications, you are set for the next part of this example.

We will be working with three files here: one for timeperiods, one for contacts and one for contactgroups. Now, if your file structure was created per the Nagios manual, you should have a timeperiods directory and a contacts directory. The timeperiods and contacts files will go into those two directories.

Timeperiods: We want to keep the timeperiods organized for easy modification and additions. Since many organizations have both Linux and Windows Administrators, we will call the file linux-admins-oncall.cfg. We only have two admins, who rotate every other week, so each date repeats every 14 days (the number after the slash). If your rotation is larger, just multiply 7 by the number of people in the on-call rotation and use that as the interval. Open the empty file and add your on-call personnel in the order they are on call:

define timeperiod{
timeperiod_name            admin1-oncall
alias                      Admin1 On Call
2009-04-17 / 14            15:00-24:00
2009-04-18 / 14            00:00-24:00
2009-04-19 / 14            00:00-24:00
2009-04-20 / 14            00:00-24:00
2009-04-21 / 14            00:00-24:00
2009-04-22 / 14            00:00-24:00
2009-04-23 / 14            00:00-24:00
2009-04-24 / 14            00:00-15:00
}
define timeperiod{
timeperiod_name            admin2-oncall
alias                      Admin2 On Call
2009-04-24 / 14            15:00-24:00
2009-04-25 / 14            00:00-24:00
2009-04-26 / 14            00:00-24:00
2009-04-27 / 14            00:00-24:00
2009-04-28 / 14            00:00-24:00
2009-04-29 / 14            00:00-24:00
2009-04-30 / 14            00:00-24:00
2009-05-01 / 14            00:00-15:00
}
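
For a larger rotation, the same pattern extends by changing the interval. As a rough sketch, assuming a hypothetical third administrator (admin3) joins the rotation above, the interval becomes 7 x 3 = 21 days, and the new block picks up where admin2 hands off. The name admin3-oncall and the dates are illustrative only; admin1 and admin2 would also need their "/ 14" changed to "/ 21":

```cfg
# Sketch only: hypothetical third admin in a three-person weekly rotation.
# The interval is 21 days (7 days x 3 people) instead of 14.
define timeperiod{
timeperiod_name            admin3-oncall
alias                      Admin3 On Call
2009-05-01 / 21            15:00-24:00   ; takes over from admin2 at 15:00
2009-05-02 / 21            00:00-24:00
2009-05-03 / 21            00:00-24:00
2009-05-04 / 21            00:00-24:00
2009-05-05 / 21            00:00-24:00
2009-05-06 / 21            00:00-24:00
2009-05-07 / 21            00:00-24:00
2009-05-08 / 21            00:00-15:00   ; hands off at 15:00
}
```

Each person's first and last day are partial days, so the hand-off time (15:00 here) lines up between one block and the next with no gap or overlap.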

Once you have added these two, you can save and close the file. Next we want to work with the contactgroups.cfg file. You can break this file down into several files for more clarity; however, we just use a single file. So, open the file in vi and add:

define contactgroup{
contactgroup_name       linux_admins
alias                   Linux Admins
members                 admin1, admin2
}

Save and close the file. Next we want to create the contacts file. We will call this file linux_admins_beep.cfg and place it in the contacts directory, since we want to contact each admin through his or her BlackBerry. Open linux_admins_beep.cfg in vi and add the following:

define contact{
contact_name                    admin1.on.call
alias                           Linux Admin1
contact_groups                  linux_admins
host_notifications_enabled      1
service_notifications_enabled   1
service_notification_period     admin1-oncall
host_notification_period        admin1-oncall
service_notification_options    c,r
host_notification_options       d,r
service_notification_commands   notify-service-by-beeper
host_notification_commands      notify-host-by-beeper
email                           admin1@yourdomain.com
can_submit_commands             1
}
define contact{
contact_name                    admin2.on.call
alias                           Linux Admin2
contact_groups                  linux_admins
host_notifications_enabled      1
service_notifications_enabled   1
service_notification_period     admin2-oncall
host_notification_period        admin2-oncall
service_notification_options    c,r
host_notification_options       d,r
service_notification_commands   notify-service-by-beeper
host_notification_commands      notify-host-by-beeper
email                           admin2@yourdomain.com
can_submit_commands             1
}

Save and close the file. If everything was done correctly, you should be able to reload Nagios with /etc/init.d/nagios reload and be good to go. You will then need to add “linux_admins” to the contact groups of each host that should notify the Linux admins when there is a problem.
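
As an illustration of that last step, a host definition might look something like the sketch below. The host name, alias, address and the linux-server template are assumptions for the example, not part of the setup above; the only line that matters here is contact_groups. The 24x7 timeperiod is assumed to exist, as it does in the sample configuration shipped with Nagios.

```cfg
# Sketch only: web01, its address and the linux-server template are hypothetical.
define host{
use                     linux-server     ; assumed host template
host_name               web01
alias                   Web Server 01
address                 192.0.2.10       ; documentation-only address
notification_period     24x7             ; the host itself must allow after-hours notifications
contact_groups          linux_admins     ; the on-call group defined above
}
```

Keep in mind that a notification has to pass both the host's notification_period and the contact's own notification period, so a host limited to work hours will never page the on-call admin at night. It is also worth running the Nagios binary with the -v option against your main configuration file before reloading, so that a typo does not take monitoring down.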

Note: If you are going to be notifying an administrator via a BlackBerry device, make sure your message is as specific as possible and no longer than it needs to be; e.g., if you are going to send the hostname, then there is no need for the IP address. I would also only send critical and host down messages.
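
The contacts above refer to notify-host-by-beeper and notify-service-by-beeper, which have to be defined as commands somewhere in your configuration. A minimal sketch of short, hostname-only messages could look like this, assuming mail is delivered with /bin/mail and that the device receives messages sent to the contact's email address; the paths and the message wording are assumptions to adapt to your system:

```cfg
# Sketch only: paths to printf and mail are assumptions; adjust for your system.
define command{
command_name    notify-host-by-beeper
command_line    /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: $HOSTNAME$ is $HOSTSTATE$\n" | /bin/mail -s "$HOSTNAME$ $HOSTSTATE$" $CONTACTEMAIL$
}
define command{
command_name    notify-service-by-beeper
command_line    /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$\n" | /bin/mail -s "$HOSTNAME$/$SERVICEDESC$ $SERVICESTATE$" $CONTACTEMAIL$
}
```

The messages carry only the notification type, the hostname (plus the service description for services) and the state, which keeps them readable on a small screen.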

A very good source is the book Learning Nagios 3.0.
I hope this helps someone. Enjoy,

Mike Kniaziewicz, MIS

2 Responses to “Nagios: Example of how to make an oncall rotation”

  1. Kuljit says:

    If I understand this correctly, can I follow this configuration in my case where US and UK team can respond to alerts rotating between them (US covering day shift; UK graveyard) ?
    Thanks

  2. Mike Kniaziewicz says:

    Correct. A consideration is where the Nagios host is located. Timeperiods will be based upon the system time.