Your Major Incident Management Procedure

Wouter Wyns

You probably know Murphy’s famous law, and maybe you have already experienced it: “Anything that can go wrong will go wrong”. There is one thing you should definitely have in place when anything that can go wrong does go wrong, and that’s your Major Incident Management Procedure.

A Major Incident

In service management, we call it an incident when things go wrong. An incident becomes a major incident when things go seriously wrong: the business is deprived of an essential service or, in other words, a business-critical service is down. In general, it is not hard to recognize a major incident: the service desk will be flooded with calls and tickets that all relate to the same event. Managers will soon become nervous and put extra pressure on the support organization. A major incident often comes with stress, frustration and even rage. These negative emotions are not the best advisors when you need to make fast decisions and communicate effectively with everyone involved, and that is precisely what these situations require. That’s why it is essential to define and agree on a Major Incident Management Procedure before disaster strikes.

A Major Incident Management Procedure

A Major Incident Management Procedure describes in detail the steps you need to perform when a major incident happens. The main idea is to create a context in which the engineers working on the incident can fully concentrate on resolving it. This requires coordination, and that’s the responsibility of the incident manager. First, the incident manager will assemble a Major Incident Management Team with the best engineers to analyze the incident. Often such a team consists of engineers from different disciplines. The incident manager will take these specialists out of their daily routine and set up a communication bridge to get them on the same call so they can decide on an action plan.

Another task for the incident manager is to ensure that communication is sent to all stakeholders, so that end users understand that the service delivery organization is aware of the major incident and is working to fix the issue as soon as possible. This will prevent the service desk from being overloaded with incoming calls. Communication to the management of the service delivery organization requires special care. Management will need more detailed information because they must be able to explain what is happening.

During the outage, the communication steps should be repeated at regular intervals until the incident is resolved. Once the incident has been resolved, the incident manager will perform a post-incident review and examine together with the Major Incident Management Team whether the incident could have been handled better.
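The rule of repeating communication at regular intervals until resolution can be sketched in a few lines of Python. This is only an illustration: the function name `update_is_due` and the 30-minute interval are assumptions made for the example, not something the procedure or the 4me platform prescribes. Your own procedure should state the interval you agree on.

```python
from datetime import datetime, timedelta

# Assumed cadence for this example; agree on your own value in the procedure.
UPDATE_INTERVAL = timedelta(minutes=30)


def update_is_due(last_sent, now, resolved=False, interval=UPDATE_INTERVAL):
    """Return True when the next stakeholder update should be sent.

    Updates stop once the incident is resolved; until then, one is due
    whenever at least `interval` has passed since the previous update.
    """
    if resolved:
        return False
    return now - last_sent >= interval


last_sent = datetime(2024, 1, 1, 9, 0)

# 20 minutes after the last update: nothing to send yet.
early = update_is_due(last_sent, datetime(2024, 1, 1, 9, 20))

# 30 minutes after the last update: time for the next one.
on_time = update_is_due(last_sent, datetime(2024, 1, 1, 9, 30))

# Once the incident is resolved, the repeating updates stop.
after_fix = update_is_due(last_sent, datetime(2024, 1, 1, 11, 0), resolved=True)
```

In practice the incident manager would apply a check like this separately for each audience, since end users and management may need updates with different levels of detail.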

How can the 4me platform help?

To support such a procedure, the 4me platform allows every specialist to let their organization’s incident managers know that an incident should probably be treated as a major incident. After receiving the initial notification, an incident manager can reject the proposal by updating the major incident status to ‘Rejected’. Alternatively, an incident manager may decide to update the major incident status from ‘Proposed’ to ‘Accepted’ to turn the incident into a major incident. They will then follow the steps described in the Major Incident Management Procedure. The procedure document must contain all the details: how to send out the communication, to whom, templates and examples, where to find the distribution lists, etc. It should also take into account that some technology may fail because part of the infrastructure is down, and describe alternatives. This will make sure that when the worst happens, your organization is still in control.
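The proposal and review flow described above can be pictured as a small state machine. The sketch below is not the 4me API: the class `MajorIncidentStatus` and the function `review` are hypothetical names used for illustration. Only the three status values come from the text. Treating ‘Accepted’ and ‘Rejected’ as final states is an assumption of this example.

```python
from enum import Enum


class MajorIncidentStatus(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"


# Assumption: a proposal can be accepted or rejected, and both outcomes
# are final. The article does not say whether a decision can be revised.
ALLOWED_TRANSITIONS = {
    MajorIncidentStatus.PROPOSED: {
        MajorIncidentStatus.ACCEPTED,
        MajorIncidentStatus.REJECTED,
    },
}


def review(current, decision):
    """Return the new status, or raise ValueError for an invalid change."""
    if decision not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(
            f"Cannot change status from {current.value} to {decision.value}"
        )
    return decision


# A specialist proposes; an incident manager accepts the proposal.
status = MajorIncidentStatus.PROPOSED
status = review(status, MajorIncidentStatus.ACCEPTED)
```

Making the allowed transitions explicit mirrors the point of the procedure itself: under pressure, nobody has to guess who decides what, or which steps come next.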

When things go seriously wrong, people tend to panic and mess up things they usually do with ease. A detailed Major Incident Management Procedure is your best guardian angel and can be a lifesaver in these situations. And the 4me platform makes sure you can embed the Major Incident Management Procedure in your daily routine.