Event, Incident & Problem Management

Events happen all the time, in their thousands per minute.

Incidents are unwanted events or missing desired events.

Problems are persistent incidents that you have difficulty fixing.

Step one in Incident Management is to find out what is going on and make sure the people who need to know are informed. Most server software has built-in reporting that generates vast numbers of records for expected and innocuous events. The applications running on those servers have their own reporting, which may be good or indifferent. The trick is to ensure that the important or unexpected events, the incidents, are reported to the right human(s), and that the reporting is accurate and comprehensive.
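As a rough illustration of separating incidents from routine events and routing them to the right people, here is a minimal Python sketch. The severity levels, the `ON_CALL` routing table, the team names and the "error" threshold are all hypothetical, chosen only for the example; a real monitoring tool will have its own model.

```python
from dataclasses import dataclass

# Hypothetical severity levels and routing table, for illustration only.
SEVERITY_ORDER = {"debug": 0, "info": 1, "warning": 2, "error": 3, "critical": 4}
ON_CALL = {"database": "dba-team", "web": "web-team"}
DEFAULT_CONTACT = "service-desk"


@dataclass
class Event:
    source: str    # which system produced the record, e.g. "database"
    severity: str  # one of the keys in SEVERITY_ORDER
    message: str


def is_incident(event: Event, threshold: str = "error") -> bool:
    """Treat an event as an incident when its severity reaches the threshold."""
    return SEVERITY_ORDER[event.severity] >= SEVERITY_ORDER[threshold]


def route_incidents(events: list[Event]) -> dict[str, list[Event]]:
    """Group incident-level events by the contact who should be told about them."""
    routed: dict[str, list[Event]] = {}
    for event in events:
        if is_incident(event):
            contact = ON_CALL.get(event.source, DEFAULT_CONTACT)
            routed.setdefault(contact, []).append(event)
    return routed


def missing_expected(expected_sources: set[str], events: list[Event]) -> set[str]:
    """Sources that should have reported but did not: a missing desired event."""
    return expected_sources - {event.source for event in events}
```

Calling `route_incidents` on a batch of events returns only those at or above the threshold, keyed by contact, while `missing_expected` covers the other half of the definition above: a desired event that never arrived.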
 
Incidents may be fixed as an 'exception', but when the exception becomes a daily routine, the cost in hours starts to mount up. The problem that is the root cause of these incidents should then be investigated more thoroughly, and ways found to prevent it from happening again.
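One way to spot when an 'exception' has turned into a routine is to count repeat incidents and the hours spent on them. The sketch below assumes each incident has been logged with a category label and the time taken to fix it; the `IncidentRecord` fields and the threshold of three occurrences are hypothetical, not a recommended standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class IncidentRecord:
    day: date
    category: str       # hypothetical label, e.g. "nightly export fails"
    hours_spent: float  # effort spent on the fix


def problem_candidates(records, min_occurrences=3):
    """Return (category, count, total_hours) for recurring incidents,
    most expensive first."""
    totals = defaultdict(lambda: [0, 0.0])  # category -> [count, hours]
    for record in records:
        totals[record.category][0] += 1
        totals[record.category][1] += record.hours_spent
    candidates = [
        (category, count, hours)
        for category, (count, hours) in totals.items()
        if count >= min_occurrences
    ]
    return sorted(candidates, key=lambda item: item[2], reverse=True)
```

Categories that appear at the top of this list are the ones where a root-cause investigation is most likely to repay the effort.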
 

[Cartoon: furnace console displaying error messages]

An example: an incident occurred at a manufacturing plant while a night-shift operator was loading a computer-controlled furnace with work. While doing so, they suddenly felt heat coming down on their neck: the furnace heater was moving above them. Despite the operator's best efforts to stop the motor, the heater collided with the containment vessel cradle, shearing the nylon gearing.

As a result of this incident, the furnace was out of operation for two days during a busy period (there would not have been a night shift running otherwise). This caused major disruption to the manufacturing schedule.

After the incident, a review was conducted to determine the cause. The review eventually exonerated the operator, finding that the incident was due not to human error but to a flaw in the furnace programming. If a switch was moved through two positions quickly at a certain stage in the sequence, the program did not register the intermediate position. Had it been registered, that position would have put the heater into a "halt" state in a safe neutral position, and the heater would not have started moving into position until a confirming "go" button was pressed.

The incident review team, having identified the root cause, implemented changes to the furnace programming to prevent similar incidents from occurring in the future.
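The kind of flaw the review found can be sketched in a few lines of Python. The review does not say how the program read the switch or what change was made, so everything here is an assumption for illustration: the position names "load", "hold" and "run", a controller that samples the switch at intervals, and a defensive variant that halts when the switch appears to jump past the intermediate position. The "go" confirmation is not modelled.

```python
def poll(timeline, every):
    """Sample the true switch position once every `every` ticks."""
    return timeline[::every]


def naive_controller(samples):
    """Halt only when the intermediate 'hold' position is actually observed."""
    state = "idle"
    for position in samples:
        if position == "hold":
            state = "halted"   # safe: wait for operator confirmation
        elif position == "run" and state != "halted":
            state = "moving"   # heater starts moving with no confirmation
    return state


def defensive_controller(samples):
    """Halt on 'hold', and also when the switch appears to skip over it."""
    state = "idle"
    previous = None
    for position in samples:
        skipped = previous == "load" and position == "run"
        if position == "hold" or skipped:
            state = "halted"   # assume the intermediate step was missed
        previous = position
    return state


# The operator passes through 'hold' for a single tick.
timeline = ["load"] * 4 + ["hold"] + ["run"] * 5
fast_switch = poll(timeline, 3)   # coarse sampling misses the 'hold' tick
slow_switch = poll(timeline, 1)   # sampling every tick sees it
```

With coarse sampling the naive controller never sees "hold" and ends up in the "moving" state, while the defensive one ends up "halted". The operator did nothing wrong in either case; only the program's handling of the input differs.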
 

An incident or problem review must focus on potential weaknesses in the process and look for ways to make it more robust. It's all too easy to blame the operator. A longer article, "Incident Reviews", is published on LinkedIn.
 
Why hire a Service Management Analyst to carry out incident and problem reviews?
For more information on how we can help you with improving incident & problem management, including carrying out an ad hoc review, please write to robert@esm.solutions.