When it comes to alarm management, many in the process industry restrict their attention to simply resolving noise or taming bad actors. Resolving noise and nuisance alarms is important, but all too often investigations are limited to a static process: the same two, ten or hundred noisy alarms are resolved, only for the same issues to recur weeks, months or even a year later.
Recurring alarm problems signal a broader issue, one that won’t be fixed by simply making the noise stop. Fixing bad actors does reduce the noise, but as operations or processes evolve and equipment degrades, the alarm problem will keep returning until the root causes of these chronic problems are investigated. These root causes typically come down to a few issues at the facility.
One of the most common underlying issues is the ‘it’s only one more alarm’ attitude. It prevails in facilities where alarms have been added to equipment one on top of another, regardless of whether they meet the definition of an alarm or fit the alarm philosophy. Often the new alarm simply duplicates an existing one, such as a second temperature alarm on a pump whose temperature is already covered by the first. In other cases, new alarms are created whenever more instruments are installed, even when those instruments all measure the same thing.
Adding alarms ad hoc to an industrial process can have cascading effects. It can set the operator up for failure, especially when the additional alarm calls for the same action as the first. Every extra alarm forces the operator to track the sequence of alarms, interpret the cause, pick out alarms that are not part of that sequence, and work out how to respond quickly and effectively to every alarm that requires an action.
Another common root cause of recurring alarm problems is that a thorough review process is missing or not followed. When ‘one-off’ alarms are added without appropriate oversight, they typically are not subjected to rationalization and may not meet the criteria of an alarm. The rationalization process needs to be adequately defined in the facility’s alarm philosophy and rigorous enough to assess whether an alarm is warranted, based on its cause, its consequence and the corrective action it requires. Each alarm should require a significantly different action from any similar alarm; if it does not, it adds load without adding information.
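To make these criteria concrete, the following is a minimal Python sketch of a rationalization check. It assumes each alarm is documented with a cause, a consequence and a corrective action, as the philosophy requires. The `AlarmRecord` type, the `rationalize` function and the tag names are hypothetical illustrations, not part of any standard or product. Matching actions by exact text is a deliberate simplification of what is, in practice, an engineering judgment made by a rationalization team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    """Hypothetical rationalization record for a single alarm."""
    tag: str
    equipment: str
    cause: str
    consequence: str
    corrective_action: str


def rationalize(candidate: AlarmRecord, existing: list[AlarmRecord]) -> list[str]:
    """Return reasons to reject a candidate alarm; an empty list means it passes."""
    problems = []
    # An alarm must document its cause, consequence and corrective action.
    for field_name in ("cause", "consequence", "corrective_action"):
        if not getattr(candidate, field_name).strip():
            problems.append(f"missing {field_name}")
    # An alarm that asks for the same action as one already configured on the
    # same equipment is redundant (simplified here to a case-insensitive text match).
    action = candidate.corrective_action.strip().lower()
    for other in existing:
        if (action
                and other.equipment == candidate.equipment
                and other.corrective_action.strip().lower() == action):
            problems.append(f"same action as existing alarm {other.tag}")
    return problems


# Usage: a second temperature alarm on the same pump with the same response.
existing = [AlarmRecord("TI-101.HI", "P-101", "Bearing overheating",
                        "Pump damage", "Switch to standby pump")]
candidate = AlarmRecord("TI-102.HI", "P-101", "Bearing overheating",
                        "Pump damage", "Switch to standby pump")
print(rationalize(candidate, existing))  # ['same action as existing alarm TI-101.HI']
```

A real review would also weigh setpoints, priorities and response times. The point of the sketch is only that an alarm with no distinct action fails the test, however many instruments could generate it.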
Poorly managed roles for the people who handle alarms are another common root cause. In designing alarm management systems, it is important that people understand their roles, such as who receives alarms and why, and who decides which alarms end up on the operators’ consoles. For example, field staff may design numerous alarms intended for PLC staff who troubleshoot SCADA or DCS systems, yet operations staff need only a few alarms to signal problems that require their attention. Clearly defined roles also ensure consistency over time, regardless of who fills a particular role.
Finally, changes in personnel, policy or equipment can undermine an effective alarm management system. Incoming personnel may start making decisions about alarms that fall outside the existing alarm philosophy. Or companies introduce new policies based on the assumption that more is better, adding alarms to ‘err on the side of safety.’ A further issue is the installation of new equipment such as package boilers, which arrive complete with the built-in alarms recommended by the manufacturer, all of which are then made available to operators.
Alarm management is not simply about putting out individual fires but about aligning each and every alarm with a rigorous and dynamic alarm philosophy. It needs to be strategic, comprehensive, and based on a deep understanding of the industrial process and of the clearly defined roles of people within that process. Given that changes at the operational and organizational levels are inevitable, alarms must also be managed effectively over time if they are to fulfill their function and prevent the events that lead to shutdowns, production loss or even safety incidents.