Do you have alarm floods?
Recently there has been a lot of discussion about avoiding alarm floods, the holy grail of alarm management. Despite the efforts of hundreds of smart engineers and scientists to eliminate alarm floods, they still plague operators. In my opinion, that is because we keep trying to eliminate them rather than simply helping the operator deal with them.
I recently attended a presentation that focused mainly on alarm rationalization. One of the reasons given for alarm rationalization was to reduce the number of floods. There was disagreement about what that meant, and several opinions were offered. Before that, I had been at another meeting where half of the group seemed to think they were in a state of constant flood, while the other half seemed to think they had no alarm issues at all. At that point, I realized that perhaps not everybody measures a flood using the same metrics. So that gets us to the root of today's subject: just what is a "flood"?
According to the EEMUA guide, a flood is anything exceeding 10 alarms in 10 minutes, or one per minute. I have looked at enough data to tell you that many units are in constant flood by that measure. However, the operating team does not perceive itself as being in a flood condition. It is simply their normal alarm load. That team considers a flood to be a group of alarms they simply don't have time to deal with before something bad might happen. In other words, they mean alarms associated with an upset condition, an equipment malfunction, or a process wobble, and perhaps a potential incident in the making. Often, a flood consists of unfamiliar alarms, because they arise outside normal operating conditions and beyond the range of the operators' normal training and familiarity.
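To make the data-side definition concrete, here is a minimal Python sketch that walks a sorted list of alarm timestamps with a trailing 10-minute window and flags every alarm at which more than 10 alarms have occurred in the preceding 10 minutes. The function name `flood_windows` and the input format (a plain list of `datetime` values) are assumptions for illustration; a real alarm journal export would need to be parsed into that form first.

```python
from collections import deque
from datetime import datetime, timedelta

# Threshold from the rule of thumb quoted above:
# more than 10 alarms in any 10-minute window counts as a flood.
WINDOW = timedelta(minutes=10)
THRESHOLD = 10


def flood_windows(timestamps):
    """Return the alarm times at which the trailing 10-minute window
    holds more than THRESHOLD alarms.

    `timestamps` is assumed (hypothetically) to be alarm activation
    times sorted in ascending order.
    """
    window = deque()
    flagged = []
    for t in timestamps:
        window.append(t)
        # Drop alarms that are 10 minutes or more older than this one.
        while t - window[0] >= WINDOW:
            window.popleft()
        if len(window) > THRESHOLD:
            flagged.append(t)
    return flagged


# One alarm every 50 seconds for an hour: a steady, "normal" load.
start = datetime(2024, 1, 1, 6, 0)
steady = [start + timedelta(seconds=50 * i) for i in range(72)]
print(len(flood_windows(steady)))  # 62 of 72 alarms flagged
```

With a steady stream of one alarm every 50 seconds, a load many operating teams would call routine, every alarm from the eleventh onward is flagged. That is exactly the gap between the metric and the operator's perception.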
That leads us to the unusual conclusion that a "flood" as defined by the data is not necessarily a flood to the operator. The operator knows why the alarms are there and essentially ignores them, because he knows they will not lead to anything significant. Some of them may even be there because the operators elect to leave them active for any number of reasons. In fact, once inured to the sound of the alarm annunciator, he is so comfortable with these alarms that he'd probably feel uncomfortable if they were not there. It's kind of like the friend with the overly talkative spouse. Did you ever notice how your friend was able to function through the incessant prattle, even as it was starting to drive you up the wall? The human mind is capable of ignoring things in this way. This is why ignoring alarms has been noted as a contributing factor in many incidents. The ability to ignore is a powerful tool, and can in fact be a good quality in the right circumstances. Soldiers learn to focus on their goals with bullets flying over their heads. Air traffic controllers learn to analyze situations under increasing pressure. Beekeepers learn how to ignore even more.
So, how does this relate to the EEMUA guidelines? Simply stated, the EEMUA guidelines are not applicable to a unit until that unit has observed and put into practice ALL of the EEMUA recommendations for alarm management. Paradoxically, if you do not clean up your nuisance alarms, you will not meet the EEMUA benchmark for normal operation, yet your operator may not feel that anything is amiss. The EEMUA benchmark becomes relevant only at the moment you accept the first tenet of the EEMUA document: every alarm must have an associated operator action. Under that condition, every alarm that occurs beyond the allowable rate contributes to a flood. On the flip side, an alarm that does not require action may not contribute to a flood. From a purist viewpoint it does, but not necessarily from the viewpoint of the operator who is ignoring it. So why all the rush to rationalize alarm systems to reduce flooding? Does it really work?
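A small worked example shows how the same shift can look like a flood or not, depending on whether alarms without an associated operator action are counted. The Python sketch below is illustrative only: the tags and counts are hypothetical (660 recurring alarms with no defined action and 60 that require a response over an 8-hour shift), and it simply averages alarms per 10-minute interval.

```python
from dataclasses import dataclass


# Hypothetical alarm record; tag names below are invented for illustration.
@dataclass
class Alarm:
    tag: str
    requires_action: bool  # True if a defined operator action exists


def rate_per_10_min(alarms, period_minutes, actionable_only=False):
    """Average alarms per 10-minute interval over the period.

    With actionable_only=True, alarms that have no associated operator
    action are excluded, following the EEMUA tenet discussed above.
    """
    counted = [a for a in alarms if a.requires_action or not actionable_only]
    return len(counted) / (period_minutes / 10)


# An 8-hour shift: 660 recurring alarms the operator has learned to
# ignore, plus 60 alarms that actually need a response.
shift = [Alarm("TI-101 HI", False)] * 660 + [Alarm("FI-205 LO", True)] * 60
print(rate_per_10_min(shift, 480))                        # 15.0
print(rate_per_10_min(shift, 480, actionable_only=True))  # 1.25
```

By raw count, this shift sits well above 10 alarms per 10 minutes; counting only alarms that demand action, it is nowhere near a flood.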
Our data shows that rationalization costs a lot of money but doesn't necessarily solve the problem, hence all the papers proposing new cost-effective methods. In fact, our experience shows that the most bang for the buck comes when you get rid of all nuisance alarms and use the information around the remaining alarms to point you to the situations for which they exist. In other words, it points to Situation Awareness as the cure. This runs contrary to all the expert opinion currently in print.
The ASM Consortium (ASM is a registered trademark of Honeywell) has studied this problem extensively. Its members were the first to explore alarm rationalization, some 15 or more years ago. Shouldn't that mean they now satisfy the constraints of the EEMUA 191 report, which they co-produced? Their evidence says no.
See the paper they published on this issue.
Yet their members don't seem to think they have drastic alarm problems, and many feel they have the alarm problem pretty well in hand. Not totally resolved, but in hand. And that is fifteen years after they started. That suggests, once more, that the solution may not lie in the things they have tried. I point again to the study of Situation Awareness as the required answer. That is the real jewel they have uncovered.
As a short diversion, another path has been taken lately that deserves some attention: tools that will resolve the issue once you have fixed the basics. These tools are based on the principles of smart alarming, or state-based alarming. I have recently met with DCS providers and learned that tools addressing this situation are coming in most new DCS systems. I think you'll like them once you see them. Unfortunately, most require upgrading to a recent system release to make use of them. Where does that leave us with legacy systems, which will be with us for many years to come?
It has not been TiPS policy to recommend tools that attempt to handle alarm dynamics POST-DCS. There are many reasons for that. Read my post on dynamic alarming to see a few. However, we have seen some tool sets that deserve consideration.
The first is from a company called UReason. See their website at http://www.ureason.com/. These tools provide a superimposed alarm handling screen that "subsumes" alarms, creating a display that makes more sense. For example, it recognizes patterns and reduces the alarm count automatically by knowing such things as when a pump shuts off, or when a start-up or shutdown is occurring. Their OASIS system uses pre-designed filters to deliver only the information pertinent to the situation. In other words, those seventeen alarms you once received will be subsumed and replaced by a single alarm telling you the pump is down. Or, if the pump is automatically replaced, the system may not bother the operator with the information at all, sending its data only to maintenance for service follow-up.
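The subsumption idea itself can be sketched in a few lines. This is not UReason's actual logic or the OASIS API; it is a hypothetical Python illustration with invented tag names, in which one rule maps a root-cause alarm (a pump stopping) to the downstream alarms it is expected to cause, and an optional set marks root causes that automation has already handled.

```python
# Hypothetical subsumption rule table. Tag names are invented for
# illustration: when pump P-101 stops, these downstream alarms are
# expected consequences and can be folded into one root-cause alarm.
CONSEQUENCES = {
    "P-101 STOPPED": {"FI-110 LO", "PI-111 LO", "LI-120 HI", "TI-130 HI"},
}


def subsume(active, auto_recovered=frozenset()):
    """Split active alarm tags into (operator_view, maintenance_notes).

    If a root-cause alarm is active, its expected consequential alarms
    are removed from the operator's view. If automation has already
    recovered from that root cause (its tag is in auto_recovered), the
    root-cause alarm itself is routed to maintenance instead.
    """
    for_operator = set(active)
    for_maintenance = []
    for root, consequences in CONSEQUENCES.items():
        if root in for_operator:
            for_operator -= consequences
            if root in auto_recovered:
                for_operator.discard(root)
                for_maintenance.append(root)
    return for_operator, for_maintenance


active = {"P-101 STOPPED", "FI-110 LO", "PI-111 LO", "LI-120 HI",
          "TI-130 HI", "AI-140 HI"}
print(sorted(subsume(active)[0]))  # ['AI-140 HI', 'P-101 STOPPED']
print(subsume(active, {"P-101 STOPPED"}))  # ({'AI-140 HI'}, ['P-101 STOPPED'])
```

Note that the unrelated alarm (AI-140 HI) stays visible in both cases. A rule table this small is something an operations team can read and maintain on its own, which matters for the reasons given next.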
Note that such a superimposed system demands a level of attention and maintenance beyond even the maintenance you have NOT been providing to your alarm system. However, for those who have gone to the trouble of cleaning up their alarms, this could be a logical next step. WARNING!! DON'T TRY IT WITHOUT HAVING CLEANED UP THE BASIC ALARM SYSTEM FIRST. Speaking from experience, having tried it both ways, I can tell you that it has no chance of success if you don't do it in the right order. Also, don't try it in conjunction with complex state-estimation models. They have never worked, for a variety of reasons. Start with only the simplest configurations, and increase complexity only as far as you see the operations team can support it. The rule here is that if it must be maintained by engineers or mathematicians, then operators will not receive sustained benefits from it.
To be fair, I should state that I have seen cases where complex model-based systems did work and were maintained past their initial installation for some period. All of these successful examples shared two traits. First, the problem could not have been solved any other way. Second, the value of solving the problem was so great that it justified keeping a PhD mathematician on the problem. One other factor was that the mathematician's interest was also maintained, because the solution had enough twists and turns to require his ongoing inventiveness in resolving new issues. Perhaps more of these would exist if somebody were to offer an inexpensive and efficient expert system package...
We have also looked at products from a Louisiana company called Prosys (http://www.prosys.com/). Not having seen their tools first-hand, I can only go by descriptions of their products. The approach is similar: a replacement for the DCS alarm screen that gives the operator a more powerful view. My understanding is that they have a few installations that have been maintained over an extended period.
With proper implementation of these tools (not trying to take on the world, but simply giving the operator better information), flood management, the holy grail of alarm management, may be within reach. We're certainly heading that way.
