Dynamic Alarming - The Ugly Truth
Today, I want to talk about dynamic alarming and how it relates to alarm management. It's a hot topic.
Let's suppose you've done a lot of work to get your alarms to the point where your control room is quieter. It is. But you still have the upsets, alarm floods and all. You had promised management that the $5MM they spent on alarm management, rationalization, and the rest was going to solve this problem.
What is the solution? Why, we need smart alarms, or dynamic alarming. The same vendor that promised to solve your alarm problems has a new solution to offer. We'll make a plant state estimator, and change alarm settings on the fly to reduce alarms as the plant enters different states. Or better yet, we'll be able to avoid alarms entirely, because it predicts what is coming. That is a kind of better version of the previously promised dream world, where you had reduced the alarms to such a point that your OPERATORS were now able to predict what was going to happen (the so-called "predictive" alarm state; you can read the babble on other companies' web sites, but we do not condone the concept).
I want to help you get into a mindset where you understand two basic classes of work. One is the alarm management performed AFTER an alarm has been created. The other is the actual creation of an alarm message, and/or making decisions about alarms based on real-time data streaming into a database from a running process, which is better termed alarm handling than alarm management.
First, let's address the post-alarm-creation issue. Within many instrumented systems (the DCS, the basic instrument, the HMI, or another digital information system) there resides an alarm processor. This alarm processor is designed to set a bit to true (or false) if the data associated with a specific tag meets a certain condition. For instance, the value may cross a limit, either high or low. Or it may change too much in too short a time. Or a handful of other things. This creates an alarm. The digital information system is set to then announce that alarm in a common format, so that it can be recognized appropriately, handled by an operator, and logged for historical and troubleshooting purposes.
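To make the alarm processor idea concrete, here is a minimal sketch of the kind of check described above: a high limit, a low limit, and a rate-of-change limit evaluated against one tag's data. The names (AlarmLimits, check_alarms, the "HI"/"LO"/"ROC" labels) are made up for illustration; a real DCS alarm processor has its own configuration and many more alarm types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlarmLimits:
    """Hypothetical alarm configuration for a single tag."""
    high: Optional[float] = None      # alarm if value rises above this
    low: Optional[float] = None       # alarm if value falls below this
    max_rate: Optional[float] = None  # alarm if |change| per second exceeds this


def check_alarms(limits: AlarmLimits,
                 prev_sample: Tuple[float, float],
                 sample: Tuple[float, float]) -> List[str]:
    """Return the alarm conditions that are true for the newest sample.

    Each sample is (time_in_seconds, value).
    """
    t0, v0 = prev_sample
    t1, v1 = sample
    active = []
    if limits.high is not None and v1 > limits.high:
        active.append("HI")
    if limits.low is not None and v1 < limits.low:
        active.append("LO")
    dt = t1 - t0
    # Rate of change is only meaningful if time has actually advanced.
    if limits.max_rate is not None and dt > 0:
        if abs(v1 - v0) / dt > limits.max_rate:
            active.append("ROC")
    return active


limits = AlarmLimits(high=100.0, low=10.0, max_rate=2.0)
print(check_alarms(limits, (0.0, 95.0), (1.0, 101.0)))  # high limit and rate both violated
print(check_alarms(limits, (0.0, 50.0), (1.0, 51.0)))   # nothing violated
```

Everything downstream in this article starts from the output of a check like this one, not from the raw process values.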
Alarm management systems are not designed to interfere with the creation of this alarm. They are instead designed to deal with the information once the alarm has been created. They database it, putting its information into fields in a database for later sorting, searching, or analysis using statistical tools. (They also do other things, but that was the primary use.) With time, these tools have grown more powerful. They now database the information in real time, and make it immediately available for action by an operator. They also track the alarm configuration database, and associate engineering information with each configured alarm or tag. TiPS does this (http://www.tipsweb.com/products/logmate/). For this reason, some companies have actually used our system as the real-time annunciator and display for alarms to their operators. That is not what our system was designed for, but it is one use we have seen.
With this "semi-real-time" alarm handling, there are some very basic dynamic alarming functions which can be undertaken to reduce alarms in times of potential flood, or just to reduce control room information overload. In LogMate, using the SIGNAL (http://www.tipsweb.com/products/logmate/signal.asp) package, it is a very simple process to review the alarm information that is coming in and, using logical tools, to write messages back to the DCS which dynamically change the alarm configuration. This can be useful for many simple cases. The first is when you have a piece of equipment which trips and causes a series of consequential alarms. Suppressing those other alarms is a simple matter of setting up a logical filter to trap them.
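A rough sketch of that first case, in Python rather than in SIGNAL's own logic tools, might look like the following. The tag names, the rule table, and the five-minute window are all invented for illustration, and the sketch only sorts alarms into "shown" and "suppressed"; it does not attempt the write-back to the DCS, which is specific to each system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AlarmEvent:
    time: float  # seconds since some reference
    tag: str     # alarm identifier as announced by the DCS


# Hypothetical rule: when the trigger alarm arrives, the listed consequential
# alarms are suppressed for "window" seconds afterwards.
SUPPRESSION_RULES = {
    "K101_TRIP": {"tags": {"FI101_LO", "PI102_LO", "TI103_HI"}, "window": 300.0},
}


def filter_alarms(events: Iterable[AlarmEvent],
                  rules: Dict[str, dict] = SUPPRESSION_RULES
                  ) -> Tuple[List[AlarmEvent], List[AlarmEvent]]:
    """Split an alarm stream into (shown, suppressed) lists."""
    suppress_until: Dict[str, float] = {}
    shown, suppressed = [], []
    for ev in sorted(events, key=lambda e: e.time):
        rule = rules.get(ev.tag)
        if rule is not None:
            # A trigger alarm opens (or extends) the suppression window.
            for tag in rule["tags"]:
                end = ev.time + rule["window"]
                suppress_until[tag] = max(suppress_until.get(tag, end), end)
        if ev.time <= suppress_until.get(ev.tag, float("-inf")):
            suppressed.append(ev)
        else:
            shown.append(ev)
    return shown, suppressed


stream = [
    AlarmEvent(5.0, "FI101_LO"),    # before the trip: still shown
    AlarmEvent(10.0, "K101_TRIP"),  # the trip itself is always shown
    AlarmEvent(12.0, "FI101_LO"),   # consequence of the trip: suppressed
    AlarmEvent(400.0, "PI102_LO"),  # after the window has expired: shown
]
shown, suppressed = filter_alarms(stream)
print([e.tag for e in shown])
print([e.tag for e in suppressed])
```

Note that the logic works only on alarms the DCS has already created, and it is simple enough that an operator could read the rule table and check it against what the plant actually does.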
The second case may be state specific. By setting up a specific state table, it is a simple matter of defining the logic to trap plant state changes (tractable ones, at least), and using that information to alter the alarm setup. This can be extremely handy for times of start-up, shut-down, or various product states. The logic need not be complex, but it can be if you want it to be. The nice part about this is that the SIGNAL application also allows an easy port to tie in higher-level applications. We do not attempt to offer these types of applications, but perhaps you know of one that works (we can even make suggestions; see www.ureason.com or www.prosys.com). There is an easy method of making this SIGNAL application either produce a state signal or react based upon one. LogMate is designed to work in conjunction with and augment such applications, not to replace or negate them. But let's talk about that.
Having worked at Gensym for several years, I used to be of the mind that if we just found ways of warning the operator ahead of time, we could totally avoid alarms. Make a system that tells the operator that, based upon the current data stream, something is about to happen, and that a change could avoid it. The problems with this belief were many:
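As an illustration of the state table idea (not of how SIGNAL itself is configured), a plant state table can be as plain as a lookup from state name to the alarm settings that should apply in that state. The states, tags, and limits below are hypothetical. The function returns only the settings that differ between the old and new state, which is the list of changes you would then have to send to the DCS by whatever means your system provides.

```python
from typing import Dict, List, Tuple

# Hypothetical state table: every state lists the full settings for each tag.
STATE_TABLE: Dict[str, Dict[str, Dict[str, object]]] = {
    "STARTUP": {
        "FI101_LO": {"enabled": False, "limit": 20.0},  # low flow is expected
        "TI103_HI": {"enabled": True, "limit": 450.0},  # wider band while heating
    },
    "NORMAL": {
        "FI101_LO": {"enabled": True, "limit": 20.0},
        "TI103_HI": {"enabled": True, "limit": 400.0},
    },
    "SHUTDOWN": {
        "FI101_LO": {"enabled": False, "limit": 20.0},
        "TI103_HI": {"enabled": True, "limit": 400.0},
    },
}


def alarm_changes(old_state: str, new_state: str,
                  table: Dict[str, Dict[str, Dict[str, object]]] = STATE_TABLE
                  ) -> List[Tuple[str, str, object]]:
    """List (tag, setting, new_value) for every setting that must change.

    An unknown state name raises KeyError rather than guessing.
    """
    old, new = table[old_state], table[new_state]
    changes = []
    for tag in sorted(new):
        for setting, value in sorted(new[tag].items()):
            if old.get(tag, {}).get(setting) != value:
                changes.append((tag, setting, value))
    return changes


print(alarm_changes("STARTUP", "NORMAL"))
print(alarm_changes("NORMAL", "NORMAL"))
```

Where the state name comes from is a separate question. It could come from a higher-level application, or simply from an operator keying it in.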
1. Live plant data lies. Live models are almost impossible to believe, so you have to have reliable tools for sensor validation, or data reconciliation (the first judges data based on empirical models, the second based upon material and heat balances).
2. Even good data cannot account for every possible occurrence that may cause problems. How many people test their systems to get "good" plant problem data? I doubt you want to make a bunch of crap product just so you have the data to fill out your problem matrix, so this is a virtual impossibility. There are so many permutations of the best action to take in a given circumstance that the test matrix would be very large. Most have decided that a human is better able to judge that than a model.
3. Models sometimes require superhuman efforts to keep them reliable. The guys who like to design good plant models seldom like to maintain them. It's really hard to keep a PhD engineer interested in maintenance tasks unless the problem is difficult enough to continue to challenge him/her.
4. The cost of maintaining these models often exceeds their value. How many people can you devote to model maintenance? Will they want to keep that job? How much are you willing to pay? In extremely high-value applications, this may be a moot point, but most applications where you might use this don't have that value level. Note that start-up and shut-down applications may be an exception, as the value of each use can outweigh how seldom they are used.
5. Communication with real-time data often requires expensive drivers. Their maintenance is similar to number 4 above. Full reliance on such an application turns it into a mission-critical system, because to work as it should it essentially has to replace the old alarming system. I have met very few operations managers who were willing to put their faith in the new system in lieu of old reliable.
6. If implemented in a non-automatic mode, it was often difficult for the operator to figure out just WHAT he should do given any existing error condition. Which variable offers the most sensitivity in a given situation under a given set of circumstances? Which way do I push it to make the change that I desire? Which way was it heading? How fast? Training becomes of the utmost importance. Warren Thompson used to say that was why we needed operators with engineering degrees in the future.
Imagine placing your faith in an application which has layers of logic being controlled by complex models which are driven by live plant data. Do you have the truth meter which will tell you when it is not sure about what it is telling you because of the reliability of the model in that region of operation?
Having sold, installed, and otherwise supported several of these systems, I can tell you that few pass the test of time. The main factor was always their value versus cost comparison. I know of some applications of this type which still run after several years just because the value they return is so high. I know even more that went the other way.
Be careful when you start down this path. From an engineering standpoint, it makes logical sense that it will solve the world's problems. It might, but do you think you'll be the first for whom it does? I would encourage you to clean up the basic problems first, or attempts to institute such a system will just make things worse. I have a standard saying: "You haven't seen an alarm flood until you've seen a 'smart alarm' flood."
I've seen many a PhD shipwrecked on the shoals of plant state estimators. They are never good enough, always requiring another tweak, another level of algorithmic change, another layer of math or logic that will resolve the last issue discovered, etc. Unless, that is, they are done simply, with lowered expectations and lowered maintenance requirements. If your operators can maintain it, it might work. Just what level of state estimation do you think your operators can reliably maintain? Experience tells me that an operator can handle a system that requires him to input keystrokes that tell the system the current state of the plant (startup, shutdown, normal operation, etc.).
Don't get me wrong here. There are many good model-based state estimation techniques, and I have even seen a great application written by a guy at ExxonMobil, one by a company in Europe, and a couple of others at refineries in the US (both by the same company). Just be certain that you have cleaned up the underlying alarm problems first, or it's just like putting paint over rust. Also, be prepared to commit the level of maintenance effort necessary to upgrade, alter, and otherwise support the application once it is installed. That's like being willing to maintain that shiny new paint job: let it get tree sap on it once, and it just doesn't shine anymore. I hope you know a lot about real-time data acquisition, and the intricacies of handling it. Your life will soon be more confused, but you'll probably feel you're confused on a higher level and about more important things.
If you are not already deep into alarm management, and you want a system that works, implement a simple one. Start out working with already created alarms, not live data. Let the DCS do the job that it does so well. Make simple logic relationships that can be troubleshot and tested by operators. Use the real-time stream of alarm data coming into your alarm management database, not live process data. You'll use a lot less antacid.
Once you have that in hand, you can look deeper, if the expected ROI justifies it. And my last recommendation: choose a company that is experienced in this area. They will not lie to you and make it sound like it's a walk in the park. They'll properly steer you down the path you should go, and meet the expectations that they help you to set. This can save not only antacid, but careers.
SMApple 4-04-06
