Hello Érico,
On 4/30/26 6:39 AM, Érico Nogueira Rolim via Tech-talk wrote:
Hi folks!
As part of evaluating a unified solution for alarms and notifications at
Sirius, we are interested in hearing about the community's experiences
with alarm servers and related services.
It has been a decade since I was supporting NSLS2, but some of
that experience might still be valid.
The NSLS2 Operations group had strong opinions about what
constituted an Alarm. These were driven both by group culture,
and by regulatory requirements. Every Alarm had to have a
reviewed and approved response procedure. Alarm occurrences were
logged and periodically reviewed with an eye to reducing false
positives. And I still remember the amused look I received when I
asked if one operator was interested in receiving alarm
notifications off-shift (not the wording I used, but clearly what
was heard).
So the NSLS2 accelerator side ended up with two alarm server
instances. One for Operations, with the official Alarms. And
another which I jokingly referred to as the "engineer
notification" server. There was some overlap in PVs between the
two instances, but not as much as you might expect. e.g.
Operations looked at vacuum pressure, while Controls looked at
communication status.
What we want is a system to receive abstract "notifications", and turn
those into some kind of message, for example by SMS (for areas with bad
internet coverage), e-mail, MS Teams messages, and sound alarms in
control rooms.
The Phoebus alarm server, from our understanding, depends entirely on
alarms implemented in the EPICS layer. As explored in
https://urldefense.us/v3/__https://indico.global/event/14049/contributions/135608/__;!!G_uCfscf7eWS!axho_P0N7Qmsrf6LR1H9vQCbrykegZWwW1MhKVasQTSh6a8XvDnELShqUQeZ-2B4AY6_iOAsErQVNsZB_0Zho-cHIO4$ , there are some
limitations to these kinds of alarms, mainly that their conditions tend
to be static, even though there are situations that change the desired
thresholds and severity for their conditions: different shifts
(maintenance, commissioning, beam for users), different operation modes
(top-up, accumulation, single-bunch), etc.
There's also the matter that
some limits might not be known when a hard IOC is being developed (by an
"expert") and will only be determined as part of commissioning or
ongoing operation experience, requiring that the expert be involved in
this step as well.
It is common for EPICS drivers to include some alarm
configuration fields in the autosave set, and/or on OPI screens,
for just this reason.
Some of the above can be worked around by adding a layer of soft IOCs on
top of the existing IOCs. These IOCs can be made up of records which
include additional logic for enabling alarms and might be simpler to
change and control. This requires developing and deploying an IOC
whenever such logic is necessary, can involve a lot of database
duplication, and requires someone knowledgeable in EPICS databases (or
some of the "soft IOC" alternatives) to implement it.
True, although writing these sorts of IOCs needs only a fairly
basic knowledge of the process database.
These solutions seem to address daily operation, with reasonably well
understood thresholds and conditions. However, we have observed a
variety of other needs from our operation and support staff: it's
possible that thresholds which are relevant to operation are not the
same as what subsystem experts want to be notified for; support staff
investigating malfunctioning devices might want warnings when the device
is within a certain operating range; someone running a procedure (be it
an experiment or part of commissioning) would like to be notified when
PVs reach some desired value.
These cases don't seem to fit in the "EPICS PV" model of alarm handling,
due to the number of desired thresholds (more than 2 upper and 2 lower),
or the dynamic nature of what they want to check
True by a narrow reading. The usual EPICS response of "add
another record" enables quite a bit more flexibility, especially
if that record is a calc.
(the people involved
might not be familiar enough with EPICS development to write a soft IOC
on their own, and we don't want people deploying new IOCs with no
oversight). They seem to require some PV monitoring solution of their
own, possibly able to combine information from multiple PVs.
I certainly agree with the idea of providing guidance, and guard
rails, for such users. For example, a softIoc preloaded with a
number of calc records.
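To make the calc-record idea a bit more concrete, here is a rough
Python sketch of the kind of logic such a record could evaluate: a
pressure reading checked against limits that depend on the operating
mode. This only illustrates the logic; it is not EPICS database syntax
and not anything NSLS2 deployed. The mode names, the limit values, and
the function name are all made up for the example.

```python
# Hypothetical per-mode limits as (warning, alarm) thresholds in mbar.
# None of these values come from a real machine.
LIMITS_BY_MODE = {
    "beam_for_users": (1e-9, 5e-9),
    "commissioning": (5e-9, 1e-8),
    "maintenance": (None, None),  # pressure alarms disabled in this mode
}


def pressure_severity(pressure, mode):
    """Return an EPICS-style severity name for a pressure reading.

    The thresholds are chosen by operating mode, which is the sort of
    condition a calc record could fold in from a second PV.
    """
    warn, alarm = LIMITS_BY_MODE[mode]
    if alarm is not None and pressure >= alarm:
        return "MAJOR"
    if warn is not None and pressure >= warn:
        return "MINOR"
    return "NO_ALARM"
```

In an IOC, the mode and the pressure would each arrive as record
inputs, and the result would set the alarm state of the derived record
that the alarm server watches.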
The way we see it, the implementations for turning "notifications" into
messages could be reused for both EPICS PVs and this other solution;
Phoebus Alarm Server's usage of Kafka to decouple clients from the
server should simplify this, too.
I would (of course) encourage you to look at ways to bridge your
other notification sources into the EPICS world, either as
distinct PVs, which could e.g. also be archived, or by teaching the
Phoebus alarm server about other data sources.
For specific questions:
My answers are a decade old at this point, and specific to the
accelerator side of the NSLS2 house.
- Do the circumstances and requirements I described make sense? Are they
similar at your facility?
The NSLS2 Operations crew specifically wanted static alarm
definitions, and in the course of accelerator commissioning they
learned enough about the EPICS tools to be self-sufficient with
alarm aggregation databases. Controls engineers mostly looked at
alarms on a different set of PVs.
- What alarm server solution are you using at your facility? How is it
integrated with your communication tools to send notifications?
The BEAST! (my favorite Kasemir project name) Our "engineering
notification" server instance could reach out to an SMTP server.
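For what the SMTP hand-off amounts to, here is a minimal Python sketch
using only the standard library. It is not how the BEAST does it
internally. The function names, the sender address, and the message
layout are assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_alarm_email(pv_name, severity, value, recipients,
                      sender="alarms@example.org"):
    """Build (but do not send) a plain-text alarm notification."""
    msg = EmailMessage()
    msg["Subject"] = f"[{severity}] {pv_name} = {value}"
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"PV: {pv_name}\nSeverity: {severity}\nValue: {value}\n"
    )
    return msg


def send_alarm_email(msg, smtp_host):
    """Hand the message to an SMTP server; smtp_host is site-specific."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from delivery means the same
message body could also be handed to an SMS or chat gateway.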
- How dynamic are your alarm conditions?
All those I encountered were simple range or state alarms.
- How do you deal with alarms which can be ignored in certain situations?
The BEAST had a notion of conditionally masking alarms. I believe
the Phoebus alarm server inherited this.
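The idea behind conditional masking can be sketched in a few lines of
Python. This is only the concept: I am not reproducing the actual BEAST
or Phoebus configuration syntax, and the function and argument names
are hypothetical.

```python
def masked_severity(raw_severity, enabling_value):
    """Pass the raw severity through only while the alarm is enabled.

    enabling_value stands in for some other PV (for instance a
    "valve open" status); zero means the alarm should be ignored.
    """
    return raw_severity if enabling_value != 0 else "NO_ALARM"
```

For example, a MAJOR pressure alarm would be reported while the
enabling value is 1 and suppressed while it is 0.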
- If you have implemented some kind of solution for dynamic alarms, what
does it look like?
Thank you very much,
Érico