EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Understanding alarms, alarm server and notification usage in other facilities
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: Érico Nogueira Rolim <erico.rolim at lnls.br>
Cc: EPICS Tech Talk <tech-talk at aps.anl.gov>
Date: Mon, 11 May 2026 19:23:40 -0700
Hello Érico,

On 4/30/26 6:39 AM, Érico Nogueira Rolim via Tech-talk wrote:
Hi folks!

As part of evaluating a unified solution for alarms and notifications at
Sirius, we are interested in hearing about the community's experiences
with alarm servers and related services.

It has been a decade since I was supporting NSLS2, but some of that experience might still be valid.

The NSLS2 Operations group had strong opinions about what constituted an Alarm.  These were driven both by group culture and by regulatory requirements.  Every Alarm had to have a reviewed and approved response procedure.  Alarm occurrences were logged and periodically reviewed with an eye to reducing false positives.  And I still remember the amused look I received when I asked if one operator was interested in receiving alarm notifications off-shift (not the wording I used, but clearly what was heard).

So the NSLS2 accelerator side ended up with two alarm server instances.  One for Operations, with the official Alarms.  And another which I jokingly referred to as the "engineer notification" server.  There was some overlap in PVs between the two instances, but not as much as you might expect.  eg. Operations looked at vacuum pressure, Controls looked at communication status.

What we want is a system to receive abstract "notifications", and turn
those into some kind of message, for example by SMS (for areas with bad
internet coverage), e-mail, MS Teams messages, and sound alarms in
control rooms.

The Phoebus alarm server, from our understanding, depends entirely on
alarms implemented in the EPICS layer. As explored in
https://urldefense.us/v3/__https://indico.global/event/14049/contributions/135608/__;!!G_uCfscf7eWS!axho_P0N7Qmsrf6LR1H9vQCbrykegZWwW1MhKVasQTSh6a8XvDnELShqUQeZ-2B4AY6_iOAsErQVNsZB_0Zho-cHIO4$  , there are some
limitations to these kinds of alarms, mainly that their conditions tend
to be static, even though there are situations that change the desired
thresholds and severity for their conditions: different shifts
(maintenance, commissioning, beam for users), different operation modes
(top-up, accumulation, single-bunch), etc.

There's also the matter that
some limits might not be known when a hard IOC is being developed (by an
"expert") and will only be determined as part of commissioning or
ongoing operation experience, requiring that the expert be involved in
this step as well.

It is common for EPICS drivers to include some alarm configuration fields in the autosave set, and/or on OPI screens, for just this reason.
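To make the mode-dependent thresholds discussed above concrete, here is a minimal Python sketch in which the active operating mode selects a set of alarm limits at check time. The mode names, limit values, and the names `Limits`, `LIMITS_BY_MODE` and `check_range` are hypothetical, and it assumes the common convention of MAJOR severity for HIHI/LOLO and MINOR for HIGH/LOW. In an IOC the equivalent would be alarm fields written at runtime (and kept by autosave), not a Python table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    # Hypothetical alarm limits for one operating mode; None disables a limit.
    lolo: Optional[float] = None
    low: Optional[float] = None
    high: Optional[float] = None
    hihi: Optional[float] = None


# Hypothetical per-mode limit table; values are illustrative only.
LIMITS_BY_MODE = {
    "maintenance": Limits(),  # nothing enforced during maintenance
    "commissioning": Limits(high=5.0e-8, hihi=1.0e-7),
    "top-up": Limits(high=1.0e-9, hihi=5.0e-9),
}


def check_range(value: float, mode: str) -> str:
    """Return 'MAJOR', 'MINOR' or 'NO_ALARM' using the limits for `mode`."""
    lim = LIMITS_BY_MODE[mode]
    # Outer limits (HIHI/LOLO) are checked first and map to MAJOR.
    if (lim.hihi is not None and value >= lim.hihi) or (
        lim.lolo is not None and value <= lim.lolo
    ):
        return "MAJOR"
    # Inner limits (HIGH/LOW) map to MINOR.
    if (lim.high is not None and value >= lim.high) or (
        lim.low is not None and value <= lim.low
    ):
        return "MINOR"
    return "NO_ALARM"
```

The same reading of 2e-9 is a MINOR alarm in top-up but not in commissioning, which is exactly the kind of change a static alarm definition cannot express on its own.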


Some of the above can be worked around by adding a layer of soft IOCs on
top of the existing IOCs. These IOCs can be made up of records which
include additional logic for enabling alarms and might be simpler to
change and control. This requires developing and deploying an IOC
whenever such logic is necessary, can involve a lot of database
duplication, and requires someone knowledgeable in EPICS databases (or
some of the "soft IOC" alternatives) to implement it.

True, although writing these sorts of IOCs needs only a fairly basic knowledge of the process database.


These solutions seem to address daily operation, with reasonably well
understood thresholds and conditions. However, we have observed a
variety of other needs from our operation and support staff: it's
possible that thresholds which are relevant to operation are not the
same as what subsystem experts want to be notified for; support staff
investigating malfunctioning devices might want warnings when the device
is within a certain operating range; someone running a procedure (be it
an experiment or part of commissioning) would like to be notified when
PVs reach some desired value.

These cases don't seem to fit in the "EPICS PV" model of alarm handling,
due to the number of desired thresholds (more than 2 upper and 2 lower),
or the dynamic nature of what they want to check

True by a narrow reading.  The usual EPICS response of "add another record" enables quite a bit more flexibility, especially if that record is a calc.
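As an illustration of the "add another record" approach, the Python sketch below models what a calc record could compute to support more than two upper and two lower thresholds: it reduces a reading and any number of thresholds to a band index, which can then be alarmed on as a plain state. The function name, thresholds and band-to-severity table are hypothetical; the calc expression quoted in the docstring is only a rough analogue, and the Python models the logic rather than running in an IOC.

```python
from bisect import bisect_right
from typing import List, Sequence


def band_index(value: float, thresholds: Sequence[float]) -> int:
    """Count how many thresholds `value` has reached (value >= threshold).

    Roughly what a calc record with CALC "(A>=B)+(A>=C)+(A>=D)" computes,
    with A the reading and B, C, D thresholds; any number of thresholds works.
    """
    ordered = sorted(thresholds)
    # bisect_right gives the number of sorted thresholds <= value.
    return bisect_right(ordered, value)


# Hypothetical mapping from band index (5 thresholds -> 6 bands) to severity.
SEVERITY_BY_BAND: List[str] = [
    "MAJOR", "MINOR", "NO_ALARM", "NO_ALARM", "MINOR", "MAJOR",
]
```

With thresholds `[1, 2, 3, 4, 5]`, a reading of 4.5 lands in band 4, which this hypothetical table maps to MINOR. A single state alarm on the band index then replaces the four fixed limit fields.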


(the people involved
might not be familiar enough with EPICS development to write a soft IOC
on their own, and we don't want people deploying new IOCs with no
oversight). They seem to require some PV monitoring solution of their
own, possibly able to combine information from multiple PVs.

I certainly agree with the idea of providing guidance, and guard rails, for such users.  eg. a softIoc loaded with a number of calc records.
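In the spirit of those guard rails, a user-defined notification rule can be limited to a small vocabulary of comparisons over named PVs rather than free-form code. This is a hypothetical Python sketch: the `Condition` and `Rule` classes, the operator whitelist and the PV names are illustrative assumptions, and PV values are passed in as a dict instead of being read from EPICS.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Whitelist of permitted comparisons; anything else is rejected up front.
ALLOWED_OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    pv: str
    op: str
    threshold: float

    def __post_init__(self) -> None:
        if self.op not in ALLOWED_OPS:
            raise ValueError(f"operator {self.op!r} is not allowed")

    def holds(self, values: Dict[str, float]) -> bool:
        return ALLOWED_OPS[self.op](values[self.pv], self.threshold)


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: Tuple[Condition, ...]
    require_all: bool = True  # True: AND of all conditions, False: OR

    def triggered(self, values: Dict[str, float]) -> bool:
        results = (c.holds(values) for c in self.conditions)
        return all(results) if self.require_all else any(results)
```

For example, `Rule("low current in top-up", (Condition("SR:Current", "<", 99.0), Condition("SR:Mode", "==", 1.0)))` fires only when both hypothetical PVs satisfy their conditions. Because rules are plain data, they could be reviewed before deployment instead of each user writing and running a new IOC.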


The way we see it, the implementations for turning "notifications" into
messages could be reused for both EPICS PVs and this other solution;
Phoebus Alarm Server's usage of Kafka to decouple clients from the
server should simplify this, too.

I would (of course) encourage you to look at ways to bridge your other notification sources into the EPICS world, either as distinct PVs, which could eg. also be archived, or by teaching the phoebus alarm server about other data sources.
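The decoupling described above, where a notification is produced once and then turned into messages per channel, can be sketched as a small fan-out. This is not how the Phoebus alarm server or Kafka work internally; it is a hypothetical in-process Python model in which `Notification`, `Dispatcher` and the channel callables are invented names, and channels that append to a list stand in for real SMS, e-mail, Teams or control-room sound integrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Notification:
    source: str    # a PV name, or the name of a non-EPICS rule
    severity: str  # e.g. "MINOR" or "MAJOR"
    message: str


Channel = Callable[[Notification], None]


class Dispatcher:
    """Route each notification to every channel subscribed to its source."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Channel]] = {}

    def subscribe(self, source: str, channel: Channel) -> None:
        self._routes.setdefault(source, []).append(channel)

    def publish(self, note: Notification) -> int:
        """Deliver `note` and return how many channels received it."""
        channels = self._routes.get(note.source, [])
        for channel in channels:
            channel(note)
        return len(channels)
```

Producers only know about `Notification` and channels only consume it, so the same channels can serve EPICS-derived alarms and other rule engines alike; across processes, a message bus would take over the role of `publish`.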


For specific questions:

My answers are a decade old at this point, and specific to the accelerator side of the NSLS2 house.


- Do the circumstances and requirements I described make sense? Are they
similar at your facility?

The NSLS2 Operations crew specifically wanted static alarm definitions, and in the course of accelerator commissioning they learned enough about the EPICS tools to be self-sufficient with alarm aggregation databases.  Controls engineers mostly looked at alarms on a different set of PVs.


- What alarm server solution are you using at your facility? How is it
integrated with your communication tools to send notifications?

The BEAST!  (my favorite Kasemir project name)  Our "engineering notification" server instance could reach out to an SMTP server.
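For a rough idea of what reaching out to an SMTP server involves, the Python standard library can compose and send such a message. This is not the BEAST's own code; the function names, addresses, relay host and subject format below are hypothetical, and sending assumes an SMTP relay is reachable on port 25.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_alarm_email(pv: str, severity: str, value: str,
                      sender: str, recipients: List[str]) -> EmailMessage:
    """Compose a plain-text alarm notification (hypothetical format)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{severity}] {pv}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"PV {pv} entered {severity} alarm with value {value}.\n")
    return msg


def send_alarm_email(msg: EmailMessage, host: str = "localhost") -> None:
    """Hand the message to an SMTP relay assumed to listen on `host`:25."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping composition separate from delivery makes the message format testable without a mail server.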


- How dynamic are your alarm conditions?

All those I encountered were simple range or state alarms.


- How do you deal with alarms which can be ignored in certain situations?

The BEAST had a notion of conditionally masking alarms. I believe the phoebus alarm server inherited this.
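Conditional masking can be thought of as a per-alarm enable predicate checked before an alarm is annunciated. The sketch below is a hypothetical Python model of that idea, not the BEAST or Phoebus implementation or configuration syntax; the `MaskedAlarm` class and the example PV names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Values = Dict[str, float]


@dataclass
class MaskedAlarm:
    pv: str
    is_active: Callable[[Values], bool]     # the raw alarm condition
    enabled_when: Callable[[Values], bool]  # masking condition; False masks

    def annunciate(self, values: Values) -> bool:
        """Report the alarm only if it is active and not currently masked."""
        return self.enabled_when(values) and self.is_active(values)
```

For example, a reflected-power alarm on a hypothetical `RF:Cav1:Refl` PV could be enabled only while `RF:Cav1:On` equals 1, so it stays quiet while the cavity is deliberately off.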


- If you have implemented some kind of solution for dynamic alarms, what
does it look like?

Thank you very much,

Érico


Disclaimer: This email and its attachments may contain confidential and/or privileged information. Observe its content carefully and consider possible querying to the sender before copying, disclosing or distributing it. If you have received this email by mistake, please notify the sender and delete it immediately.



References:
Understanding alarms, alarm server and notification usage in other facilities Érico Nogueira Rolim via Tech-talk

ANJ, 12 May 2026