EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: alarm handling in EPICS 7
From: Maren Purves via Tech-talk <tech-talk at aps.anl.gov>
To: Pedro Gigoux <pedro.gigoux at noirlab.edu>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Fri, 28 Mar 2025 23:41:19 -1000
Digging through tech-talk archives I found this (I only started from
when the person who fixed it for us started, 1999):

https://epics.anl.gov/tech-talk/1999/msg00369.php (I have never used 3.12)

This thread has nothing to do with it:
https://epics.anl.gov/tech-talk/1998/msg00487.php

There was something going on when we got UFTI at UKIRT (about 1998?),
which also got parameters from other VME crates like our mount
computer, secondary mirror, weather, etc., and when one of those was
rebooted it hung up. Nothing to do with alarms, just database records
whose values were put into FITS headers.

Digging through my own email (there are some things I don't delete) I
also didn't find anything - but if this dates back to the days of VMS
emails, which is possible, I won't be able to find it (don't remember
where I put the saveset and currently don't have the capacity to even
think about extracting it).

Maren

On Fri, Mar 28, 2025 at 11:06 AM Maren Purves
<m.purves at eaobservatory.org> wrote:
>
> This may not just affect alarm handling - but it reminds me of
> something we saw a long time ago (late 90s), when accessing a
> record from another IOC. Whenever that IOC was rebooted it
> stopped/crashed the other one. This was using an early version of 3.13
> (we don't have the source or install trees anymore, but I think it may
> have included beta in the name).
>
> Hope somebody else remembers more of this than me. I may find time
> to dig through tech-talk later today or over the weekend.
> Maren Purves
> Head of Instrument and Telescope Software
> East Asian Observatory / JCMT
>
> On Fri, Mar 28, 2025 at 8:32 AM Pedro Gigoux via Tech-talk
> <tech-talk at aps.anl.gov> wrote:
> >
> > Hi Andrew,
> >
> > Thanks for your prompt reply. I attached the schematic and the test database we used to isolate the problem. We have two IOCs: one that provides the record SYM:HEX01:CONTROLON and a second IOC that has three records:
> >
> > test:ai: Reads data from the first IOC (INP : CA_LINK SYM:HEX01:CONTROLON NPP NMS)
> > test:calc: Reads data from test:ai (INPA: DB_LINK test:ai.VAL PP NMS) and increments a counter. INPA is set to PP to trigger reading.
> > test:ao: Receives the value of the counter.
> >
> > If both IOCs are up, we see the following:
> >
> > test:ai.SEVR                   2025-03-28 14:43:05.111621 NO_ALARM
> > test:ai.STAT                   2025-03-28 14:43:05.111621 NO_ALARM
> > test:calc.SEVR                 2025-03-28 14:43:05.111623 NO_ALARM
> > test:calc.STAT                 2025-03-28 14:43:05.111623 NO_ALARM
> > test:ao.SEVR                   2025-03-28 14:43:05.111624 NO_ALARM
> > test:ao.STAT                   2025-03-28 14:43:05.111624 NO_ALARM
> > test:ao.VAL                    2025-03-28 14:43:05.111624 3
> > test:ao.VAL                    2025-03-28 14:43:06.111855 4
> > test:ao.VAL                    2025-03-28 14:43:07.111749 5
> > test:ao.VAL                    2025-03-28 14:43:08.111765 6
> > test:ao.VAL                    2025-03-28 14:43:09.111681 7
> >
> > If I stop the IOC that has SYM:HEX01:CONTROLON, then the alarm severity changes and the counter stops updating, i.e. test:calc stops processing:
> >
> > test:ao.VAL                    2025-03-28 14:43:38.111802 36
> > test:ao.VAL                    2025-03-28 14:43:39.111805 37
> > test:ao.VAL                    2025-03-28 14:43:40.111659 38
> > test:ao.VAL                    2025-03-28 14:43:41.111668 39
> > test:ai.SEVR                   2025-03-28 14:43:42.111818 INVALID LINK INVALID
> > test:ai.STAT                   2025-03-28 14:43:42.111818 LINK LINK INVALID
> > test:calc.SEVR                 2025-03-28 14:43:42.111826 INVALID LINK INVALID
> > test:calc.STAT                 2025-03-28 14:43:42.111826 LINK LINK INVALID
> >
> > If I keep the IOC down and set test:ai.SIMM=1 then the alarm is cleared and the counter starts updating:
> >
> > test:ai.SEVR                   2025-03-28 14:44:18.111641 NO_ALARM
> > test:ai.STAT                   2025-03-28 14:44:18.111641 NO_ALARM
> > test:calc.SEVR                 2025-03-28 14:44:18.111657 NO_ALARM
> > test:calc.STAT                 2025-03-28 14:44:18.111657 NO_ALARM
> > test:ao.VAL                    2025-03-28 14:44:18.111662 40
> > test:ao.VAL                    2025-03-28 14:44:19.111728 41
> > test:ao.VAL                    2025-03-28 14:44:20.111719 42
> > test:ao.VAL                    2025-03-28 14:44:21.111760 43
> > test:ao.VAL                    2025-03-28 14:44:22.111726 44
> >
> > In EPICS 3.14, the records keep processing if the IOC goes down.
> >
> > Thank you,
> > Pedro.
> >
> >
> > On Fri, 28 Mar 2025 at 14:19, Johnson, Andrew N. <anj at anl.gov> wrote:
> >>
> >> Hi Pedro,
> >>
> >> Can you please post a concrete example of a record configuration that used to work in EPICS 3.14.x or 3.15.x and no longer does in EPICS 7.0.x? If you can simplify that to a small number of soft records in each of 2 IOCs that would help us understand and replicate your specific issue. I don’t immediately recognize it as anything that we’ve explicitly changed, but we might have broken your use-case by mistake. Once we see the specific problem we may be able to suggest alternative configurations.
> >>
> >> Thanks,
> >>
> >> - Andrew
> >>
> >> --
> >> Complexity comes for free, Simplicity you have to work for.
> >>
> >> On 3/28/25, 3:51 PM, "Tech-talk" <tech-talk at aps.anl.gov> wrote:
> >>
> >> Hello,
> >>
> >> I am writing to get your advice on managing system unavailability within EPICS 7. In our current operational model we can switch between different instruments seamlessly if one encounters an issue and becomes unavailable. This strategy was effective in previous EPICS versions. However, after migrating to EPICS 7, records reading data from systems that are no longer available stop processing, even if we don't rely on the data from that particular system to continue observing. The issue arises because broken CA links set the alarm severity (SEVR) to INVALID, the alarm status (STAT) to LINK, and halt the record processing of the downstream records. We want a mechanism to override the alarm severity when an instrument becomes unresponsive, ideally with minimal operator intervention.
> >>
> >> We have identified three potential ways of achieving this:
> >>
> >> · Maximize Severity Attribute: The idea was to use this attribute to prevent the alarm propagation, but it seems that it does not provide what we need.
> >>
> >> · SIMM Field: Setting the SIMM field to YES enables the record to continue processing without being affected by the INVALID alarm status. The SVAL field can be used to define a simulation value and the SIMS field specifies the simulation mode alarm severity (NO_ALARM in our case). We have tested this approach and it seems to work well. STAT is set to SIMM and the downstream records process without problems.
> >>
> >> · DISA Field: Making DISA=DISV disables the record. The DISS field defines the record's disable severity (e.g. NO_ALARM). This approach also seems to work. STAT is set to DISABLE and the downstream records process without problems as well.
> >>
> >> The SIMM field seems to be the most promising option. I would greatly appreciate your insights on this, as well as any alternative approaches that you might suggest.
> >>
> >> Thank you,
> >> Pedro
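
The test database attached to Pedro's follow-up is not reproduced in this archive. As a rough illustration only, the three records he describes on the second IOC (test:ai, test:calc, test:ao) might look something like the sketch below. The two link strings are taken from his message; every other field value (SCAN, CALC, FLNK, DOL, OMSL, SIMS, SVAL) is an assumption made for the sake of a complete example, not his actual configuration.

```
# Hypothetical reconstruction of the second IOC's test database.
# Only the INP and INPA link strings come from the thread; all other
# field values are assumptions.

record(ai, "test:ai") {
    field(DTYP, "Soft Channel")
    # SYM:HEX01:CONTROLON is served by the other IOC, so this link
    # resolves to a Channel Access link at run time.
    field(INP,  "SYM:HEX01:CONTROLON NPP NMS")
    # Simulation-mode fields (assumed values), used only while SIMM is YES.
    field(SIMS, "NO_ALARM")
    field(SVAL, "0")
}

record(calc, "test:calc") {
    field(SCAN, "1 second")             # assumed; the monitor output updates once per second
    field(INPA, "test:ai.VAL PP NMS")   # PP makes test:ai process before it is read
    field(CALC, "VAL+1")                # assumed counter expression
    field(FLNK, "test:ao")              # assumed way of passing the count on
}

record(ao, "test:ao") {
    field(DOL,  "test:calc NPP NMS")    # assumed; takes the counter value from test:calc
    field(OMSL, "closed_loop")
}
```

With a database along these lines loaded and the first IOC stopped, `caput test:ai.SIMM 1` would correspond to the simulation-mode step shown in the thread. The DISA variant would be `caput test:ai.DISA 1`, assuming DISV is left at its default of 1.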


ANJ, 31 Mar 2025