Digging through the tech-talk archives, I found this (I only searched
from 1999 onward, which is when the person who fixed it for us started):
https://epics.anl.gov/tech-talk/1999/msg00369.php (I have never used 3.12)
This thread has nothing to do with it:
https://epics.anl.gov/tech-talk/1998/msg00487.php
There was something going on when we got UFTI at UKIRT (about 1998?).
It also got parameters from other VME crates (our mount computer,
secondary mirror, weather, etc.), and when one of those was rebooted
it hung. Nothing to do with alarms, just database records whose
values went into FITS headers.
Digging through my own email (there are some things I don't delete) I
also didn't find anything - but if this dates back to the days of VMS
emails, which is possible, I won't be able to find it (don't remember
where I put the saveset and currently don't have the capacity to even
think about extracting it).
Maren
On Fri, Mar 28, 2025 at 11:06 AM Maren Purves
<m.purves at eaobservatory.org> wrote:
>
> This may not just affect alarm handling - but it reminds me of
> something we saw a long time ago (late 90s), when accessing a
> record from another IOC. Whenever that IOC was rebooted it
> stopped/crashed the other one. This was using an early version of 3.13
> (we don't have the source or install trees anymore, but I think it may
> have included beta in the name).
>
> Hope somebody else remembers more of this than I do. I may find time
> to dig through tech-talk later today or over the weekend.
> Maren Purves
> Head of Instrument and Telescope Software
> East Asian Observatory / JCMT
>
> On Fri, Mar 28, 2025 at 8:32 AM Pedro Gigoux via Tech-talk
> <tech-talk at aps.anl.gov> wrote:
> >
> > Hi Andrew,
> >
> > Thanks for your prompt reply. I attached the schematic and the test database we used to isolate the problem. We have two IOCs: one that provides the record SYM:HEX01:CONTROLON and a second IOC that has three records:
> >
> > test:ai: Reads data from the first IOC (INP : CA_LINK SYM:HEX01:CONTROLON NPP NMS)
> > test:calc: Reads data from test:ai (INPA: DB_LINK test:ai.VAL PP NMS) and increments a counter. INPA is set to PP to trigger reading.
> > test:ao: Receives the value of the counter.
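
For readers without the attachment, the three records described above might look roughly like the sketch below. This is a reconstruction, not the attached test database. Only the INP and INPA link strings come from the description; the record types, the 1-second SCAN on test:calc, the INPB/CALC counter, the FLNK, and the DOL/OMSL settings on test:ao are assumptions inferred from the record names and the once-per-second output.

```
# Second IOC (sketch). Only INP and INPA are taken from the message;
# every other field is an assumption made for illustration.
record(ai, "test:ai") {
    field(DTYP, "Soft Channel")
    field(INP,  "SYM:HEX01:CONTROLON NPP NMS")  # CA link: the PV lives on the first IOC
}
record(calc, "test:calc") {
    field(SCAN, "1 second")                     # assumed scan rate
    field(INPA, "test:ai.VAL PP NMS")           # PP processes test:ai before reading it
    field(INPB, "test:calc.VAL NPP NMS")        # assumed: previous counter value
    field(CALC, "B+1")                          # assumed counter expression
    field(FLNK, "test:ao")                      # assumed: process test:ao afterwards
}
record(ao, "test:ao") {
    field(DTYP, "Soft Channel")
    field(DOL,  "test:calc.VAL NPP NMS")        # assumed: fetch the counter value
    field(OMSL, "closed_loop")
}
```
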
> >
> > If both IOCs are up, we see the following:
> >
> > test:ai.SEVR 2025-03-28 14:43:05.111621 NO_ALARM
> > test:ai.STAT 2025-03-28 14:43:05.111621 NO_ALARM
> > test:calc.SEVR 2025-03-28 14:43:05.111623 NO_ALARM
> > test:calc.STAT 2025-03-28 14:43:05.111623 NO_ALARM
> > test:ao.SEVR 2025-03-28 14:43:05.111624 NO_ALARM
> > test:ao.STAT 2025-03-28 14:43:05.111624 NO_ALARM
> > test:ao.VAL 2025-03-28 14:43:05.111624 3
> > test:ao.VAL 2025-03-28 14:43:06.111855 4
> > test:ao.VAL 2025-03-28 14:43:07.111749 5
> > test:ao.VAL 2025-03-28 14:43:08.111765 6
> > test:ao.VAL 2025-03-28 14:43:09.111681 7
> >
> > If I stop the IOC that has SYM:HEX01:CONTROLON, then the alarm severity changes and the counter stops updating, i.e. test:calc stops processing:
> >
> > test:ao.VAL 2025-03-28 14:43:38.111802 36
> > test:ao.VAL 2025-03-28 14:43:39.111805 37
> > test:ao.VAL 2025-03-28 14:43:40.111659 38
> > test:ao.VAL 2025-03-28 14:43:41.111668 39
> > test:ai.SEVR 2025-03-28 14:43:42.111818 INVALID LINK INVALID
> > test:ai.STAT 2025-03-28 14:43:42.111818 LINK LINK INVALID
> > test:calc.SEVR 2025-03-28 14:43:42.111826 INVALID LINK INVALID
> > test:calc.STAT 2025-03-28 14:43:42.111826 LINK LINK INVALID
> >
> > If I keep the IOC down and set test:ai.SIMM=1 then the alarm is cleared and the counter starts updating:
> >
> > test:ai.SEVR 2025-03-28 14:44:18.111641 NO_ALARM
> > test:ai.STAT 2025-03-28 14:44:18.111641 NO_ALARM
> > test:calc.SEVR 2025-03-28 14:44:18.111657 NO_ALARM
> > test:calc.STAT 2025-03-28 14:44:18.111657 NO_ALARM
> > test:ao.VAL 2025-03-28 14:44:18.111662 40
> > test:ao.VAL 2025-03-28 14:44:19.111728 41
> > test:ao.VAL 2025-03-28 14:44:20.111719 42
> > test:ao.VAL 2025-03-28 14:44:21.111760 43
> > test:ao.VAL 2025-03-28 14:44:22.111726 44
> >
> > In EPICS 3.14, the records keep processing if the IOC goes down.
> >
> > Thank you,
> > Pedro.
> >
> >
> > On Fri, 28 Mar 2025 at 14:19, Johnson, Andrew N. <anj at anl.gov> wrote:
> >>
> >> Hi Pedro,
> >>
> >> Can you please post a concrete example of a record configuration that used to work in EPICS 3.14.x or 3.15.x and no longer does in EPICS 7.0.x? If you can simplify that to a small number of soft records in each of 2 IOCs that would help us understand and replicate your specific issue. I don’t immediately recognize it as anything that we’ve explicitly changed, but we might have broken your use-case by mistake. Once we see the specific problem we may be able to suggest alternative configurations.
> >>
> >> Thanks,
> >>
> >> - Andrew
> >>
> >> --
> >> Complexity comes for free, Simplicity you have to work for.
> >>
> >> On 3/28/25, 3:51 PM, "Tech-talk" <tech-talk at aps.anl.gov> wrote:
> >>
> >> Hello,
> >>
> >> I am writing to get your advice on managing system unavailability within EPICS 7. In our current operational model we can switch between different instruments seamlessly if one encounters an issue and becomes unavailable. This strategy was effective in previous EPICS versions. However, after migrating to EPICS 7, records reading data from systems that are no longer available stop processing, even if we don't rely on the data from that particular system to continue observing. The issue arises because broken CA links set the alarm severity (SEVR) to INVALID, the alarm status (STAT) to LINK, and halt the record processing of the downstream records. We want a mechanism to override the alarm severity when an instrument becomes unresponsive, ideally with minimal operator intervention.
> >>
> >> We have identified three potential ways of achieving this:
> >>
> >> · Maximize Severity Attribute: The idea was to use this attribute to prevent the alarm propagation, but it seems that it does not provide what we need.
> >>
> >> · SIMM Field: Setting the SIMM field to YES enables the record to continue processing without being affected by the INVALID alarm status. The SVAL field can be used to define a simulation value and the SIMS field specifies the simulation mode alarm severity (NO_ALARM in our case). We have tested this approach and it seems to work well. STAT is set to SIMM and the downstream records process without problems.
> >>
> >> · DISA Field: Making DISA=DISV disables the record. The DISS field defines the record's disable severity (e.g. NO_ALARM). This approach also seems to work. STAT is set to DISABLE and the downstream records process without problems as well.
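
As a rough sketch of the SIMM and DISA options listed above, the fields involved on a single input record might be configured as follows. The record name test:ai and its INP link are borrowed from the test database described elsewhere in this thread; the choice of an ai record and the field values shown are assumptions for illustration, not a tested configuration.

```
record(ai, "test:ai") {
    field(INP,  "SYM:HEX01:CONTROLON NPP NMS")

    # Simulation mode: while SIMM is YES the record takes its value from
    # SVAL (or from SIOL, if that link is set) instead of INP, and reports
    # STAT=SIMM with the severity given in SIMS.
    field(SIMM, "NO")          # written to YES at run time when the source is down
    field(SIMS, "NO_ALARM")

    # Disable: the record is disabled while DISA equals DISV, and reports
    # STAT=DISABLE with the severity given in DISS.
    field(DISV, "1")
    field(DISS, "NO_ALARM")
}
```

With something like this in place, switching over at run time would mean writing test:ai.SIMM (the test reported later in this thread set it to 1) or test:ai.DISA, and writing test:ai.SVAL if a specific simulation value is wanted.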
> >>
> >> The SIMM field seems to be the most promising option. I would greatly appreciate your insights on this, as well as any alternative approaches that you might suggest.
> >>
> >> Thank you,
> >> Pedro