Experimental Physics and Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021 
<== Date ==> <== Thread ==>

Subject: Re: alarm handler issue
From: Ralph Lange <ralph.lange@gmx.de>
To: EPICS Core Talk <core-talk@aps.anl.gov>
Date: Tue, 31 Jul 2018 09:23:30 +0200
Disclaimer: the original report was not going to the list, so all background I know is from the snippets in Andrew's mail.

Ad 1.
I strongly support this for MENU fields. If the CA server rejects a value as illegal, it should stick to that promise, i.e. not set the field to the rejected value.
For ENUM ... I think records like mbbi/mbbo can be used without strings for choices, and I fear that changes could easily break existing applications.

Ad 2. 
No real opinion on that.

Ad 3.
Good move. Restricting proliferation of illegal and possibly dangerous matter is always a good idea.


On Mon, Jul 30, 2018 at 9:55 PM Andrew Johnson <anj@aps.anl.gov> wrote:
Hi Rolf,

Thanks for the report. I don't think the newer sites are doing global
alarm acknowledgement to the IOCs, and the PVA protocol doesn't have
support for this at all to my knowledge.

On 07/29/2018 03:34 PM, Rolf Keitel wrote:
> When debugging my Qt-based alarm handler I noticed the following
> behaviour (3.14.12):
> If I caput an invalid value to  a severity field, i.e. caput xxxx.HSV
> 2000, I get a CA client exception.
> On the IOC I see he message:  Illegal choice PV: xxxx.HSV error detected
> in routine: dbFastLinkConv(cvt_menu_st).
> A dbpr xxxx 3 shows HSV field as <nil>

I get the same in 3.16, and the 7.0 branch will be the same too.

> Now, when the channel goes into alarm, dpr xxx 3 shows HSV <nil>,
> SEVR<nil> , ACKS <nil>, but STAT HIGH. The alarm handler receives the
> callback message with an acks value of 2000 (the invalid value).

When the channel goes into HIGH alarm the severity gets copied from HSV
to SEVR and ACKS, and STAT set to HIGH. I can read the illegal value
from the ACKS field using 'caget -n' and it is 2000 as you say.

> In this situation alh displays the alarm as INVALID, but cannot
> acknowledge the alarm, the only way to get rid of the alarm in alh seems
> to be a reboot of the IOC.

The SEVR field does get cleared again when the alarm goes away, but as
long as the record's ACKT is YES the ACKS value will not be reset until
a client asks for it to be. If some other client manages to clear ACKS
(or sets ACKT to NO) I assume that a restart of alh would also clear the
problem since all memory of the illegal state would then be gone.

> In my alarm handler I managed to acknowledge by sending the illegal
> value 2000 with the acknowledge message to the IOC.

The caput tool provides no way to acknowledge an alarm; at least with
caget you can ask for the acknowledgement status:

> tux% caget -d DBR_STSACK_STRING anj:ao
> anj:ao
>     Native data type: DBF_DOUBLE
>     Request type:     DBR_STSACK_STRING
>     Element count:    1
>     Value:            0
>     Status:           NO_ALARM
>     Severity:         NO_ALARM
>     Ack transient?:   YES
>     Ack severity:     ??

I assume you're calling something like ca_put(DBR_PUT_ACKS, chid, &2000)
which I would expect to do the right thing for the acknowledgement
inside the IOC.

Looking at my CA Perl bindings the put_acks() method checks the severity
argument from the user code so it can't be used to acknowledge anything
above the expected INVALID value.

> I cannot test right now if this IOC behaviour was changed in a later
> EPICS release.

I'm fairly sure nothing will have changed in newer releases, we haven't
touched that area of the code to my knowledge.

This is obviously a bug in the IOC's handling of acknowledgements and/or
alarm severities, the question in my mind is where we should be checking
for bad severities because they should never be getting outside the IOC.
There are several possibilities, and we might want to implement more
than one:

1. When doing a put to an ENUM or MENU field, reject numbers out of the
legal range. Sometimes we do want to allow this, but maybe only for ENUM
fields? Not sure.

2. The recGblSetSevr() macro could be converting any severity above
MAJOR_ALARM into INVALID_ALARM, but in the macro the NSEV argument is
already used twice and this would take it to 4 times, so we might need
to convert it into a proper function.

3. The recGblResetAlarms() routine should check NSEV before copying it
to the other fields — this change is trivial and appears to be all that
is really needed to prevent clients from seeing bad severities:

diff --git a/src/db/recGbl.c b/src/db/recGbl.c
index 6d45e73..01b2fa5 100644
--- a/src/db/recGbl.c
+++ b/src/db/recGbl.c
@@ -19,6 +19,7 @@
 #include <limits.h>

 #include "dbDefs.h"
+#include "alarm.h"
 #include "epicsMath.h"
 #include "epicsTime.h"
 #include "epicsPrint.h"
@@ -222,6 +223,9 @@ unsigned short epicsShareAPI recGblResetAlarms(void
     epicsEnum16 val_mask = 0;
     epicsEnum16 stat_mask = 0;

+    if (new_sevr > INVALID_ALARM)
+        new_sevr = INVALID_ALARM;
     pdbc->stat = new_stat;
     pdbc->sevr = new_sevr;
     pdbc->nsta = 0;

This doesn't prevent the HSV field from being set to an illegal value,
but any device support or subroutine record could easily do that so #1
above couldn't prevent it from happening.

I am inclined to commit the above change to the 3.14 branch and merge
up, and leave doing anything further for discussion between the core
developers in one of the downstream branches.

Anybody got any comments?

- Andrew

Arguing for surveillance because you have nothing to hide is no
different than making the claim, "I don't care about freedom of
speech because I have nothing to say." -- Edward Snowdon

Re: alarm handler issue Andrew Johnson
Re: alarm handler issue Andrew Johnson

Navigate by Date:
Prev: Re: alarm handler issue Andrew Johnson
Next: Re: Compiler error on Debian Stretch arm Ralph Lange
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021 
Navigate by Thread:
Prev: Re: alarm handler issue Andrew Johnson
Next: Re: alarm handler issue Andrew Johnson
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021