Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021 
<== Date ==> <== Thread ==>

Subject: Re: alarm handler issue
From: Andrew Johnson <anj@anl.gov>
To: <core-talk@aps.anl.gov>
Cc: Rolf Keitel <rolf@triumf.ca>
Date: Tue, 31 Jul 2018 15:56:15 -0500
On 07/31/2018 02:23 AM, Ralph Lange wrote:
> Disclaimer: the original report was not going to the list, so all
> background I know is from the snippets in Andrew's mail.

I only removed trivial lines from the original when replying, for
exactly that reason.

> Ad 1.
> I strongly support this for MENU fields. If the CA server rejects a
> value as illegal, it should stick to that promise, i.e. not set the
> field to the rejected value.

AIUI the error message didn't come from the ca_put() operation, Rolf's
error message "dbFastLinkConv(cvt_menu_st)" came from the subsequent
ca_get_callback() which failed to convert the new index value into a
string. The ca_put() succeeded as the number was written into the HSV field.

Looking at the code though I'm puzzled. The dbFastLinkConv.c routine
cvt_st_menu is already checking for and rejecting integers outside the
valid range for the menu without writing them to the field, so I don't
see how it's getting through.

Ahhh, of course caput is sending it as a DBF_ENUM and not as a string,
so the put-conversion routine used is going to be cvt_e_e() which does
no checking at all. Interestingly if I use '-s' caput refuses to send
any value which is not one of the valid choice strings, so I can't use
that to send "2000" as a string.

I can confirm that the IOC's value checks are working by using the
caput.pl script, which does send the value as a string, shows an
exception on the client side:

> tux% bin/linux-x86_64/caput.pl anj:ao.HSV 100
> Old : anj:ao.HSV                     MINOR
> CA.Client.Exception...............................................
>     Warning: "Channel write request failed"
>     Context: "op=1, channel=anj:ao.HSV, type=DBR_STRING, count=1, ctx="anj:ao""
>     Source File: ../oldChannelNotify.cpp line 160
>     Current Time: Tue Jul 31 2018 15:42:00.777857379
> ..................................................................
> New : anj:ao.HSV                     MINOR

and triggers the expected error on the IOC:

> Illegal choice PV: anj:ao.HSV error detected in routine: dbFastLinkConv(cvt_st_menu)

Fixing cvt_e_e() is not trivially possible since it is used for copying
values between all of the enum/menu/device types and in both directions,
and its third paddr argument doesn't always describe the destination field.


> For ENUM ... I think records like mbbi/mbbo can be used without strings
> for choices, and I fear that changes could easily break existing
> applications.

Agreed. ENUM field choices can change dynamically so what was legal a
moment ago might now not be, or vice-versa thus the code has to be able
to handle numeric values outside of the current legal range.

> Ad 2. 
> No real opinion on that.
> 
> Ad 3.
> Good move. Restricting proliferation of illegal and possibly dangerous
> matter is always a good idea.

Thanks, I will commit that change to the 3.14 branch.

- Andrew


> On Mon, Jul 30, 2018 at 9:55 PM Andrew Johnson <anj@aps.anl.gov
> <mailto:anj@aps.anl.gov>> wrote:
> 
>     Hi Rolf,
> 
>     Thanks for the report. I don't think the newer sites are doing global
>     alarm acknowledgement to the IOCs, and the PVA protocol doesn't have
>     support for this at all to my knowledge.
> 
> 
>     On 07/29/2018 03:34 PM, Rolf Keitel wrote:
>     > When debugging my Qt-based alarm handler I noticed the following
>     > behaviour (3.14.12):
>     >
>     > If I caput an invalid value to  a severity field, i.e. caput xxxx.HSV
>     > 2000, I get a CA client exception.
>     >
>     > On the IOC I see he message:  Illegal choice PV: xxxx.HSV error
>     detected
>     > in routine: dbFastLinkConv(cvt_menu_st).
>     >
>     > A dbpr xxxx 3 shows HSV field as <nil>
> 
>     I get the same in 3.16, and the 7.0 branch will be the same too.
> 
>     > Now, when the channel goes into alarm, dpr xxx 3 shows HSV <nil>,
>     > SEVR<nil> , ACKS <nil>, but STAT HIGH. The alarm handler receives the
>     > callback message with an acks value of 2000 (the invalid value).
> 
>     When the channel goes into HIGH alarm the severity gets copied from HSV
>     to SEVR and ACKS, and STAT set to HIGH. I can read the illegal value
>     from the ACKS field using 'caget -n' and it is 2000 as you say.
> 
>     > In this situation alh displays the alarm as INVALID, but cannot
>     > acknowledge the alarm, the only way to get rid of the alarm in alh
>     seems
>     > to be a reboot of the IOC.
> 
>     The SEVR field does get cleared again when the alarm goes away, but as
>     long as the record's ACKT is YES the ACKS value will not be reset until
>     a client asks for it to be. If some other client manages to clear ACKS
>     (or sets ACKT to NO) I assume that a restart of alh would also clear the
>     problem since all memory of the illegal state would then be gone.
> 
>     > In my alarm handler I managed to acknowledge by sending the illegal
>     > value 2000 with the acknowledge message to the IOC.
> 
>     The caput tool provides no way to acknowledge an alarm; at least with
>     caget you can ask for the acknowledgement status:
> 
>     > tux% caget -d DBR_STSACK_STRING anj:ao
>     > anj:ao
>     >     Native data type: DBF_DOUBLE
>     >     Request type:     DBR_STSACK_STRING
>     >     Element count:    1
>     >     Value:            0
>     >     Status:           NO_ALARM
>     >     Severity:         NO_ALARM
>     >     Ack transient?:   YES
>     >     Ack severity:     ??
> 
>     I assume you're calling something like ca_put(DBR_PUT_ACKS, chid, &2000)
>     which I would expect to do the right thing for the acknowledgement
>     inside the IOC.
> 
>     Looking at my CA Perl bindings the put_acks() method checks the severity
>     argument from the user code so it can't be used to acknowledge anything
>     above the expected INVALID value.
> 
>     > I cannot test right now if this IOC behaviour was changed in a later
>     > EPICS release.
> 
>     I'm fairly sure nothing will have changed in newer releases, we haven't
>     touched that area of the code to my knowledge.
> 
> 
>     This is obviously a bug in the IOC's handling of acknowledgements and/or
>     alarm severities, the question in my mind is where we should be checking
>     for bad severities because they should never be getting outside the IOC.
>     There are several possibilities, and we might want to implement more
>     than one:
> 
>     1. When doing a put to an ENUM or MENU field, reject numbers out of the
>     legal range. Sometimes we do want to allow this, but maybe only for ENUM
>     fields? Not sure.
> 
>     2. The recGblSetSevr() macro could be converting any severity above
>     MAJOR_ALARM into INVALID_ALARM, but in the macro the NSEV argument is
>     already used twice and this would take it to 4 times, so we might need
>     to convert it into a proper function.
> 
>     3. The recGblResetAlarms() routine should check NSEV before copying it
>     to the other fields — this change is trivial and appears to be all that
>     is really needed to prevent clients from seeing bad severities:
> 
>     diff --git a/src/db/recGbl.c b/src/db/recGbl.c
>     index 6d45e73..01b2fa5 100644
>     --- a/src/db/recGbl.c
>     +++ b/src/db/recGbl.c
>     @@ -19,6 +19,7 @@
>      #include <limits.h>
> 
>      #include "dbDefs.h"
>     +#include "alarm.h"
>      #include "epicsMath.h"
>      #include "epicsTime.h"
>      #include "epicsPrint.h"
>     @@ -222,6 +223,9 @@ unsigned short epicsShareAPI recGblResetAlarms(void
>     *precord)
>          epicsEnum16 val_mask = 0;
>          epicsEnum16 stat_mask = 0;
> 
>     +    if (new_sevr > INVALID_ALARM)
>     +        new_sevr = INVALID_ALARM;
>     +
>          pdbc->stat = new_stat;
>          pdbc->sevr = new_sevr;
>          pdbc->nsta = 0;
> 
> 
>     This doesn't prevent the HSV field from being set to an illegal value,
>     but any device support or subroutine record could easily do that so #1
>     above couldn't prevent it from happening.
> 
>     I am inclined to commit the above change to the 3.14 branch and merge
>     up, and leave doing anything further for discussion between the core
>     developers in one of the downstream branches.
> 
>     Anybody got any comments?
> 
>     - Andrew
> 
>     -- 
>     Arguing for surveillance because you have nothing to hide is no
>     different than making the claim, "I don't care about freedom of
>     speech because I have nothing to say." -- Edward Snowdon
> 

-- 
Arguing for surveillance because you have nothing to hide is no
different than making the claim, "I don't care about freedom of
speech because I have nothing to say." -- Edward Snowdon

References:
Re: alarm handler issue Andrew Johnson
Re: alarm handler issue Ralph Lange

Navigate by Date:
Prev: Re: Compiler error on Debian Stretch arm Ralph Lange
Next: Build failed in Jenkins: epics-base-3.14-win64 #256 APS Jenkins
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021 
Navigate by Thread:
Prev: Re: alarm handler issue Ralph Lange
Next: Re: Compiler error on Debian Stretch arm Ralph Lange
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  <20182019  2020  2021 
ANJ, 31 Jul 2018 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·