EPICS Re: alarm hook

Experimental Physics and Industrial Control System


Subject:	Re: alarm hook
From:	Matthias Clausen <[email protected]>
To:	"Dalesio, Leo `Bob`" <[email protected]>
Cc:	Jeff Hill <[email protected]>, Ralph Lange <[email protected]>, Geoff Savage <[email protected]>, Andrew Johnson <[email protected]>, EPICS core-talk <[email protected]>, Fritz Bartlett <[email protected]>, "Liyu, Andrei" <[email protected]>, Bernd Schoeneburg <[email protected]>
Date:	Fri, 23 Jun 2006 18:14:09 +0200

This is an attempt to answer some of the questions raised in previous mails:

Sending all alarms _on change_ - not always all alarms permanently.
To make this reliable we need a queue between the hook and the network task.
If the queue is full we - for now - send the number of lost messages.
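
As a rough illustration of the queue described above, here is a minimal sketch of a fixed-size queue that counts dropped entries so the network task can report the number of lost messages. All names (AlarmEntry, AlarmQueue, ALARM_QUEUE_SIZE and the alarmQueue* functions) are hypothetical, and the sketch has no locking; an IOC implementation would need a mutex or a thread-safe message queue between the hook and the network task.

```c
#include <stddef.h>

#define ALARM_QUEUE_SIZE 4   /* hypothetical; should be configurable per IOC */

/* Hypothetical queue entry: the data captured when the alarm changed. */
typedef struct {
    char name[32];
    unsigned short sevr;
    unsigned short stat;
} AlarmEntry;

typedef struct {
    AlarmEntry slots[ALARM_QUEUE_SIZE];
    size_t head;          /* index of the oldest entry */
    size_t count;         /* number of entries currently queued */
    unsigned long lost;   /* entries dropped since the last report */
} AlarmQueue;

void alarmQueueInit(AlarmQueue *q)
{
    q->head = 0;
    q->count = 0;
    q->lost = 0;
}

/* Called from the hook. Returns 1 if queued, 0 if the queue was full. */
int alarmQueuePut(AlarmQueue *q, const AlarmEntry *e)
{
    if (q->count == ALARM_QUEUE_SIZE) {
        q->lost++;        /* remember the drop instead of blocking */
        return 0;
    }
    q->slots[(q->head + q->count) % ALARM_QUEUE_SIZE] = *e;
    q->count++;
    return 1;
}

/* Called from the network task. Returns 1 and fills *e, or 0 if empty. */
int alarmQueueGet(AlarmQueue *q, AlarmEntry *e)
{
    if (q->count == 0)
        return 0;
    *e = q->slots[q->head];
    q->head = (q->head + 1) % ALARM_QUEUE_SIZE;
    q->count--;
    return 1;
}

/* Returns the number of lost entries and resets the counter,
 * so that each loss is reported only once. */
unsigned long alarmQueueTakeLost(AlarmQueue *q)
{
    unsigned long n = q->lost;
    q->lost = 0;
    return n;
}
```

The network task would call alarmQueueTakeLost() after draining the queue and, if the result is nonzero, send a "lost N messages" notice.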

To make this even more reliable - as Jeff points out - we could set a 'did not send alarm' flag in the record:

The code could look like this (the end of recGblResetAlarms):
// laupd := last alarm update
    if(sevr!=nsev || stat!=nsta || !laupd) {
                                ^^^^^^^^^^
++:    logAlarm (pdbc, sevr, stat, &success);
++:    /* nsev and nsta are in pdbc->sevr and pdbc->stat */

++:    if (success) {
++:        pdbc->laupd = TRUE;
++:    } else {
++:        pdbc->laupd = FALSE;
++:    }

        ackt = pdbc->ackt; acks = pdbc->acks;

        if(!ackt || nsev>=acks){

            pdbc->acks=nsev;
            db_post_events(pdbc,&pdbc->acks,DBE_VALUE);
        }
    }
    return(mask);
}

The 'last alarm update' flag could also be stored differently, and one might think of a way to store the previous alarm state and severity. This would really make sure that all alarms make their way out of the IOC.
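
To make the retry behaviour of the flag concrete, here is a small self-contained sketch. DemoRecord is a hypothetical stand-in for the few dbCommon fields involved (laupd is not an existing field), and demoSend is only a stub sender whose result can be switched; none of this is code from base.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the dbCommon fields used here. */
typedef struct {
    unsigned short sevr, stat;   /* current alarm severity and status */
    unsigned short nsev, nsta;   /* new values set during processing */
    bool laupd;                  /* true if the last alarm update was sent */
} DemoRecord;

/* Hypothetical sender type: returns true if the alarm could be queued. */
typedef bool (*DemoLogAlarm)(const DemoRecord *prec,
                             unsigned short prevSevr,
                             unsigned short prevStat);

/* Stub sender for illustration: succeeds or fails on demand. */
bool demoSendOk = true;
int demoSendCalls = 0;

bool demoSend(const DemoRecord *prec,
              unsigned short prevSevr, unsigned short prevStat)
{
    (void)prec; (void)prevSevr; (void)prevStat;
    demoSendCalls++;
    return demoSendOk;
}

/* Latches the new alarm state and tries to send it if it changed,
 * or if the previous attempt failed. Returns true if a send was tried. */
bool demoResetAlarms(DemoRecord *prec, DemoLogAlarm logAlarm)
{
    unsigned short prevSevr = prec->sevr, prevStat = prec->stat;
    unsigned short newSevr = prec->nsev, newStat = prec->nsta;

    prec->sevr = newSevr;
    prec->stat = newStat;
    prec->nsev = 0;
    prec->nsta = 0;

    if (prevSevr != newSevr || prevStat != newStat || !prec->laupd) {
        prec->laupd = logAlarm(prec, prevSevr, prevStat);
        return true;
    }
    return false;
}
```

Note that on a retry the "previous" severity and status passed to the sender are already equal to the current ones, which is why storing the previous alarm state alongside the flag may be worth considering.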


We also came to this point in our discussion with Geoff et al.

Maybe because we have too much respect for the core group we did not dare to ask for too many changes. But since Jeff pointed out the reliability aspect, we might ask for your help to get as close as possible to 100% message transfer rate.

As far as I understood the discussion with Geoff, they also have a way to set a trigger to write all alarm states from all records. This way a newly started alarm server could get an update from all records in one IOC. This is implemented by crawling through the whole database and sending the current alarm state. Is this also a candidate to become part of base?

A / B: These are two (or more) processes waiting for alarm messages. The processes themselves are generic. In our case they will forward the message to the Java Message Service (JMS) - in Fermilab's case it will be their message server. They will be implemented in different languages though.


- Matthias


Dalesio, Leo `Bob` wrote:

This is an important point that Jeff is making and there is some policy in place - I'm sure.
How do the alarm packages work for DESY and FNAL?
For redundancy we've decided that we do not need to send every change to the backup IOC - only the most recent values of every field that has changed.
For an alarm logger, this seems less optimal. However, if the client cannot keep up, then the queue may occasionally overflow. What are the queue overflow policies for each of these alarm handlers?
Thanks,
Bob
-----Original Message-----
From: Jeff Hill [mailto:[email protected]]
Sent: Thursday, June 22, 2006 9:53 AM
To: Dalesio, Leo `Bob`; 'Ralph Lange'
Cc: 'Geoff Savage'; 'Matthias Clausen'; 'Andrew Johnson'; 'EPICS core-talk'; 'Fritz Bartlett'; 'Liyu, Andrei'
Subject: RE: alarm hook


In send-only-on-change subscription update systems one always needs some per subscription state that remembers that there was no queue space and that we were forced to discard a particular update. Later when queue space becomes available we will need to send an update for all subscriptions that are not current. In the CA server's event queue I handle this by replacing the last update on the queue for a particular subscription when the queue saturates.
Based on private discussions with Geoff and Andrei I fear that they are currently implementing an unreliable send-only-on-change subscription update event stream where alarm state change edges can be lost forever (if the alarm queue saturates during an alarm burst).
The original Fermilab alarm system is, I suspect, not a send-only-on-change system, but instead a send-periodically based system and therefore might be more likely to keep clients in sync.

Anyways that is my perception surrounding recent alarm hook developments. I am fearful that the EPICS community could be straying away from reliable event transport, but feel free to realign my discernment of the situation if I am uninformed.
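
The per-subscription state described above can be illustrated with a simplified "last value wins" queue: each subscription has at most one pending slot, and a newer update overwrites an unsent one, so the queue cannot overflow and the client always ends up with the current state. This is only a sketch of the general idea, not the CA server's event queue; all names (SubState, CoalescingQueue, N_SUBS, cqInit, cqPost, cqTake) are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define N_SUBS 3   /* hypothetical number of subscriptions */

typedef struct {
    unsigned short sevr, stat;  /* most recent state not yet sent */
    int pending;                /* 1 while waiting in the FIFO */
} SubState;

typedef struct {
    SubState subs[N_SUBS];
    size_t fifo[N_SUBS];        /* pending subscription indices, oldest first */
    size_t head, count;
} CoalescingQueue;

void cqInit(CoalescingQueue *q)
{
    size_t i;
    for (i = 0; i < N_SUBS; i++) {
        q->subs[i].sevr = 0;
        q->subs[i].stat = 0;
        q->subs[i].pending = 0;
    }
    q->head = 0;
    q->count = 0;
}

/* Post an update. An unsent update for the same subscription is
 * overwritten, so count can never exceed N_SUBS. */
void cqPost(CoalescingQueue *q, size_t sub,
            unsigned short sevr, unsigned short stat)
{
    assert(sub < N_SUBS);
    q->subs[sub].sevr = sevr;
    q->subs[sub].stat = stat;
    if (!q->subs[sub].pending) {
        q->subs[sub].pending = 1;
        q->fifo[(q->head + q->count) % N_SUBS] = sub;
        q->count++;
    }
}

/* Take the oldest pending subscription and its latest state.
 * Returns 1 and fills the outputs, or 0 if nothing is pending. */
int cqTake(CoalescingQueue *q, size_t *sub,
           unsigned short *sevr, unsigned short *stat)
{
    if (q->count == 0)
        return 0;
    *sub = q->fifo[q->head];
    q->head = (q->head + 1) % N_SUBS;
    q->count--;
    q->subs[*sub].pending = 0;
    *sevr = q->subs[*sub].sevr;
    *stat = q->subs[*sub].stat;
    return 1;
}
```

The trade-off is that intermediate alarm edges during a burst are merged, which may or may not be acceptable for an alarm logger, but no subscription is left permanently out of date.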

Jeff
-----Original Message-----
From: Dalesio, Leo `Bob` [mailto:[email protected]]
Sent: Thursday, June 22, 2006 5:47 AM
To: Ralph Lange
Cc: Geoff Savage; Matthias Clausen; Andrew Johnson; EPICS core-talk; Fritz Bartlett; Liyu, Andrei
Subject: RE: alarm hook
I'm an optimist about the general reuse of code. It's a habit. I assume that we could make a minimum set of alarm consumers that would be useful for all of us.
I would also like the dbPostEvent hook for archiving. I think that a plant archiver that only wants data on occasion would prefer to have it pushed out - rather than connect to 60000 channels. But there needs to be a discussion on that.
-----Original Message-----
From: Ralph Lange [mailto:[email protected]]
Sent: Thursday, June 22, 2006 4:44 AM
To: Dalesio, Leo `Bob`
Cc: Geoff Savage; Matthias Clausen; Andrew Johnson; EPICS core-talk; Fritz Bartlett; Liyu, Andrei
Subject: Re: alarm hook
I think the answers to all your questions highly depend on what A and B are, which is the part not covered by the recent discussions. Depending on the implementation of createMessage() A and B might be Oracle servers, parts of the D0 event system, CMLOG servers ... With all these different systems, I doubt there even will be a default implementation for createMessage(). The alarm viewers are clients to these A and B servers, so even the specs might differ depending on the system or installation.
Ralph


Dalesio, Leo `Bob` wrote:
Will A&B have a way to synchronize?
Are there thoughts of being able to serve the alarm information from multiple sources to many clients?
Any specs on the alarm viewers?
Thanks,
Bob

-----Original Message-----
From: Geoff Savage [mailto:[email protected]]
Sent: Wednesday, June 21, 2006 10:36 PM
To: Matthias Clausen
Cc: Geoff Savage; Andrew Johnson; EPICS core-talk; Fritz Bartlett; Liyu, Andrei
Subject: Re: alarm hook

Hi,
I am willing to accept the hook as proposed by Andrew. This is not the exact interface that I am currently using but it does provide the information that I require. I will modify the FNAL interface to match the proposed interface once the hook is in base.
It is not clear to me how an alarm hook function is registered in a startup script. Can someone please provide an example that is operating system independent?
From our (Matthias Clausen, Fritz Bartlett, Vladimir Sirotenko, Geoff Savage) discussions on Tuesday we developed a "generic" (but not matching Andrei's interface) interface. I'll address Andrei's email next. We propose to include the following "tools" for pushing alarms from the server in a package separate from base. Here is a simple chain showing all the pieces.
hook -> logAlarm -> queue -> sendToNetwork -> createMessage -> send to A -> send to B (if send to A fails)
Here is a summary of our discussions on each piece with some of my experiences included (and indicated).
1. We start with the generic hook as proposed by Andrew.

2. The hook function which a user will register is logAlarm(struct dbCommon *prec, unsigned short sevr, unsigned short stat). Once this function has determined that an alarm needs to be sent it gathers the volatile data and inserts it into the alarm data queue. It should also do something reasonable if the data can't be inserted in the queue. It might simply keep track of the number of insertion failures and report this number in a message once the queue has space.
Geoff - It should send messages on bad and good transitions. Care should be taken when a record is successfully processed for the first time as it transitions from an undefined (bad) state to a good state.
3. Users should be able to adjust the size of this queue to accommodate different numbers of records in the IOCs.
Geoff - There are more alarms during maintenance periods than during run times.
4. The sendToNetwork function is running as a separate thread (vxWorks task) at a priority lower than the scan tasks but higher than channel access. It waits for data to arrive in the alarm data queue. When data arrives it passes the data to the createMessage routine.
5. The createMessage routine constructs a string message to be sent across the network. To be more generic users should be able to define their own createMessage routine. This allows users to use a different message protocol within the push out framework.
Geoff - Using a string for messages removes byte ordering issues.

6. Send the message to server A. If sending the message to server A fails then send the message to server B.
Not all the requirements are included here. Hopefully DESY will provide more details as the project progresses.
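
Steps 5 and 6 above can be sketched as follows. The routine name createMessage comes from the discussion, but its signature, the AlarmData struct, the key=value message layout, and sendWithFailover are assumptions made only for illustration; demoSendOk and demoSendFail are stand-in transports, not real network code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical queue entry holding the data captured by the hook. */
typedef struct {
    const char *name;
    unsigned short sevr, stat;          /* new severity and status */
    unsigned short prevSevr, prevStat;  /* values before the change */
} AlarmData;

/* Hypothetical transport: returns 0 on success, nonzero on failure. */
typedef int (*SendFunc)(const char *msg);

/* Format the entry as a text message (text avoids byte-order issues).
 * Returns 0 on success, -1 if the buffer is too small. */
int createMessage(const AlarmData *d, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "NAME=%s;SEVR=%hu;STAT=%hu;PREV_SEVR=%hu;PREV_STAT=%hu",
                     d->name, d->sevr, d->stat, d->prevSevr, d->prevStat);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Try server A first and fall back to server B.
 * Returns 'A' or 'B' for the server that accepted, 0 if both failed. */
char sendWithFailover(const char *msg, SendFunc sendToA, SendFunc sendToB)
{
    if (sendToA(msg) == 0)
        return 'A';
    if (sendToB(msg) == 0)
        return 'B';
    return 0;
}

/* Stand-in transports: one accepts any non-empty message, one always fails. */
int demoSendOk(const char *msg)   { return strlen(msg) > 0 ? 0 : -1; }
int demoSendFail(const char *msg) { (void)msg; return -1; }
```

Because createMessage is passed around as an ordinary function, a site could substitute its own formatter for a different message protocol without touching the queue or the failover logic.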
Some other issues to consider -
a. From Jeff Hill - can we detect on a record by record basis when an alarm is not pushed out? I think this requires some study of the common fields available to all records. This is on my to do list.
b. What will be in the data inserted into the alarm data queue? This should be all the volatile data needed in the network message to decrease the time spent collecting the data.
c. What are the contents of the network message?
d. A generic server also needs to be provided.

Geoff
P.S.  I need to sleep and will reply to Andrei's email in the morning.



On Jun 19, 2006, at 10:02 PM, Matthias Clausen wrote:
Hi Andrew,

I had a meeting today with Geoff, Fritz and Vladimir at Fermilab.
We discussed the implementation based on the proposed function call and agreed on an implementation which should be as generic as possible. After Geoff's and Bernd Schoeneburg's vacation (next two weeks) we will work on an implementation. If Geoff does not see any unforeseen problems I would like to give you a 'go' for the change in base.
Thanks for your help!
And - by the way - thanks for your clarification regarding Andrei's mail.
Matthias


Andrew Johnson wrote:
Hi Matthias,

Matthias Clausen wrote:
in preparation for my meeting with Geoff at Fermilab I want to send you the proposed hook into base which Bernd Schoeneburg and Bob already 'somehow' agreed on:
Here's Bernd's mail to Bob:
recGblResetAlarms is called in monitor() which is called at the end of record processing just before recGblFwdLink and after recGblGetTimeStamp. After calling recGblResetAlarms in monitor() the value changes are checked (not interesting for us). recGblResetAlarms checks for alarm changes and returns the first version of the monitor mask, which is later used for postEvents. postEvents can be called from anywhere like device support, snl, subroutines, 'homebrew' records etc. I think recGblResetAlarms is called in the monitor function of records only. So I think it is the perfect place. Please check it and correct me if I am wrong.
The code could look like this (the end of recGblResetAlarms):

    if(sevr!=nsev || stat!=nsta) {

++:    logAlarm (pdbc, sevr, stat);
++:    /* nsev and nsta are in pdbc->sevr and pdbc->stat */

        ackt = pdbc->ackt; acks = pdbc->acks;
        if(!ackt || nsev>=acks){
            pdbc->acks=nsev;
            db_post_events(pdbc,&pdbc->acks,DBE_VALUE);
        }
    }
    return(mask);
}
My question:
Do you also agree with this approach?
I agree with the location and arguments of the call, which I believe are the same as Fermilab have been using.
And - what would be implemented in base for logAlarm ()?
This could be an empty function which just returns - or it could be the 'real' thing where you'll have to check whether alarm logging should be used at all.
The empty function could be replaced/overloaded by the 'real' function if you want to use putAlarm.
I think we just need a global pointer for the routine which will be called if it's not NULL, so your code just sets it to hook in.
Here's my proposed patch:


Index: recGbl.h
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/db/recGbl.h,v
retrieving revision 1.9
diff -u -b -r1.9 recGbl.h
--- recGbl.h    12 Feb 2003 21:22:23 -0000      1.9
+++ recGbl.h    19 Jun 2006 15:25:16 -0000
@@ -30,13 +30,23 @@
      : FALSE\
 )

-/* Global Record Support Routines*/
+/* Structures needed for args */

 struct link;
 struct dbAddr;
 struct dbr_alDouble;
 struct dbr_ctrlDouble;
 struct dbr_grDouble;
+struct dbCommon;
+
+/* Hook Routine */
+
+typedef void (*RECGBL_ALARM_HOOK_ROUTINE)(struct dbCommon *prec,
+    unsigned short sevr, unsigned short stat);
+extern RECGBL_ALARM_HOOK_ROUTINE recGblAlarmHook;
+
+/* Global Record Support Routines */
+
 epicsShareFunc void epicsShareAPI recGblDbaddrError(
     long status, struct dbAddr *paddr, char *pcaller_name);
 epicsShareFunc void epicsShareAPI recGblRecordError(
Index: recGbl.c
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/db/recGbl.c,v
retrieving revision 1.60.2.2
diff -u -b -r1.60.2.2 recGbl.c
--- recGbl.c    4 Nov 2004 19:21:08 -0000       1.60.2.2
+++ recGbl.c    19 Jun 2006 15:25:16 -0000
@@ -42,6 +42,10 @@
 #include "recGbl.h"


+/* Hook Routines */
+
+RECGBL_ALARM_HOOK_ROUTINE recGblAlarmHook = NULL;
+
 /* local routines */
 static void getMaxRangeValues();

@@ -239,6 +243,7 @@
     if(stat_mask)
         db_post_events(pdbc,&pdbc->stat,stat_mask);
     if(sevr!=nsev || stat!=nsta) {
+       if (recGblAlarmHook) (*recGblAlarmHook)(pdbc, sevr, stat);
        ackt = pdbc->ackt; acks = pdbc->acks;
        if(!ackt || nsev>=acks){
            pdbc->acks=nsev;
If there is general agreement between DESY and FNAL about this, I'll commit the change which will then appear in R3-14-9.
- Andrew
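
For readers who want to see the global-pointer idea in isolation, here is a self-contained sketch that mimics the call site in the patch above. It deliberately uses stand-in names (struct demoCommon, DEMO_ALARM_HOOK, demoAlarmHook, demoAlarmChanged, demoLogAlarm) rather than the real base symbols, and it models only the two alarm fields, so it shows the pattern and is not code from base.

```c
#include <stddef.h>

/* Stand-in for the dbCommon fields the hook reads. */
struct demoCommon {
    const char *name;
    unsigned short sevr, stat;
};

typedef void (*DEMO_ALARM_HOOK)(struct demoCommon *prec,
    unsigned short prevSevr, unsigned short prevStat);

/* NULL means no hook is installed, so the call site costs one test. */
DEMO_ALARM_HOOK demoAlarmHook = NULL;

/* Mimics the end of recGblResetAlarms: the record already holds the
 * new values when the hook is called with the previous ones. */
void demoAlarmChanged(struct demoCommon *prec,
                      unsigned short nsev, unsigned short nsta)
{
    unsigned short sevr = prec->sevr, stat = prec->stat;

    prec->sevr = nsev;
    prec->stat = nsta;
    if (sevr != nsev || stat != nsta) {
        if (demoAlarmHook) (*demoAlarmHook)(prec, sevr, stat);
    }
}

/* Example hook that just records what it was given. */
int demoHookCalls = 0;
unsigned short demoLastPrevSevr = 0, demoLastNewSevr = 0;

void demoLogAlarm(struct demoCommon *prec,
                  unsigned short prevSevr, unsigned short prevStat)
{
    (void)prevStat;
    demoHookCalls++;
    demoLastPrevSevr = prevSevr;     /* passed as an argument */
    demoLastNewSevr = prec->sevr;    /* read from the record */
}
```

Hooking in is then a single assignment, demoAlarmHook = demoLogAlarm. In an IOC that assignment would presumably be made from initialization code such as a registrar function or an iocsh command, which is not shown here.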

--
------------------------------------------------------------------------
Matthias Clausen                         Cryogenic Controls Group(MKS-2)
phone:  +49-40-8998-3256                Deutsches Elektronen Synchrotron
fax:    +49-40-8994-3256                                    Notkestr. 85
e-mail: [email protected]                           22607 Hamburg
WWW-MKS2.desy.de                                                 Germany
------------------------------------------------------------------------

Replies:: RE: alarm hook Dalesio, Leo `Bob`; Re: alarm hook Andrew Johnson

References:: RE: alarm hook Dalesio, Leo `Bob`
