EPICS RE: alarm hook

Experimental Physics and Industrial Control System

Subject:	RE: alarm hook
From:	"Jeff Hill" <[email protected]>
To:	"'Dalesio, Leo `Bob`'" <[email protected]>, "'Ralph Lange'" <[email protected]>
Cc:	"'Geoff Savage'" <[email protected]>, "'Matthias Clausen'" <[email protected]>, "'Andrew Johnson'" <[email protected]>, "'EPICS core-talk'" <[email protected]>, "'Fritz Bartlett'" <[email protected]>, "'Liyu, Andrei'" <[email protected]>
Date:	Thu, 22 Jun 2006 10:53:18 -0600
In send-only-on-change subscription update systems, one always needs some
per-subscription state that remembers that there was no queue space and that
we were forced to discard a particular update. Later, when queue space
becomes available, we need to send an update for every subscription that is
not current. In the CA server's event queue I handle this by replacing the
last update on the queue for a particular subscription when the queue
saturates.

Based on private discussions with Geoff and Andrei, I fear that they are
currently implementing an unreliable send-only-on-change subscription update
event stream, in which alarm state change edges can be lost forever if the
alarm queue saturates during an alarm burst.

The original Fermilab alarm system is, I suspect, not a send-only-on-change
system but a send-periodically system, and it might therefore be more likely
to keep clients in sync.

Anyway, that is my perception of the recent alarm hook developments. I fear
that the EPICS community could be straying away from reliable event
transport, but feel free to correct my reading of the situation if I am
uninformed.

Jeff

> -----Original Message-----
> From: Dalesio, Leo `Bob` [mailto:[email protected]]
> Sent: Thursday, June 22, 2006 5:47 AM
> To: Ralph Lange
> Cc: Geoff Savage; Matthias Clausen; Andrew Johnson; EPICS core-talk; Fritz
> Bartlett; Liyu, Andrei
> Subject: RE: alarm hook
> 
> I'm an optimist about the general reuse of code. It's a habit. I assume
> that we could make a minimum set of alarm consumers that would be useful
> for all of us.
> 
> I would also like a dbPostEvent hook for archiving. I think that a plant
> archiver that only wants data on occasion would prefer to have it pushed
> out rather than connect to 60000 channels. But there needs to be a
> discussion on that.
> 
> 
> -----Original Message-----
> From: Ralph Lange [mailto:[email protected]]
> Sent: Thursday, June 22, 2006 4:44 AM
> To: Dalesio, Leo `Bob`
> Cc: Geoff Savage; Matthias Clausen; Andrew Johnson; EPICS core-talk; Fritz
> Bartlett; Liyu, Andrei
> Subject: Re: alarm hook
> 
> I think the answers to all your questions depend heavily on what A and B
> are, which is the part not covered by the recent discussions.
> Depending on the implementation of createMessage(), A and B might be Oracle
> servers, parts of the D0 event system, CMLOG servers ... With all these
> different systems, I doubt there will even be a default implementation of
> createMessage().
> The alarm viewers are clients of these A and B servers, so even the specs
> might differ depending on the system or installation.
> 
> Ralph
> 
> 
> Dalesio, Leo `Bob` wrote:
> > Will A and B have a way to synchronize?
> > Are there thoughts of being able to serve the alarm information from
> > multiple sources to many clients?
> > Any specs on the alarm viewers?
> > Thanks,
> > Bob
> >
> > -----Original Message-----
> > From: Geoff Savage [mailto:[email protected]]
> > Sent: Wednesday, June 21, 2006 10:36 PM
> > To: Matthias Clausen
> > Cc: Geoff Savage; Andrew Johnson; EPICS core-talk; Fritz Bartlett;
> > Liyu, Andrei
> > Subject: Re: alarm hook
> >
> > Hi,
> >
> > I am willing to accept the hook as proposed by Andrew. This is not the
> > exact interface that I am currently using, but it does provide the
> > information that I require. I will modify the FNAL interface to match the
> > proposed interface once the hook is in base.
> >
> > It is not clear to me how an alarm hook function is registered in a
> > startup script. Can someone please provide an example that is operating
> > system independent?
> >
> > From our discussions on Tuesday (Matthias Clausen, Fritz Bartlett,
> > Vladimir Sirotenko, Geoff Savage) we developed a "generic" interface,
> > though not one matching Andrei's interface. I'll address Andrei's email
> > next. We propose to include the following "tools" for pushing alarms
> > from the server in a package separate from base. Here is a simple chain
> > showing all the pieces:
> >
> > hook -> logAlarm -> queue -> sendToNetwork -> createMessage -> send to
> > A -> send to B (if send to A fails)
> >
> > Here is a summary of our discussions on each piece, with some of my own
> > experiences included (and marked as such).
> >
> > 1. We start with the generic hook as proposed by Andrew.
> >
> > 2. The hook function that a user will register is
> > logAlarm(struct dbCommon *prec, unsigned short sevr, unsigned short stat).
> > Once this function has determined that an alarm needs to be sent, it
> > gathers the volatile data and inserts it into the alarm data queue. It
> > should also do something reasonable if the data can't be inserted into
> > the queue. It might simply keep track of the number of insertion failures
> > and report this number in a message once the queue has space.
> > Geoff - It should send messages on both bad and good transitions. Care
> > should be taken when a record is successfully processed for the first
> > time, as it transitions from an undefined (bad) state to a good state.
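One possible shape for the logAlarm behaviour in item 2, sketched in C under several assumptions: `struct dbCommon` below is a stand-in holding only the fields used (the real record structure is much larger); the hook receives the previous severity and status while the new ones are already in the record, as described in Bernd's mail further down this thread; `SKETCH_UDF_ALARM` is assumed to equal `UDF_ALARM` from EPICS `alarm.h`; and `struct alarm_msg`, `alarm_q_pop` and `ALARM_QCAP` are hypothetical. Insertion failures are counted, and the count rides along on the next message that fits in the queue. A transition out of UDF into a good state is flagged so the first successful processing can be told apart from an ordinary recovery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for EPICS dbCommon with only the fields this sketch reads. */
struct dbCommon {
    char name[61];
    unsigned short stat;   /* new status, already set by recGblResetAlarms */
    unsigned short sevr;   /* new severity (0 = no alarm) */
};

/* Assumed to equal UDF_ALARM in EPICS alarm.h. */
#define SKETCH_UDF_ALARM 17

/* Hypothetical queue entry: the volatile data copied at hook time. */
struct alarm_msg {
    char name[61];
    unsigned short old_sevr, old_stat, new_sevr, new_stat;
    int from_undefined;            /* first good processing after UDF */
    unsigned long dropped_before;  /* alarms lost just before this one */
};

#define ALARM_QCAP 2   /* deliberately tiny; item 3 wants this set at startup */

static struct alarm_msg alarm_q[ALARM_QCAP];
static size_t alarm_q_head, alarm_q_count;
static unsigned long alarm_drops;   /* insertion failures not yet reported */

/* Hook body: called with the previous severity/status on every change. */
void logAlarm(struct dbCommon *prec, unsigned short sevr, unsigned short stat)
{
    struct alarm_msg *m;

    assert(prec != NULL);
    if (alarm_q_count == ALARM_QCAP) {
        alarm_drops++;              /* no space: remember the loss */
        return;
    }
    m = &alarm_q[(alarm_q_head + alarm_q_count) % ALARM_QCAP];
    alarm_q_count++;
    strncpy(m->name, prec->name, sizeof m->name - 1);
    m->name[sizeof m->name - 1] = '\0';
    m->old_sevr = sevr;
    m->old_stat = stat;
    m->new_sevr = prec->sevr;
    m->new_stat = prec->stat;
    m->from_undefined = (stat == SKETCH_UDF_ALARM && prec->sevr == 0);
    m->dropped_before = alarm_drops;  /* report losses now that there is room */
    alarm_drops = 0;
}

/* Consumer side (sendToNetwork): take the oldest entry; returns 0 if empty. */
int alarm_q_pop(struct alarm_msg *out)
{
    if (alarm_q_count == 0)
        return 0;
    *out = alarm_q[alarm_q_head];
    alarm_q_head = (alarm_q_head + 1) % ALARM_QCAP;
    alarm_q_count--;
    return 1;
}
```

The queue here has no locking. Because the hook runs in whichever scan task processed the record while sendToNetwork drains the queue from its own thread, a real implementation would need a mutex (for example an `epicsMutex`) around it, and per item 3 the capacity would be chosen at IOC startup rather than fixed at compile time.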
> >
> > 3. Users should be able to adjust the size of this queue to accommodate
> > different numbers of records in the IOCs.
> > Geoff - There are more alarms during maintenance periods than during
> > runs.
> >
> > 4. The sendToNetwork function runs as a separate thread (vxWorks task)
> > at a priority lower than the scan tasks but higher than Channel Access.
> > It waits for data to arrive in the alarm data queue. When data arrives,
> > it passes the data to the createMessage routine.
> >
> > 5. The createMessage routine constructs a string message to be sent
> > across the network. To keep this generic, users should be able to define
> > their own createMessage routine. This allows users to use a different
> > message protocol within the push-out framework.
> > Geoff - Using a string for messages removes byte-ordering issues.
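A sketch of how a user-replaceable createMessage might look in C, assuming the framework keeps a function pointer that a site can reassign. The `create_message_fn` signature, `struct alarm_data` and the `NAME=...;SEVR=...;STAT=...` text format are assumptions made for illustration, not an agreed protocol. The default formatter produces plain text, which is what makes the byte-ordering point above hold.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical volatile data taken from the alarm queue. */
struct alarm_data {
    const char *name;
    unsigned short sevr, stat;
};

/* Assumed signature: format `in` into `buf` (size `len`);
 * return the number of characters written, or -1 if it does not fit. */
typedef int (*create_message_fn)(const struct alarm_data *in, char *buf, size_t len);

/* Default formatter: plain-text key=value pairs, so byte order never matters. */
static int default_create_message(const struct alarm_data *in, char *buf, size_t len)
{
    assert(in != NULL && buf != NULL);
    int n = snprintf(buf, len, "NAME=%s;SEVR=%u;STAT=%u",
                     in->name, (unsigned)in->sevr, (unsigned)in->stat);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Sites reassign this pointer to plug in their own message protocol. */
create_message_fn createMessage = default_create_message;
```

A site with its own protocol would assign its own function to `createMessage` at startup; the rest of the push-out chain only ever sees the resulting string.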
> >
> > 6. Send the message to server A. If sending the message to server A
> > fails, send the message to server B.
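The A-then-B rule in item 6 amounts to a simple failover. A minimal C sketch, assuming each server is represented by a hypothetical send callback that returns 0 on success; `struct alarm_server`, `send_with_failover`, `struct fake_server` and `fake_send` are illustrative names, the last two being stand-ins for trying the logic without a network.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport callback: 0 on success, nonzero on failure. */
typedef int (*send_fn)(const char *msg, void *ctx);

struct alarm_server {
    send_fn send;
    void   *ctx;   /* connection state for this server (opaque here) */
};

/* Try server A first; only if that fails, try server B.
 * Returns 0 if either accepted the message, -1 if both failed. */
int send_with_failover(const struct alarm_server *a,
                       const struct alarm_server *b, const char *msg)
{
    assert(a != NULL && b != NULL && msg != NULL);
    if (a->send(msg, a->ctx) == 0)
        return 0;
    if (b->send(msg, b->ctx) == 0)
        return 0;
    return -1;
}

/* Stand-in "server" for experimenting: up/down flag and a receive counter. */
struct fake_server { int up; int received; };

int fake_send(const char *msg, void *ctx)
{
    struct fake_server *s = ctx;
    (void)msg;
    if (!s->up)
        return -1;
    s->received++;
    return 0;
}
```

What to do when both sends fail (requeue, count as dropped, or block) is left to the caller here, since the thread does not settle it.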
> >
> > Not all the requirements are included here. Hopefully DESY will provide
> > more details as the project progresses.
> >
> > Some other issues to consider -
> > a. From Jeff Hill: can we detect, on a record-by-record basis, when an
> > alarm is not pushed out? I think this requires some study of the common
> > fields available to all records. This is on my to-do list.
> > b. What will be in the data inserted into the alarm data queue? This
> > should be all the volatile data needed in the network message, to
> > decrease the time spent collecting the data.
> > c. What are the contents of the network message?
> > d. A generic server also needs to be provided.
> >
> > Geoff
> > P.S.  I need to sleep and will reply to Andrei's email in the morning.
> >
> >
> >
> > On Jun 19, 2006, at 10:02 PM, Matthias Clausen wrote:
> >
> >
> >> Hi Andrew,
> >>
> >> I had a meeting today with Geoff, Fritz and Vladimir at Fermilab.
> >> We discussed the implementation based on the proposed function call
> >> and agreed on an implementation which should be as generic as
> >> possible.
> >> After Geoff's and Bernd Schoeneburg's vacations (the next two weeks), we
> >> will work on an implementation.
> >> If Geoff does not find any unforeseen problems, I would like to give
> >> you a 'go' for the change in base.
> >>
> >> Thanks for your help!
> >> And, by the way, thanks for your clarification regarding Andrei's
> >> mail.
> >>
> >> Matthias
> >>
> >>
> >> Andrew Johnson wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> Matthias Clausen wrote:
> >>>
> >>>> in preparation for my meeting with Geoff at Fermilab, I want to send
> >>>> you the proposed hook into base, which Bernd Schoeneburg and Bob
> >>>> have already 'somehow' agreed on:
> >>>>
> >>>> Here's Bernd's mail to Bob:
> >>>>
> >>>>
> >>>>> recGblResetAlarms is called in monitor(), which is called at the
> >>>>> end of record processing, just before recGblFwdLink and after
> >>>>> recGblGetTimeStamp. After recGblResetAlarms is called in monitor(),
> >>>>> the value changes are checked (not interesting for us).
> >>>>> recGblResetAlarms checks for alarm changes and returns the initial
> >>>>> monitor mask, which is later used for postEvents. postEvents can be
> >>>>> called from anywhere: device support, SNL, subroutines, 'homebrew'
> >>>>> records, etc. I think recGblResetAlarms is called only in the monitor
> >>>>> function of records, so I think it is the perfect place. Please check
> >>>>> it and correct me if I am wrong.
> >>>>>
> >>>>> The code could look like this (the end of recGblResetAlarms):
> >>>>>
> >>>>>     if(sevr!=nsev || stat!=nsta) {
> >>>>>
> >>>>> ++:    logAlarm (pdbc, sevr, stat);
> >>>>> ++:    /* nsev and nsta are in pdbc->sevr and pdbc->stat */
> >>>>>
> >>>>>         ackt = pdbc->ackt; acks = pdbc->acks;
> >>>>>         if(!ackt || nsev>=acks){
> >>>>>             pdbc->acks=nsev;
> >>>>>             db_post_events(pdbc,&pdbc->acks,DBE_VALUE);
> >>>>>         }
> >>>>>     }
> >>>>>     return(mask);
> >>>>> }
> >>>>>
> >>>> My question:
> >>>> Do you also agree with this approach?
> >>>>
> >>> I agree with the location and arguments of the call, which I believe
> >>> are the same as Fermilab have been using.
> >>>
> >>>
> >>>> And what would be implemented in base for logAlarm()?
> >>>> This could be an empty function that just returns, or it could be
> >>>> the 'real' thing, where you'd have to check whether alarm logging
> >>>> should be used at all.
> >>>> The empty function could be replaced/overloaded by the 'real'
> >>>> function if you want to use putAlarm.
> >>>>
> >>> I think we just need a global pointer for the routine which will
> >>> be called if it's not NULL, so your code just sets it to hook in.
> >>> Here's my proposed patch:
> >>>
> >>>
> >>> Index: recGbl.h
> >>> ===================================================================
> >>> RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/db/recGbl.h,v
> >>> retrieving revision 1.9
> >>> diff -u -b -r1.9 recGbl.h
> >>> --- recGbl.h    12 Feb 2003 21:22:23 -0000      1.9
> >>> +++ recGbl.h    19 Jun 2006 15:25:16 -0000
> >>> @@ -30,13 +30,23 @@
> >>>       : FALSE\
> >>>  )
> >>>
> >>> +/* Structures needed for args */
> >>>
> >>> -/* Global Record Support Routines*/
> >>>  struct link;
> >>>  struct dbAddr;
> >>>  struct dbr_alDouble;
> >>>  struct dbr_ctrlDouble;
> >>>  struct dbr_grDouble;
> >>> +struct dbCommon;
> >>> +
> >>> +/* Hook Routine */
> >>> +
> >>> +typedef void (*RECGBL_ALARM_HOOK_ROUTINE)(struct dbCommon *prec,
> >>> +    unsigned short sevr, unsigned short stat);
> >>> +extern RECGBL_ALARM_HOOK_ROUTINE recGblAlarmHook;
> >>> +
> >>> +/* Global Record Support Routines */
> >>> +
> >>>  epicsShareFunc void epicsShareAPI recGblDbaddrError(
> >>>      long status, struct dbAddr *paddr, char *pcaller_name);
> >>>  epicsShareFunc void epicsShareAPI recGblRecordError(
> >>> Index: recGbl.c
> >>> ===================================================================
> >>> RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/db/recGbl.c,v
> >>> retrieving revision 1.60.2.2
> >>> diff -u -b -r1.60.2.2 recGbl.c
> >>> --- recGbl.c    4 Nov 2004 19:21:08 -0000       1.60.2.2
> >>> +++ recGbl.c    19 Jun 2006 15:25:16 -0000
> >>> @@ -42,6 +42,10 @@
> >>>  #include "recGbl.h"
> >>>
> >>>
> >>> +/* Hook Routines */
> >>> +
> >>> +RECGBL_ALARM_HOOK_ROUTINE recGblAlarmHook = NULL;
> >>> +
> >>>  /* local routines */
> >>>  static void getMaxRangeValues();
> >>>
> >>> @@ -239,6 +243,7 @@
> >>>      if(stat_mask)
> >>>          db_post_events(pdbc,&pdbc->stat,stat_mask);
> >>>      if(sevr!=nsev || stat!=nsta) {
> >>> +       if (recGblAlarmHook) (*recGblAlarmHook)(pdbc, sevr, stat);
> >>>         ackt = pdbc->ackt; acks = pdbc->acks;
> >>>         if(!ackt || nsev>=acks){
> >>>             pdbc->acks=nsev;
> >>>
> >>>
> >>> If there is general agreement between DESY and FNAL about this, I'll
> >>> commit the change which will then appear in R3-14-9.
> >>>
> >>> - Andrew
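A self-contained C model of the hook-pointer approach in Andrew's patch. The typedef and the `recGblAlarmHook` variable mirror the patch; `struct dbCommon`, `model_reset_alarms`, `countingHook` and `installAlarmHook` are simplified or hypothetical stand-ins. The hook is called only when severity or status actually changes, and only after something has set the pointer. In a real IOC, one OS-independent place to make the assignment would plausibly be a registrar function declared with `registrar(...)` in a .dbd file and exported with `epicsExportRegistrar()`; that is an assumption about how a site might wire it up, not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in so the sketch compiles without EPICS base. */
struct dbCommon { unsigned short sevr, stat; };

/* Mirrors the typedef and global pointer added by the patch. */
typedef void (*RECGBL_ALARM_HOOK_ROUTINE)(struct dbCommon *prec,
    unsigned short sevr, unsigned short stat);
RECGBL_ALARM_HOOK_ROUTINE recGblAlarmHook = NULL;

/* Simplified model of the end of recGblResetAlarms(): store the new
 * severity/status, then call the hook with the previous values. */
void model_reset_alarms(struct dbCommon *pdbc,
                        unsigned short nsev, unsigned short nsta)
{
    unsigned short sevr, stat;

    assert(pdbc != NULL);
    sevr = pdbc->sevr;
    stat = pdbc->stat;
    pdbc->sevr = nsev;
    pdbc->stat = nsta;
    if (sevr != nsev || stat != nsta) {
        if (recGblAlarmHook) (*recGblAlarmHook)(pdbc, sevr, stat);
    }
}

/* Hypothetical site hook: here it only counts transitions. */
static int transitions_seen;

static void countingHook(struct dbCommon *prec,
                         unsigned short sevr, unsigned short stat)
{
    (void)prec; (void)sevr; (void)stat;
    transitions_seen++;
}

/* Hypothetical installer, e.g. called once during IOC initialization. */
void installAlarmHook(void)
{
    recGblAlarmHook = countingHook;
}
```

Because base only tests the pointer for NULL, an IOC that never installs a hook pays for a single comparison per alarm change.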
> >>>
> >> --
> >> ----------------------------------------------------------------------
> >> Matthias Clausen                  Cryogenic Controls Group (MKS-2)
> >> phone:  +49-40-8998-3256          Deutsches Elektronen Synchrotron
> >> fax:    +49-40-8994-3256          Notkestr. 85
> >> e-mail: [email protected]     22607 Hamburg
> >> WWW-MKS2.desy.de                  Germany
> >> ----------------------------------------------------------------------
> >>
> >>
> >
> >