EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: EPICS device disconnects and reconnects
From: Mark Rivers <[email protected]>
To: "[email protected]" <[email protected]>, EPICS Tech-Talk <[email protected]>
Date: Sat, 15 Jun 2013 17:29:26 +0000
It turns out that the acquire parameter, ADAcquire, was being set to 0 in the constructor for the ADBase base class.  But the busy record still has STAT=UDF, SEVR=INVALID. Further investigation shows that this is the case for all record types if the record has never processed. Here is the proof:

Here is a "bo" record from ADBase.template:

record(bo, "$(P)$(R)ReverseX")
{
   field(PINI, "YES")
   field(DTYP, "asynInt32")
   field(OUT,  "@asyn($(PORT),$(ADDR),$(TIMEOUT))REVERSE_X")
   field(ZNAM, "No")
   field(ONAM, "Yes")
   field(VAL,  "0")
}

When the IOC boots, I do dbpr on this record.  Note that STAT=NO_ALARM, SEVR=NO_ALARM.

epics> dbpr 13PS1:cam1:ReverseX 10
ACKS: NO_ALARM      ACKT: YES           ASG:                ASP: (nil)
BKPT: 00            COSV: NO_ALARM      DESC:               DISA: 0
DISP: 0             DISS: NO_ALARM      DISV: 1             DOL:CONSTANT
DPVT: 0xa14af00     DSET: 0x893f22c     DTYP: asynInt32     EVNT: 0
FLNK:CONSTANT 0     HIGH: 0             IVOA: Continue normally
IVOV: 0             LALM: 0             LCNT: 0             LSET: 0xa30bde8
MASK: 0             MLIS: 80 8e 54 ed 80 8e 54 ed 01 00 00 00
MLOK: 08 0d e4 09   MLST: 0             NAME: 13PS1:cam1:ReverseX
NSEV: NO_ALARM      NSTA: NO_ALARM      OMSL: supervisory   ONAM: Yes
ORAW: 0             ORBV: 0             OSV: NO_ALARM
OUT:INST_IO @asyn(PS1,0,1)REVERSE_X     PACT: 0             PHAS: 0
PINI: YES           PPN: (nil)          PPNR: (nil)         PRIO: LOW
PROC: 0             PUTF: 0             RBV: 0              RDES: 0x9a75d58
RPRO: 0             RPVT: 0xa14ae58     RSET: 0x893fae0     RVAL: 0
SCAN: Passive       SDIS:CONSTANT       SEVR: NO_ALARM      SIML:CONSTANT
SIMM: NO            SIMS: NO_ALARM      SIOL:CONSTANT       SPVT: (nil)
STAT: NO_ALARM      TIME: 2013-06-15 12:11:07.631053457     TPRO: 0
TSE: 0              TSEL:CONSTANT       UDF: 0              VAL: 0
WDPT: (nil)         ZNAM: No            ZSV: NO_ALARM


Now I change the database to comment out PINI=YES:

record(bo, "$(P)$(R)ReverseX")
{
#   field(PINI, "YES")
   field(DTYP, "asynInt32")
   field(OUT,  "@asyn($(PORT),$(ADDR),$(TIMEOUT))REVERSE_X")
   field(ZNAM, "No")
   field(ONAM, "Yes")
   field(VAL,  "0")
}


Now I boot the IOC again.  Note that STAT=UDF and SEVR=INVALID.

epics> dbpr 13PS1:cam1:ReverseX 10
ACKS: NO_ALARM      ACKT: YES           ASG:                ASP: (nil)
BKPT: 00            COSV: NO_ALARM      DESC:               DISA: 0
DISP: 0             DISS: NO_ALARM      DISV: 1             DOL:CONSTANT
DPVT: 0xa77deb8     DSET: 0x893f22c     DTYP: asynInt32     EVNT: 0
FLNK:CONSTANT 0     HIGH: 0             IVOA: Continue normally
IVOV: 0             LALM: 0             LCNT: 0             LSET: 0xa93ef28
MASK: 0             MLIS: d0 89 64 ed d0 89 64 ed 01 00 00 00
MLOK: b0 3c 47 0a   MLST: 0             NAME: 13PS1:cam1:ReverseX
NSEV: NO_ALARM      NSTA: NO_ALARM      OMSL: supervisory   ONAM: Yes
ORAW: 0             ORBV: 0             OSV: NO_ALARM
OUT:INST_IO @asyn(PS1,0,1)REVERSE_X     PACT: 0             PHAS: 0
PINI: NO            PPN: (nil)          PPNR: (nil)         PRIO: LOW
PROC: 0             PUTF: 0             RBV: 0              RDES: 0xa0a8d68
RPRO: 0             RPVT: 0xa77de10     RSET: 0x893fae0     RVAL: 0
SCAN: Passive       SDIS:CONSTANT       SEVR: INVALID       SIML:CONSTANT
SIMM: NO            SIMS: NO_ALARM      SIOL:CONSTANT       SPVT: (nil)
STAT: UDF           TIME: <undefined>   TPRO: 0             TSE: 0
TSEL:CONSTANT       UDF: 0              VAL: 0              WDPT: (nil)
ZNAM: No            ZSV: NO_ALARM


So the busy record is behaving exactly like the "bo" record in this case.  If the record has never processed, then STAT=UDF and SEVR=INVALID.  Note that the VAL field has been read back from the device successfully, which causes UDF=0, but that does not reset the record alarm fields.

Thus I conclude that the busy record is behaving like all other records in EPICS base, and it should not be patched.
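
The behavior shown in the two dbpr listings can be sketched with a small stand-alone model. This is not code from EPICS base or the busy module: ModelRecord and the model_* functions are hypothetical names, and the model assumes only what is described above, namely that a successful initial read sets VAL and UDF=0, while STAT and SEVR are updated only when the record processes.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the EPICS alarm status and severity values. */
enum { MODEL_STAT_NONE = 0, MODEL_STAT_UDF = 17 };
enum { MODEL_SEVR_NONE = 0, MODEL_SEVR_INVALID = 3 };

/* Hypothetical, heavily reduced record: only the fields discussed here. */
typedef struct {
    int val;
    int udf;         /* 1 until VAL has been given a defined value */
    int stat, sevr;  /* alarm state visible to clients */
    int nsta, nsev;  /* alarm state accumulated during one processing pass */
} ModelRecord;

/* State of a record that has been loaded but has never processed. */
void model_record_init(ModelRecord *rec)
{
    assert(rec != NULL);
    rec->val = 0;
    rec->udf = 1;
    rec->stat = MODEL_STAT_UDF;
    rec->sevr = MODEL_SEVR_INVALID;
    rec->nsta = MODEL_STAT_NONE;
    rec->nsev = MODEL_SEVR_NONE;
}

/* Successful initial read in device support: VAL and UDF change,
 * the alarm fields are deliberately left alone. */
void model_record_init_read(ModelRecord *rec, int device_value)
{
    assert(rec != NULL);
    rec->val = device_value;
    rec->udf = 0;
}

/* One processing pass (the write to the device is assumed to succeed):
 * raise a UDF alarm if VAL is still undefined, then latch the
 * accumulated alarm state into STAT/SEVR. */
void model_record_process(ModelRecord *rec)
{
    assert(rec != NULL);
    if (rec->udf) {
        rec->nsta = MODEL_STAT_UDF;
        rec->nsev = MODEL_SEVR_INVALID;
    }
    rec->stat = rec->nsta;
    rec->sevr = rec->nsev;
    rec->nsta = MODEL_STAT_NONE;
    rec->nsev = MODEL_SEVR_NONE;
}
```

In this model, calling model_record_init() followed by model_record_init_read() leaves UDF=0 with STAT=UDF and SEVR=INVALID, which matches the second listing. A single model_record_process() call, which is what PINI=YES causes at iocInit, clears both alarm fields as in the first listing.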

Mark



________________________________________
From: [email protected] [[email protected]]
Sent: Saturday, June 15, 2013 10:52 AM
To: Mark Rivers; EPICS Tech-Talk
Subject: Re: EPICS device disconnects and reconnects

Yes, I agree that the driver should set the initial acquire state (probably 0).

Thanks,
Kate

Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Mark Rivers <[email protected]>
Date: Sat, 15 Jun 2013 14:51:03
To: Feng, Kate<[email protected]>; EPICS Tech-Talk<[email protected]>
Subject: RE: EPICS device disconnects and reconnects

> Consequently, I found a bug in the devBusyAsyn.c because the SEVR field was always set to be INVALID_ALARM (and STAT : UDF) when the record was initialized. Was it meant to be that way ?

It is definitely meant to be that way. This is an output record with asyn device support. When the IOC initializes this record has STAT=UDF, SEVR=INVALID. This is because in the device support initialization the attempt to read the initial value from the driver returned asynError, so the record did not set the VAL field to match the device. The record does not have PINI=YES, so the record value was not written to the driver at iocInit. Thus, after iocInit the record value is "undefined" because it has been neither read from nor written to the driver. Thus, the operator should be warned that the record is in INVALID alarm, because it may be in a state that is inconsistent with the hardware.

The first time the busy record is processed it will send its value to the driver, and if the operation is successful the status will change to STAT=NO_ALARM, SEVR=NO_ALARM.

Note that perhaps this driver should have been written to query the initial acquire state when it started up (probably 0), and set the acquire parameter in the driver to match that. Then the initial read from device support would have succeeded, and the Acquire.VAL would be set to 0, and the status would not have been invalid. But since the driver does not currently do this, the behavior of the busy record is definitely correct, and your patch should not be applied.
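
The driver-side point can be illustrated with a small stand-alone sketch. It is a model under stated assumptions, not the asyn or areaDetector API: ModelParam and the model_* functions are hypothetical, and the only behavior assumed is that reading a parameter the driver has never set returns an error, so the initial read in device support cannot set VAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for asynSuccess / asynError. */
typedef enum { MODEL_SUCCESS = 0, MODEL_ERROR = 1 } ModelStatus;

/* Hypothetical driver parameter: undefined until the driver sets it. */
typedef struct {
    int value;
    int defined;
} ModelParam;

void model_param_init(ModelParam *p)
{
    assert(p != NULL);
    p->value = 0;
    p->defined = 0;
}

/* What the driver would do at startup after querying the hardware. */
void model_param_set(ModelParam *p, int value)
{
    assert(p != NULL);
    p->value = value;
    p->defined = 1;
}

/* Reading a parameter that was never set is an error. */
ModelStatus model_param_get(const ModelParam *p, int *out)
{
    assert(p != NULL && out != NULL);
    if (!p->defined) {
        return MODEL_ERROR;
    }
    *out = p->value;
    return MODEL_SUCCESS;
}

/* Initial read attempted by device support at record initialization.
 * Returns 1 if *val was set from the driver, 0 if it was left untouched. */
int model_initial_read(const ModelParam *acquire, int *val)
{
    int v;
    assert(acquire != NULL && val != NULL);
    if (model_param_get(acquire, &v) != MODEL_SUCCESS) {
        return 0;
    }
    *val = v;
    return 1;
}
```

With a freshly initialized parameter, model_initial_read() returns 0 and the record value is not touched. If the driver calls model_param_set(&acquire, 0) during startup, the same read returns 1 and the record value is set to 0.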

Mark

________________________________________
From: Feng, Kate [[email protected]]
Sent: Friday, June 14, 2013 4:00 PM
To: Mark Rivers; EPICS Tech-Talk
Subject: RE: EPICS device disconnects and reconnects

Hi Mark,

> Every time asynPrint or asynPrintIO is called it could check the connection status of the device and simply not output anything if the device is not connected. Would this be acceptable?

Yes, thank you! I was thinking exactly the same. Meanwhile, I was working to see if I could at least turn the 'Acquire' button white when the device is disconnected by using
the color mode in MEDM. While doing so, I found what looks like a bug in devBusyAsyn.c: the SEVR field was always set to INVALID_ALARM (and STAT to UDF) when the record
was initialized. Was it meant to be that way? Or should it be fixed as in the attached patch, made using "diff -Naur busy-1-4/busyApp/src/devBusyAsyn.c.orig busy-1-4/busyApp/src/devBusyAsyn.c"?
Have a wonderful weekend.

Sincerely,
Kate Feng

________________________________________
From: [email protected] [[email protected]] on behalf of Mark Rivers [[email protected]]
Sent: Friday, June 14, 2013 2:20 PM
To: EPICS Tech-Talk
Subject: EPICS device disconnects and reconnects

Folks,

I would like to start a discussion of the problems and potential solutions of device disconnects and reconnects in EPICS.


Statement of the Problem

EPICS device control is moving away from VME-bus based devices that are "always available", and more towards distributed hardware using Ethernet, serial and other buses. The challenge with such devices is that they may not be connected when the IOC boots, and they may be disconnected, reconnected, and power-cycled while the IOC is running.

These disconnect/reconnect events can lead to a number of problems.

1) If the device is not connected when the IOC boots then
- Any code in the device support init_record routine that relies on communication with the device will fail

- Initialization that relies on records with PINI=YES will fail

2) When a device disconnects it can lead to excessive error messages from records that are periodically processing, drivers that have polling loops, etc.

3) When a device reconnects after being power-cycled the EPICS output records are likely to disagree with the actual device settings.

All EPICS records have the PINI field that defines what they should do when the entire IOC changes state. This field has the choices NO, YES, RUN, RUNNING, PAUSE, and PAUSED. However, there is no field that defines what a record should do when the device it is associated with disconnects or reconnects.


Potential Solutions in the asyn Framework

When Marty Kraimer released the asyn framework 10 years ago this month he designed it to be able to handle such connection problems. asyn port drivers should notify asynManager when their device connects and disconnects. asynManager has methods to provide callbacks to clients when such asynExceptionConnect events occur. Such clients can include the standard asyn device support, other drivers (e.g. motor, areaDetector) connected to an underlying driver (e.g. drvAsynIPPort), etc.

However, in practice we have not done a very good job of taking advantage of these capabilities in device support and other drivers. I've attached a couple of screen shots of the areaDetector Prosilica driver. ProsilicaDisconnected.png shows that the driver does detect when the camera is disconnected, and notifies asynManager of this. This causes the CNCT field in the asynRecord to display the "Disconnected" string in red, so the operator is aware of the problem. However, when the camera is powered back on, the screen in ProsilicaReconnected.png results. Note that many of the output records (Exposure Time, Binning, Region start, etc.) do not agree with the readbacks of the actual values in the camera. The output records retain their previous values, but the camera has now reverted to the power-up defaults. At present the only way to fix the discrepancy is to hit <Enter> in each of the output record widgets, processing the record and sending the EPICS value to the camera.

There are of course several ways that this could be improved:

1) The database designer could implement an "Initialize all" record that would process all of the output records, either when the operator processed that record manually, or perhaps when the record processed automatically when the asyn record CNCT field changed from Disconnected to Connected. This method requires a lot of work by the database developer. If there are several databases involved, as there are with areaDetector (ADBase.template, prosilica.template), providing the necessary record links is challenging.

2) The driver could store all of the values from the output records, and on a reconnect event it could send these values to the device. This requires every driver to implement such logic, which is again a lot of work for the developer.

3) asyn device support could register for connection callbacks on every output record. When it gets a reconnection callback it would look at some field in the record to decide whether to send the record value to the device. If every record had a field like PRCT (Process on Re-Connect) then it could use that field to decide whether to request record processing for that record on the reconnect event. Since we don't have that field (at least not yet), it could use PINI to make the decision about processing the output record on a reconnect. Having a connection callback in device support that did this would solve problem 3) described above. It would also mostly solve problem 1) above, the improper initialization because the device was not connected when the IOC was started. If the record has PINI=YES it will send the value that would have been sent during the initial record processing in iocInit. If PINI=NO then it should read the value from the device, and if the read is successful set the output record to that value. This will correctly support bumpless reboots, but only if PINI=NO.
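
The decision described in option 3) can be sketched as a stand-alone model. Nothing here is existing asyn device support: the types and model_on_reconnect() are hypothetical, registration of the connection callback is not shown, and copying the value directly stands in for requesting record processing.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODEL_PINI_NO = 0, MODEL_PINI_YES = 1 } ModelPini;

/* Hypothetical device: holds one setting, and may or may not
 * support reading that setting back. */
typedef struct {
    int value;
    int readable;
} ModelDevice;

/* Hypothetical output record: only PINI and VAL matter here. */
typedef struct {
    ModelPini pini;
    int val;
} ModelOutRecord;

typedef enum {
    MODEL_ACTION_NONE = 0,      /* nothing could be done */
    MODEL_ACTION_WRITE = 1,     /* record value sent to the device */
    MODEL_ACTION_READBACK = 2   /* device value copied into the record */
} ModelAction;

/* Called when the device reports that it has reconnected. */
ModelAction model_on_reconnect(ModelOutRecord *rec, ModelDevice *dev)
{
    assert(rec != NULL && dev != NULL);
    if (rec->pini == MODEL_PINI_YES) {
        /* Same effect as the initial processing in iocInit:
         * the EPICS value wins and is pushed to the hardware. */
        dev->value = rec->val;
        return MODEL_ACTION_WRITE;
    }
    if (dev->readable) {
        /* PINI=NO: the hardware value wins (bumpless behavior). */
        rec->val = dev->value;
        return MODEL_ACTION_READBACK;
    }
    return MODEL_ACTION_NONE;
}
```

For the Prosilica case above, a record with PINI=YES would restore the exposure time to the power-cycled camera, while a record with PINI=NO would pick up the camera's power-up default instead.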

Problem 2) above is excessive error messages when a device disconnects. The goal here could be one that Bob Dalesio has stated: there should be a single error message when a device disconnects, and a single status message when it reconnects, and that is all! This seems reasonable. This could potentially be done in asynManager itself. It is notified of all disconnect and reconnect events, and it can thus produce those messages. But all error reporting in asyn is also done with the pasynTrace interface, which is also implemented in asynManager. Every time asynPrint or asynPrintIO is called it could check the connection status of the device and simply not output anything if the device is not connected. Would this be acceptable?
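
A minimal stand-alone sketch of that message policy follows. It is a model only and does not use or modify asynManager: ModelPort, model_set_connected() and model_trace_print() are hypothetical names, and the emitted counter exists just to make the behavior checkable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical port state tracked by the manager. */
typedef struct {
    int connected;  /* 1 if the device is currently connected */
    int emitted;    /* number of messages actually printed */
} ModelPort;

/* Record a connection state change. Prints exactly one message per
 * transition and nothing if the state did not change.
 * Returns 1 if a message was printed, 0 otherwise. */
int model_set_connected(ModelPort *port, int connected)
{
    assert(port != NULL);
    if (port->connected == connected) {
        return 0;
    }
    port->connected = connected;
    fprintf(stderr, "%s\n", connected ? "port reconnected" : "port disconnected");
    port->emitted++;
    return 1;
}

/* Stand-in for a trace/error print: suppressed while disconnected.
 * Returns 1 if the message was printed, 0 if it was dropped. */
int model_trace_print(ModelPort *port, const char *msg)
{
    assert(port != NULL && msg != NULL);
    if (!port->connected) {
        return 0;
    }
    fprintf(stderr, "%s\n", msg);
    port->emitted++;
    return 1;
}
```

With this policy a polling loop that fails once per second while the device is off produces no output at all between the single "disconnected" message and the single "reconnected" message. One trade-off is that genuinely new information logged while disconnected is also dropped.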

I'd be very interested in hearing what others think of the above ideas.

Cheers,
Mark









Replies:
Re: EPICS device disconnects and reconnects Andrew Johnson
References:
EPICS device disconnects and reconnects Mark Rivers
RE: EPICS device disconnects and reconnects Feng, Kate
RE: EPICS device disconnects and reconnects Mark Rivers
Re: EPICS device disconnects and reconnects feng
