EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: EPICS device disconnects and reconnects
From: "Pearson, Matthew R." <[email protected]>
To: Kate Feng <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: Wed, 19 Jun 2013 17:38:41 -0400
Hi,


> On 06/18/2013 04:40 PM, Mark Rivers wrote:
>>> Don't we get a WRITE/INVALID alarm back if we process the Acquire record with the port disconnected?
>>> Or, we could do if the driver supports it and returns an error from writeInt32.
>>>
> 'WRITE_ALARM' does not necessarily indicate a 'device disconnect' status.  It would be good to
> refine the 'STAT' to distinguish between 'device disconnect' and other types of 'device write
> errors'. Client handlers should usually distinguish between the two cases.
> For example:
>

At the moment the INVALID severity indicates some kind of data error. So in the case of a write, it would be a failed write. The reason for the failure isn't obvious from the STAT (which is set to WRITE). If we add a disconnected STAT, then the alarm would be DISCONNECTED/INVALID. But this is only one type of error. In principle we could end up with more STATs for different types of error. But that would only end up confusing the operator, who basically only needs to know that 'there is a problem with this data/device', and that what's currently in the records and on the display can't be trusted.

Wouldn't it be better to inform the operator via an alarm handling application that something has been disconnected? Or via separate records that reflect the state of a device (for example, the port connection status can be obtained from the Asyn record).
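
To illustrate the separate-status idea, here is a minimal sketch in plain C (not EPICS code) of how a client might combine three independent pieces of information: whether the Channel Access channel is up, whether a separate device-connection PV says the device is connected, and the record's alarm severity. All of the enum, struct and function names are made up for illustration; the only point is the order of the checks, which keeps STAT/SEVR simple while still letting the display tell the cases apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity values, mirroring the four EPICS alarm severities. */
typedef enum { SEV_NO_ALARM, SEV_MINOR, SEV_MAJOR, SEV_INVALID } severity_t;

/* Hypothetical display states an operator screen might distinguish. */
typedef enum {
    SHOW_NORMAL,          /* value can be trusted */
    SHOW_ALARM,           /* MINOR or MAJOR alarm */
    SHOW_INVALID,         /* data error, e.g. WRITE/INVALID */
    SHOW_DEVICE_OFFLINE,  /* separate status PV reports the device disconnected */
    SHOW_CHANNEL_OFFLINE  /* client cannot reach the IOC at all */
} display_state_t;

/* Hypothetical snapshot of what the client knows about one PV. */
typedef struct {
    bool channel_connected;  /* client <-> IOC connection */
    bool device_connected;   /* from a separate connection-status record */
    severity_t severity;     /* SEVR of the record being displayed */
} pv_status_t;

/* Most severe condition wins: channel, then device, then alarm severity. */
display_state_t classify(const pv_status_t *s)
{
    assert(s != NULL);
    if (!s->channel_connected) return SHOW_CHANNEL_OFFLINE;
    if (!s->device_connected)  return SHOW_DEVICE_OFFLINE;
    if (s->severity == SEV_INVALID) return SHOW_INVALID;
    if (s->severity != SEV_NO_ALARM) return SHOW_ALARM;
    return SHOW_NORMAL;
}
```

With this ordering a failed write to a disconnected device shows up as "device offline" rather than as a generic INVALID, without needing a new STAT value.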

Cheers,
Matt

> Currently, medm has a color mode that whitens the text color (i.e. foreground only) when the
> alarm 'SEVR' is 'INVALID'.  That works for other types of 'device write errors'.
>
> However, when a GUI client detects that an EPICS channel PV is disconnected from its
> device, the GUI client should whiten the text box (i.e. background & foreground) to alert
> the users.
>
>> Yes, we do get WRITE/INVALID if the port is disconnected.  This actually occurs at a higher level than the writeInt32 method in the driver.  This is because pasynManager->queueRequest will fail if the port is disconnected, so the error is detected in device support.  In this case it is devBusyAsyn.c::processBusy, shown below:
>>
>> static long processBusy(busyRecord *pr)
>> {
>>     devBusyPvt *pPvt = (devBusyPvt *)pr->dpvt;
>>     int status;
>>     if(pr->pact == 0) {
>>         pPvt->value = pr->rval;
>>         if(pPvt->canBlock) pr->pact = 1;
>>         status = pasynManager->queueRequest(pPvt->pasynUser, 0, 0);
>>         if((status==asynSuccess) && pPvt->canBlock) return 0;
>>         if(pPvt->canBlock) pr->pact = 0;
>>         if(status != asynSuccess) {
>>             asynPrint(pPvt->pasynUser, ASYN_TRACE_ERROR,
>>                 "%s devAsynBusy::processCommon, error queuing request %s\n",
>>                 pr->name,pPvt->pasynUser->errorMessage);
>>             recGblSetSevr(pr, WRITE_ALARM, INVALID_ALARM);
>>         }
>>     }
>>     return 0;
>> }
>>
> Currently, the output records will not process because their scan type is 'Passive', unless one has
> implemented a dbPutField() call to trigger processing.
>
> Cheers,
> Kate
>
>>
>>> Mark, your solution 3) sounds good.
>>> We have the same problem with motor drivers at the moment.
>>> Although, to make the same changes to asyn motor support we'd have to be careful about not moving
>>> motors when someone powers up a controller and Asyn connects to it.
>>>
>> This would be done in the device-independent asyn device support for the motor record, devMotorAsyn.c.  This device support would send the motor record fields it knows should be resent.
>>
>> The motor record is an interesting case, because before every move it does resend the velocity and the acceleration, so those don't actually need to be automatically resent on reconnect.  But, depending on the device, if the controller was power-cycled the operator may need to decide whether the motors need to be rehomed.
>>
>> Mark
>>
>>
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Pearson, Matthew R. [[email protected]]
>> Sent: Monday, June 17, 2013 4:13 PM
>> To: Kate Feng
>> Cc: [email protected]
>> Subject: Re: EPICS device disconnects and reconnects
>>
>> Hi
>>
>> Don't we get a WRITE/INVALID alarm back if we process the Acquire record with the port disconnected? Or, we could do if the driver supports it and returns an error from writeInt32.
>>
>>
>> Mark, your solution 3) sounds good. We have the same problem with motor drivers at the moment. Although, to make the same changes to asyn motor support we'd have to be careful about not moving motors when someone powers up a controller and Asyn connects to it.
>>
>> To override the default PINI setting for Acquire, rather than use dbpf, another way would be to instantiate an extra database that just sets that field, like:
>>
>> record(busy, "${P}${R}Acquire")
>> {
>>    field(PINI, "1")
>> }
>>
>> or, in the ADBase.template, use a macro with a default value so that people can override it if they wish:
>>
>> field(PINI, "$(PINI=0)")
>>
>> Cheers,
>> Matt
>>
>>
>>
>> On Jun 17, 2013, at 4:13 PM, Kate Feng <[email protected]> wrote:
>>
>>
>>> Hi Jason,
>>>
>>> On 06/17/2013 03:09 PM, Jason Abernathy wrote:
>>>
>>>>> I believe that PINI=NO is the smartest choice for the XXX:Acquire record.
>>>>>
>>>>> We currently have 4 camera "IOCs", with substantial image processing, operating on a single computer. If we forgot to set PINI=NO, and all cameras commenced acquisition at IOC boot, our hardware would be overwhelmed!
>>>>>
>>> It seems that you and Mark misunderstood what I meant by setting 'PINI=YES'.  I meant setting 'VAL' to 0 so
>>> that the 'acquire' record is defined as 'off' (i.e. stop), instead of being STAT=UDF, SEVR=INVALID when the IOC is
>>> first initialized. The software already sets the acquire parameter (i.e. ADAcquire) to 0 in the constructor of the ADBase
>>> base class.  Thus, the value of the record should match the value set for the acquire parameter.
>>>
>>> Please read what Mark Rivers posted on 6/15/2013 at
>>>
>>> http://www.aps.anl.gov/epics/tech-talk/2013/msg01223.php
>>> http://www.aps.anl.gov/epics/tech-talk/2013/msg01225.php
>>>
>>>
>>> Currently, when the device is disconnected, one can still hit the acquire 'start' button.  Thus, we wish to add
>>> the color mode for the alarm.  However, STAT=UDF, SEVR=INVALID does not really imply 'disconnect'.
>>> Perhaps EPICS should add 'DISCONNECT' to the alarm status?
>>>
>>> For a Linux IOC, when the device is disconnected, the PVs of the disconnected device are still connected to the GUI
>>> clients if they run on the same Linux PC as the server.  In fact, the PVs of the disconnected device should be marked
>>> as 'disconnected' as well.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Kate Feng
>>>
>>>
>>>> As for the disconnect / reconnect problem, I agree that it's an issue for the prosilica driver. I solved it by "hacking" the driver to force the reset of certain camera acquisition parameters during reconnection. Per-driver implementation of reconnection routines is neither consistent nor easy to implement. Once the 1.9.1 version of areaDetector was released, the solution was moved to the database and implemented with a fanout record (your example of an "initialize all" record).
>>>>
>>>> While trying to solve this problem, I also looked into generating a user-defined "Reconnect" event which is posted whenever the camera reconnects:
>>>>
>>>> record (event, "xxx:CAMERA:Reconnect") {
>>>>   field (VAL, "Reconnect")
>>>> }
>>>>
>>>> record(mbbo, "xxx:CAMERA:TriggerMode")
>>>> {
>>>>     ...
>>>>     field(SCAN, "Event")
>>>>     field(EVNT, "Reconnect")
>>>> }
>>>>
>>>> This would allow a per-record configuration of reconnection handling. Obviously, this prevents the "TriggerMode" record from being Passively scanned, which is unacceptable.
>>>>
>>>> I can think of another solution - but it requires static, per-device configuration and only works for asyn-based drivers. Change the signature of the asynPortDriver::createParam() methods to include a "reconnect" option. When a driver calls exceptionConnect, asyn is responsible for calling "writexxx" on any parameter from the library which needs to be set during reconnection.
>>>>
>>>> Jason
>>>>
>>>> On 13-06-17 09:31 AM, Mark Rivers wrote:
>>>>
>>>>> I deliberately did not set Acquire to PINI=YES, to avoid having a detector potentially automatically start acquiring images when the IOC reboots, for example if Acquire was in save/restore, and it happened to be acquiring when the IOC shut down.  Having the detector start automatically can have consequences like filling up the disk with some detectors, and I thought it was good to require manual intervention to start a detector.  You can always do the following in your startup script even if PINI=NO.
>>>>>
>>>>> dbpf "XXX:Acquire.PROC","1"
>>>>>
>>>>> But if the community feels that PINI=YES on Acquire is a good idea I would be willing to change my mind.
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Kate Feng [mailto:[email protected]]
>>>>> Sent: Monday, June 17, 2013 11:20 AM
>>>>> To: Mark Rivers
>>>>> Cc: EPICS Tech-Talk
>>>>> Subject: Re: EPICS device disconnects and reconnects
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>>       For the "Potential Solutions in the asyn Framework",  you wrote
>>>>>
>>>>>
>>>>>
>>>>>> 3) ........
>>>>>> If the record has PINI=YES it will send the value that would have been sent during
>>>>>> the initial record processing in iocInit.  If PINI=NO then it should read the value
>>>>>> from the device, and if the read is successful set the output record to that value.
>>>>>>
>>>>>>
>>>>> I agree that the 'PINI' field is a good solution for it.  For example,
>>>>> the 'acquire' record of the prosilica.template should be set to "PINI=YES".
>>>>>
>>>>> Thanks,
>>>>> Kate
>>>>>
>>>>> On 06/14/2013 02:20 PM, Mark Rivers wrote:
>>>>>
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> I would like to start a discussion of the problems and potential solutions of device disconnects and reconnects in EPICS.
>>>>>>
>>>>>>
>>>>>>                             Statement of the Problem
>>>>>>
>>>>>> EPICS device control is moving away from VME-bus based devices that are "always available", and more towards distributed hardware using Ethernet, serial and other buses.  The challenge with such devices is that they may not be connected when the IOC boots, and they may be disconnected, reconnected, and power-cycled while the IOC is running.
>>>>>>
>>>>>> These disconnect/reconnect events can lead to a number of problems.
>>>>>>
>>>>>> 1) If the device is not connected when the IOC boots then
>>>>>>    - Any code in the device support init_record routine that relies on communication with the device will fail
>>>>>>
>>>>>>    - Initialization that relies on records with PINI=YES will fail
>>>>>>
>>>>>> 2) When a device disconnects it can lead to excessive error messages from records that are periodically processing, drivers that have polling loops, etc.
>>>>>>
>>>>>> 3) When a device reconnects after being power-cycled the EPICS output records are likely to disagree with the actual device settings.
>>>>>>
>>>>>> All EPICS records have the PINI field that defines what they should do when the entire IOC changes state.  This field has choices NO,YES,RUN,RUNNING,PAUSE,PAUSED.  However, there is not a field that defines what a record should do when the device it is associated with disconnects or reconnects.
>>>>>>
>>>>>>
>>>>>>                     Potential Solutions in the asyn Framework
>>>>>>
>>>>>> When Marty Kraimer released the asyn framework 10 years ago this month he designed it to be able to handle such connection problems.  asyn port drivers should notify asynManager when their device connects and disconnects.  asynManager has methods to provide callbacks to clients when such asynExceptionConnect events occur.  Such clients can include the standard asyn device support, other drivers (e.g. motor, areaDetector) connected to an underlying driver (e.g. drvAsynIPPort), etc.
>>>>>>
>>>>>> However, in practice we have not done a very good job of taking advantage of these capabilities in device support and other drivers.  I've attached a couple of screen shots of the areaDetector Prosilica driver.  ProsilicaDisconnected.png shows that the driver does detect when the camera is disconnected, and notifies asynManager of this.  This causes the CNCT field in the asynRecord to display the "Disconnected" string in red, so the operator is aware of the problem.  However, when the camera is powered back on, the screen in ProsilicaReconnected.png results.  Note that many of the output records (Exposure Time, Binning, Region start, etc.) do not agree with the readbacks of the actual values in the camera.  The output records retain their previous values, but the camera has now reverted to the power-up defaults.  At present the only way to fix the discrepancy is to hit <Enter> in each of the output record widgets, processing the record and sending the EPICS value to the camera.
>>>>>
>>>>>
>>>>>> There are of course several ways that this could be improved:
>>>>>>
>>>>>> 1) The database designer could implement an "Initialize all" record that would process all of the output records, either when the operator processed that record manually, or perhaps when the record processed automatically when the asyn record CNCT field changed from Disconnected to Connected.  This method requires a lot of work by the database developer.  If there are several databases involved, as there are with areaDetector (ADBase.template, prosilica.template), providing the necessary record links is challenging.
>>>>>>
>>>>>> 2) The driver could store all of the values from the output records, and on a reconnect event it could send these values to the device.  This requires every driver to implement such logic, which is again a lot of work for the developer.
>>>>>>
>>>>>> 3) asyn device support could register for connection callbacks on every output record.  When it gets a reconnection callback it would look at some field in the record to decide whether to send the record value to the device.  If every record had a field like PRCT (Process on Re-Connect) then it could use that field to decide whether to request record processing for that record on the reconnect event.  Since we don't have that field (at least not yet), it could use PINI to make the decision about processing the output record on a reconnect.  Having a connection callback in device support that did this would solve problem 3) described above.  It would also mostly solve problem 1) above, the improper initialization because the device was not connected when the IOC was started.  If the record has PINI=YES it will send the value that would have been sent during the initial record processing in iocInit.  If PINI=NO then it should read the value from the device, and if the read is successful set the output record to that value.  This will correctly support bumpless reboots, but only if PINI=NO.
>>>>>
>>>>>
>>>>>> Problem 2) above is excessive error messages when a device disconnects.  The goal here could be one that Bob Dalesio has stated:  there should be a single error message when a device disconnects, and a single status message when it reconnects, and that is all!  This seems reasonable.  This could potentially be done in asynManager itself.  It is notified of all disconnect and reconnect events, and it can thus produce those messages.  But all error reporting in asyn is also done with the pasynTrace interface which is also implemented in asynManager.  Every time asynPrint or asynPrintIO is called it could check the connection status of the device and simply not output anything if the device is not connected.  Would this be acceptable?
>>>>>>
>>>>>> I'd be very interested in hearing what others think of the above ideas.
>>>>>>
>>>>>> Cheers,
>>>>>> Mark
>
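
For reference, the PINI-based decision in Mark's option 3) quoted above can be sketched roughly as follows. This is plain C with made-up stand-in types, not real asyn device support or a real record structure: `out_record_t`, `device_t` and `on_reconnect` are hypothetical names, and the sketch assumes the decision is made once per output record when a reconnect callback arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an output record; not a real EPICS record type. */
typedef struct {
    bool pini;   /* true models PINI=YES, false models PINI=NO */
    int  value;  /* value currently held by the record */
} out_record_t;

/* Hypothetical stand-in for the device: its setting and whether reads succeed. */
typedef struct {
    int  setting;
    bool read_ok;
} device_t;

typedef enum {
    RECONNECT_WROTE_RECORD,  /* record value was sent to the device */
    RECONNECT_READ_DEVICE,   /* device value was copied into the record */
    RECONNECT_NO_CHANGE      /* read failed, nothing was changed */
} reconnect_action_t;

/* Decide what to do with one output record when its device reconnects. */
reconnect_action_t on_reconnect(out_record_t *rec, device_t *dev)
{
    assert(rec != NULL && dev != NULL);
    if (rec->pini) {
        dev->setting = rec->value;  /* PINI=YES: resend the record value */
        return RECONNECT_WROTE_RECORD;
    }
    if (dev->read_ok) {
        rec->value = dev->setting;  /* PINI=NO: adopt the device value */
        return RECONNECT_READ_DEVICE;
    }
    return RECONNECT_NO_CHANGE;     /* read failed: leave both untouched */
}
```

The PINI=NO branch is the one that gives the bumpless behaviour: the record follows the device rather than overwriting it.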

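
Similarly, the "one message on disconnect, one on reconnect" goal quoted above from Mark's original post could be sketched as a small filter. Again this is plain C with hypothetical names (`port_log_state_t`, `on_connection_event`, `should_print_error`), not asynManager's actual interface; it only shows the state that would need to be kept per port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port logging state; not part of asynManager. */
typedef struct {
    bool connected;  /* last connection state reported by the driver */
} port_log_state_t;

typedef enum { MSG_NONE, MSG_DISCONNECTED, MSG_RECONNECTED } conn_msg_t;

/* Called on every connect/disconnect notification.  Returns the single
 * message to emit for a state change, or MSG_NONE if nothing changed. */
conn_msg_t on_connection_event(port_log_state_t *p, bool now_connected)
{
    assert(p != NULL);
    if (now_connected == p->connected) return MSG_NONE;
    p->connected = now_connected;
    return now_connected ? MSG_RECONNECTED : MSG_DISCONNECTED;
}

/* Called before printing any error message for the port: errors are
 * printed only while the port is connected. */
bool should_print_error(const port_log_state_t *p)
{
    assert(p != NULL);
    return p->connected;
}
```

Repeated disconnect notifications and any errors raised while disconnected are swallowed, so a polling record that keeps failing produces no further output until the port comes back.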

