Hi Dirk and Mark,
The problem is not the mutex. I followed Mark's suggestion and did not see
any mutex stuck -- very often got an empty list.
The problem does not happen very often. It seems the (random) frequency
depends on the time/environment? I am having a hard time catching a case
before the log file reaches the 2 GB limit.
Honestly, I really admire Dirk's effort to put together such a big piece of work!
It is so useful! And so flexible, of course! I did not know anything about it,
and am only more amazed the more I read the documents and code. I don't think
there is anything wrong. It is only the situation we are dealing with: the
computer, the network, digi, then all kinds of hardware with different quality.
I strongly believe the problem is caused by losing communication, due to
network glitches, or digi dropping messages (I have definitive proof of this),
or the firmware of the actual hardware.
Since the package already has a timer, and one of my systems is not using
the wait command (yet), I hacked the code slightly to implement a global
timeout -- if the protocol does not finish within, say, 10 seconds, the timer
will abort the protocol. This should act as a catch-all. Let's watch it for a
few days!
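
To make the idea concrete, here is a minimal, self-contained sketch of such a
global timeout. It is not the actual change to StreamDevice and does not use
the StreamDevice or EPICS timer API; the class name ProtocolWatchdog, the
finished() method and the two demo functions are hypothetical names made up
for this illustration, written with plain standard C++ threads.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical watchdog: runs onTimeout unless finished() is called
// before `limit` has elapsed.
class ProtocolWatchdog {
public:
    ProtocolWatchdog(std::chrono::milliseconds limit,
                     std::function<void()> onTimeout)
        : worker_([this, limit, onTimeout] {
              std::unique_lock<std::mutex> lock(mutex_);
              // Returns true if finished_ became true before the deadline.
              bool done = cv_.wait_for(lock, limit,
                                       [this] { return finished_; });
              lock.unlock();
              if (!done) onTimeout();  // protocol never finished: abort it
          }) {}

    // Stop the watchdog (if still waiting) and wait for its thread to end.
    ~ProtocolWatchdog() {
        finished();
        worker_.join();
    }

    // Call this from the normal "protocol finished" path.
    void finished() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            finished_ = true;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool finished_ = false;
    std::thread worker_;  // declared last: the members above must exist first
};

// Demo: the expected callback never arrives, so the watchdog aborts.
bool watchdogFiresWhenProtocolHangs() {
    std::atomic<bool> aborted(false);
    {
        ProtocolWatchdog wd(std::chrono::milliseconds(50),
                            [&aborted] { aborted = true; });
        // Simulate a lost callback: nothing ever calls wd.finished().
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }
    return aborted.load();
}

// Demo: the protocol finishes in time, so the abort handler is not called.
bool watchdogSilentWhenProtocolFinishes() {
    std::atomic<bool> aborted(false);
    {
        ProtocolWatchdog wd(std::chrono::seconds(10),
                            [&aborted] { aborted = true; });
        wd.finished();
    }
    return !aborted.load();
}
```

In this sketch the abort handler runs on the watchdog's own thread, so a real
version would have to take whatever lock protects the protocol state before
touching it.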
Any new ideas?
Best regards,
Dehong
________________________________________
From: Dirk Zimoch [[email protected]]
Sent: Wednesday, July 13, 2011 12:47 AM
To: Zhang, Dehong
Cc: [email protected]; [email protected]
Subject: Re: streamdevice + asyn stuck
Hello Dehong,
Zhang, Dehong wrote:
> Hi Dirk and Mark,
>
> Thank you very much for helping!
>
> I tried to change the SCAN to "10 second", it did not help. I also tried the
> epicsThreadShowAll command as Dirk suggested, it shows a few timerQueue
> threads, all "OK"
Some time ago I had the problem that I crashed the timerQueue thread in
a timeout callback and thus never completed record processing. However,
this seems to be a different issue because you don't see crashed threads.
>
> From what I can see from the source code, I believe that we somehow
> lose callbacks from asyn, that breaks the evalcommand() chain and we
> never get to finishProtocol().
Yes. That seems to be the case. At several places, StreamDevice gives
control to asyn or to EPICS base and relies on callbacks. If the
callbacks don't come, StreamDevice is lost.
I have to check if I did anything wrong there. Basically I give control
to asynDriver using queueRequest() and expect a callback either to
handleRequest() or handleTimeout(). From there I call the individual
event handlers. So maybe either my queueRequest() setup is wrong or my
handlers are buggy or asynDriver never calls back. (all code in
AsynDriverInterface.cc)
Later I give control to EPICS base via callbackRequest() to finish
processing the record. But I don't think that there is anything wrong there.
(code in StreamEpics.cc)
I will try to add some diagnostics to see what a record is currently
waiting for. But that may take a while.
>
> Another possibility is the mutex. If it is not released properly, the
> Stream::process() method will be stuck waiting.
The mutex is automatically released (via destructor) when the function
that contains the lock returns. Thus the only way to keep the lock is to
get stuck in a function while holding the lock. But because I don't
think I have unbounded loops in any function, that can only happen when
a thread crashes or a function I call does not return. I don't think
that happens.
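
As a generic illustration of that release-by-destructor pattern (a sketch
only: it uses std::mutex and std::lock_guard from standard C++, not the
StreamDevice or EPICS mutex classes, and the names recordMutex,
processWithEarlyReturn, processThatThrows and mutexIsFree are made up for
this example):

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex recordMutex;  // hypothetical stand-in for the per-record lock

// The guard's destructor unlocks the mutex on every return path.
int processWithEarlyReturn(bool fail) {
    std::lock_guard<std::mutex> guard(recordMutex);
    if (fail) return -1;  // early return: guard still releases the lock
    return 0;
}

// The guard also unlocks when the function is left by an exception.
void processThatThrows() {
    std::lock_guard<std::mutex> guard(recordMutex);
    throw std::runtime_error("simulated failure");
}

// Helper for the demo: reports whether the mutex can be taken right now.
// Only call this from a thread that does not already hold the mutex.
bool mutexIsFree() {
    if (!recordMutex.try_lock()) return false;
    recordMutex.unlock();
    return true;
}
```

After either function has been left, by return or by exception, the mutex can
be taken again. The one case this pattern cannot cover is a function that
never returns while holding the guard.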
Please switch on debugging (var streamDebug,1). Better remove or disable
all other StreamDevice records or the output will be overwhelming. What
are the last messages you get from a record before it gets stuck?
Dirk
>
> ________________________________________
> From: Dirk Zimoch [[email protected]]
> Sent: Tuesday, July 12, 2011 12:21 AM
> To: Zhang, Dehong
> Cc: [email protected]; [email protected]
> Subject: Re: streamdevice + asyn stuck
>
> Hello Dehong,
>
> Do you have a hung up timerQueue thread? Try epicsThreadShowAll.
>
> Dirk
>
> Zhang, Dehong wrote:
>> Hi Dirk, Mark and fellow EPICSers,
>>
>> Sorry to bother you with this. I tried to chase the source code and search the tech-talk,
>> but could not find many hints.
>>
>> We use base 3.14.9, streamdevice 2.4 and asyn 4.10, have been experiencing this stuck
>> problem for some time. The symptom is that randomly, a mbbi/ai/bi... (input) record would
>> stop updating and no longer does FLNK, with no error/warning messages printed in the log
>> file. And dbpr *** 3 would show that:
>> LCNT=11
>> SEVR=INVALID
>> STAT=SCAN
>>
>> And manually writing to the PROC field also does nothing.
>>
>> It seems like the record is stuck waiting for a callback from streamdevice (and asyn), and
>> the EPICS framework just ignores any "PROC" requests.
>>
>> In the protocol file we do have the timeouts etc:
>> locktimeout = 5000;
>> terminator = CR;
>> replytimeout = 1000;
>> readtimeout = 1000;
>> extrainput = Ignore;
>>
>> While chasing the problem, I noticed that "asynReport 10 ***" would show:
>> multiDevice:No canBlock:Yes autoConnect:Yes
>> enabled:Yes connected:Yes numberConnects 1
>>
>> For some the numberConnects can be 2.
>>
>> Is this "canBlock:Yes" related to my problem? Should the numberConnects be <=1?
>> We use one port to talk to one device.
>>
>> Rebooting the IOC does fix the problems, but then they would come back, randomly.
>>
>> Please advise. Thank you very much.
>>
>> Best regards,
>> Dehong
>>
>
>