EPICS Re: streamdevice + asyn stuck



Subject:	Re: streamdevice + asyn stuck
From:	Dirk Zimoch <[email protected]>
To:	"Zhang, Dehong" <[email protected]>
Cc:	"[email protected]" <[email protected]>
Date:	Wed, 13 Jul 2011 09:47:29 +0200

Hello Dehong,

Zhang, Dehong wrote:

Hi Dirk and Mark,

Thank you very much for helping!

I tried changing the SCAN to "10 second", but it did not help.  I also tried the
epicsThreadShowAll command as Dirk suggested; it shows a few timerQueue
threads, all "OK".

Some time ago I had the problem that I crashed the timerQueue thread in a timeout callback and thus never completed record processing. However, this seems to be a different issue because you don't see crashed threads.

From what I can see from the source code, I believe that we somehow
lose callbacks from asyn, which breaks the evalcommand() chain, and we
never get to finishProtocol().

Yes. That seems to be the case. At several places, StreamDevice gives control to asyn or to EPICS base and relies on callbacks. If the callbacks don't come, StreamDevice is lost.

I have to check if I did anything wrong there. Basically I give control to asynDriver using queueRequest() and expect a callback to either handleRequest() or handleTimeout(). From there I call the individual event handlers. So maybe my queueRequest() setup is wrong, or my handlers are buggy, or asynDriver never calls back. (All of this code is in AsynDriverInterface.cc.)
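
To make the handoff described above concrete, here is a minimal sketch of a client that queues a request and then waits for exactly one of two callbacks. Everything in it (FakeDriver, Request, Client, submit, service, dropAll) is a hypothetical stand-in invented for illustration. It is not the asynDriver API and not the code in AsynDriverInterface.cc. It only shows why a request that disappears without either callback would leave the client waiting indefinitely.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical miniature of a callback handoff; none of this is asynDriver API.
struct Request {
    std::function<void()> onRequest;  // fires when the driver services the request
    std::function<void()> onTimeout;  // fires when the request expires in the queue
};

struct FakeDriver {
    std::deque<Request> queue;

    void submit(Request r) { queue.push_back(std::move(r)); }

    // Service every queued request; 'expired' selects which callback fires.
    void service(bool expired) {
        while (!queue.empty()) {
            Request r = std::move(queue.front());
            queue.pop_front();
            assert(r.onRequest && r.onTimeout);  // both callbacks must be set
            if (expired) r.onTimeout(); else r.onRequest();
        }
    }

    // Simulates the suspected failure: requests vanish without any callback.
    void dropAll() { queue.clear(); }
};

struct Client {
    bool busy = false;   // set while a request is outstanding
    int completed = 0;
    int timedOut = 0;

    // Returns false if a previous request is still outstanding.
    bool start(FakeDriver& drv) {
        if (busy) return false;
        busy = true;
        drv.submit(Request{
            [this] { ++completed; busy = false; },
            [this] { ++timedOut; busy = false; }});
        return true;
    }
};
```

In this sketch, calling start() and then service() clears busy through one of the two callbacks. Calling start() and then dropAll() leaves busy set, and every later start() is refused, which resembles a record that no longer processes.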

Later I give control to EPICS base via callbackRequest() to finish processing the record. But I don't think that anything is wrong there. (That code is in StreamEpics.cc.)

I will try to add some diagnostics to see what a record is currently waiting for. But that may take a while.


Another possibility is the mutex.  If it is not released properly, the
Stream::process() method will be stuck waiting.

The mutex is automatically released (via destructor) when the function that contains the lock returns. Thus the only way to keep the lock is to get stuck in a function while holding the lock. But because I don't think I have unbounded loops in any function, that can only happen when a thread crashes or a function I call does not return. I don't think that happens.
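
The destructor-based release described above is the usual scoped-lock pattern. The following is a minimal sketch of that pattern using the standard library, not StreamDevice's actual lock class. TracedMutex, Record and process() are hypothetical names made up for the example; the held flag exists only so the release can be observed from outside.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical wrapper that records whether the mutex is currently held.
struct TracedMutex {
    std::mutex m;
    bool held = false;
    void lock()   { m.lock(); held = true; }
    void unlock() { held = false; m.unlock(); }
};

// Hypothetical stand-in for per-record state; not StreamDevice code.
struct Record {
    TracedMutex lock;
    int processed = 0;
};

// The guard's destructor releases the mutex on every exit path.
bool process(Record& rec, bool bailOutEarly) {
    std::lock_guard<TracedMutex> guard(rec.lock);
    assert(rec.lock.held);   // the lock is held for the whole scope
    if (bailOutEarly) {
        return false;        // guard destroyed here, mutex released
    }
    ++rec.processed;
    return true;             // guard destroyed here as well
}
```

With this pattern the lock can only stay held if process() itself never returns, for example because it blocks forever or its thread dies while inside it.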

Please switch on debugging (var streamDebug,1). It is better to remove or disable all other StreamDevice records, or the output will be overwhelming. What are the last messages you get from a record before it gets stuck?



Dirk


________________________________________
From: Dirk Zimoch [[email protected]]
Sent: Tuesday, July 12, 2011 12:21 AM
To: Zhang, Dehong
Cc: [email protected]; [email protected]
Subject: Re: streamdevice + asyn stuck

Hello Dehong,

Do you have a hung up timerQueue thread? Try epicsThreadShowAll.

Dirk

Zhang, Dehong wrote:

Hi Dirk, Mark and fellow EPICSers,

Sorry to bother you with this.  I tried to chase the source code and search tech-talk,
but could not find many hints.

We use base 3.14.9, streamdevice 2.4 and asyn 4.10, and have been experiencing this stuck
problem for some time.  The symptom is that, at random, an mbbi/ai/bi... (input) record
stops updating and no longer processes its FLNK, with no error/warning messages printed in the log
file.  And dbpr *** 3 shows:
LCNT=11
SEVR=INVALID
STAT=SCAN

And manually writing to the PROC field also does nothing.

It seems that the record is stuck waiting for a callback from streamdevice (and asyn), and
the EPICS framework just ignores any "PROC" requests.

In the protocol file we do have the timeouts etc. set:
locktimeout  = 5000;
terminator   = CR;
replytimeout = 1000;
readtimeout  = 1000;
extrainput   = Ignore;

While chasing the problem, I noticed that "asynReport 10 ***" would show:
multiDevice:No canBlock:Yes autoConnect:Yes
    enabled:Yes connected:Yes numberConnects 1

For some ports the numberConnects can be 2.

Is this "canBlock:Yes" related to my problem?  Should the numberConnects be <=1?
We use one port to talk to one device.

Rebooting the IOC does fix the problems, but then they would come back, randomly.

Please advise.  Thank you very much.

Best regards,
Dehong
