Subject: CA task suspend
From: Bruce Hill <[email protected]>
To: Techtalk <[email protected]>
Date: Tue, 15 Oct 2013 15:33:14 -0700

I'm trying to debug a task suspend in our EPICS gateway, and I'm seeing the same pattern in nearly every case. It doesn't happen immediately, but it may be related to some of our IOCs being rebooted.

First we get gdd reference count underflow errors in the log and a "No conversion between src & dest types" error from casStrmClient.cc, followed soon after by a failed assert in smartGDDPointer.h that calls epicsThreadSuspend().

Looking at casStrmClient.cc, I found what looks like a missing unreference call on a gdd pointer, but why would that cause a reference count underflow?

Has anyone else seen this problem or have any suggestions?

Thanks!
- Bruce

See below for the diff, log file, and stack trace.

Here's where I think we're missing an unreference call in the 3.14.12.3 file:

--- src/cas/generic/casStrmClient.cc (revision 13948)
+++ src/cas/generic/casStrmClient.cc (working copy)
@@ -926,6 +926,8 @@
         int cacStatus = caNetConvert (
             msg.m_dataType, pPayload, pPayload, true, msg.m_count );
         if ( cacStatus != ECA_NORMAL ) {
+            // bh - looks like we need an unreference here
+            pDBRDD->unreference ();
             return this->sendErrWithEpicsStatus (
                 guard, & msg, chan.getCID(), S_cas_internal, cacStatus );
         }

The gateway log file looks like this:

#### LOG FILE #####################################
13_20:41:18:CAC: Undecipherable TCP message ( bad response type 8257 ) from ioc-las-srv06.pcdsn:5064
13_20:41:18:Oct 13 20:41:18 Warning: Virtual circuit disconnect ioc-las-srv06.pcdsn:5064
Oct 13 20:41:18 !!! Errlog message received (message is above)
13_22:07:10:gdd reference count underflow!!
13_22:07:10:filename="../../../../src/cas/generic/casStrmClient.cc" line number=895
No conversion between src & dest types no conversion between event app type=0 and DBR type=20 Element count=1
13_22:07:10:gdd reference count underflow!!
Oct 13 22:07:10 !!! Errlog message received (message is above)
A call to 'assert(! status)' by thread '_main_' failed in ../../../../include/smartGDDPointer.h line 106.
Oct 13 22:07:10 !!! Errlog message received (message is above)
EPICS Release EPICS R3.14.12.3-0.1.0 $Date: Mon 2012-12-17 14:11:47 -0600$.
Oct 13 22:07:10 !!! Errlog message received (message is above)
Local time is 2013-10-13 22:07:10.662222000 PDT
13_22:07:10:
Oct 13 22:07:10 !!! Errlog message received (message is above)
Please E-mail this message to the author or to [email protected]

#### STACK TRACE ################################################
(gdb) bt
#0  0x0000003c7680aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2756c40b31 in condWait (pevent=0x13b0d230)
    at ../../../src/libCom/osi/os/posix/osdEvent.c:75
#2  epicsEventWait (pevent=0x13b0d230)
    at ../../../src/libCom/osi/os/posix/osdEvent.c:137
#3  0x00002b2756c3f66b in epicsThreadSuspendSelf ()
    at ../../../src/libCom/osi/os/posix/osdThread.c:568
#4  0x00002b2756c3d7aa in epicsAssert (pFile=<value optimized out>, line=106, pExp=0x2b27567ea4ef "! status", pAuthorName=0x2b2756c4e517 "the author")
    at ../../../src/libCom/osi/os/default/osdAssert.c:51
#5  0x00002b27567dfa4c in ~smartGDDPointerTemplate (this=0x14111270, __in_chrg=<value optimized out>)
    at ../../../../include/smartGDDPointer.h:106
#6  casMonEvent::~casMonEvent (this=0x14111270, __in_chrg=<value optimized out>)
    at ../../../../src/cas/generic/casMonEvent.cc:50
#7  0x00002b27567ddc5c in casEventSys::casMonEventDestroy (this=0x13c4c628, ev=..., guard=...)
    at ../../../../src/cas/generic/casEventSys.cc:379
#8  0x00002b27567df16f in casMonEventDestroy (this=0x2aaab070d9f8, client=..., ev=..., value=<value optimized out>, clientGuard=..., evGuard=...)
    at ../../../../src/cas/generic/casCoreClient.h:247
#9  casMonitor::executeEvent (this=0x2aaab070d9f8, client=..., ev=..., value=<value optimized out>, clientGuard=..., evGuard=...)
    at ../../../../src/cas/gen
#10 0x00002b27567ddd69 in casEventSys::process (this=0x13c4c628, casClientGuard=...)
    at ../../../../src/cas/generic/casEventSys.cc:135
#11 0x00002b27567e5179 in eventSysProcess (this=0x13c4c848)
    at ../../../../src/cas/generic/casCoreClient.h:178
#12 casStreamEvWakeup::expire (this=0x13c4c848)
    at ../../../../src/cas/generic/st/casStreamOS.cc:151
#13 0x00002b2756c466b2 in timerQueue::process (this=0x13b37538, currentTime=...)
    at ../../../src/libCom/timer/timerQueue.cpp:140
#14 0x00002b2756c2ca6b in fdManager::process (this=0x636640, delay=0.01)
    at ../../../src/libCom/fdmgr/fdManager.cpp:105
#15 0x000000000041ac86 in gateServer::mainLoop (this=0x13b38c30)
    at ../gateServer.cc:276
#16 0x000000000040751a in startEverything (prefix=0x7fff1cbbc437 "NET:CAG:LAS")
    at ../gateway.cc:657
#17 0x0000000000409ab0 in main (argc=11, argv=0x7fff1cbbb118)
    at ../gateway.cc:1300
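
On the question of why a missing unreference would produce an underflow: in a plain reference-counting scheme, a skipped unreference leaves the count too high (a leak), while an underflow needs one more unreference than reference. The toy model below sketches that distinction. It is not the gdd or smartGDDPointer source; ToyRefCounted, ToyHolder, leakScenario and underflowScenario are hypothetical names invented for illustration, and the real classes may behave differently (for example, by deleting the object when the count reaches zero).

```cpp
#include <cassert>
#include <cstdio>

// Toy stand-in for a reference-counted data object.
// Hypothetical: this is NOT the gdd class, only a model of the idea.
class ToyRefCounted {
public:
    ToyRefCounted() : refCount_(1) {}   // the creator holds the first reference

    void reference() { ++refCount_; }

    // Returns 0 on success, nonzero if the count is already zero,
    // i.e. there has been one more release than acquire.
    int unreference() {
        if (refCount_ == 0) {
            std::fprintf(stderr, "toy reference count underflow\n");
            return -1;
        }
        --refCount_;
        return 0;
    }

    unsigned count() const { return refCount_; }

private:
    unsigned refCount_;
};

// Toy holder: takes a reference when constructed and gives it back in
// release(). A real smart pointer would typically do this in its
// destructor and could assert on the returned status.
class ToyHolder {
public:
    explicit ToyHolder(ToyRefCounted& obj) : obj_(obj) { obj_.reference(); }
    int release() { return obj_.unreference(); }
private:
    ToyRefCounted& obj_;
};

// Case 1: an error path skips its unreference. The count stays too high,
// so the object would never be freed. This is a leak, not an underflow.
unsigned leakScenario() {
    ToyRefCounted obj;          // count = 1
    obj.reference();            // count = 2, taken for a pending operation
    assert(obj.count() == 2);
    // (early return on error: the matching unreference is skipped here)
    obj.unreference();          // creator's release, count = 1
    return obj.count();         // still 1, never reaches zero
}

// Case 2: some path unreferences once too often. The holder that releases
// last finds the count already at zero and reports an underflow.
int underflowScenario() {
    ToyRefCounted obj;          // count = 1
    ToyHolder holder(obj);      // count = 2
    obj.unreference();          // creator's release, count = 1
    obj.unreference();          // extra, unbalanced release, count = 0
    return holder.release();    // count already 0, returns nonzero
}
```

In this model only the second case yields a nonzero status at release time, which is the kind of condition an `assert(! status)` in a smart-pointer destructor would trip on. If the real code follows the same pattern, the underflow would point to an extra unreference (or an object shared without a matching reference) somewhere, rather than to the missing one in the diff.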