Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: CA task suspend
From: Bruce Hill <bhill@slac.stanford.edu>
To: Techtalk <tech-talk@aps.anl.gov>
Date: Tue, 15 Oct 2013 15:33:14 -0700
I'm trying to debug a task suspend in our EPICS gateway, and
I'm seeing the same pattern in nearly every case. It doesn't happen
immediately, but it may be related to some of our IOCs
being rebooted.

First we get gdd reference count underflow errors in the log,
and a "No conversion between src & dest types" error from casStrmClient.cc,
followed soon by a failed assert in smartGDDPointer.h that calls
epicsThreadSuspendSelf().

Looking at casStrmClient.cc, I found what looks like a missing unreference
call on a gdd pointer, but why would that cause a reference count underflow?
Has anyone else seen this problem, or does anyone have suggestions?

Thanks!
- Bruce

See below for diff, log file, and stack trace:

Here's where I think we're missing an unreference call in the 3.14.12.3 file:
--- src/cas/generic/casStrmClient.cc    (revision 13948)
+++ src/cas/generic/casStrmClient.cc    (working copy)
@@ -926,6 +926,8 @@
     int cacStatus = caNetConvert (
         msg.m_dataType, pPayload, pPayload, true, msg.m_count );
     if ( cacStatus != ECA_NORMAL ) {
+        // bh - looks like we need an unreference here
+        pDBRDD->unreference ();
         return this->sendErrWithEpicsStatus (
             guard, & msg, chan.getCID(), S_cas_internal, cacStatus );
     }
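
To make the question about underflow concrete, here is a minimal sketch of a toy reference-counted object. It is not the real gdd class: ToyCounted, missingUnreference() and extraUnreference() are hypothetical names, and the sketch assumes conventional reference-counting semantics. Under that assumption, a skipped unreference on an error path leaves the count too high (a leak), while an underflow needs one release too many somewhere else.

```cpp
#include <cassert>
#include <cstdio>

// Toy stand-in for a reference-counted object. NOT the real gdd API;
// all names here are illustrative assumptions.
class ToyCounted {
public:
    ToyCounted() : refCount(1) {}   // the creator holds the first reference

    void reference() { ++refCount; }

    // Returns 0 on success, -1 if the count was already zero (underflow).
    int unreference() {
        if (refCount == 0) {
            std::fprintf(stderr, "toy reference count underflow\n");
            return -1;
        }
        --refCount;
        return 0;
    }

    unsigned count() const { return refCount; }

private:
    unsigned refCount;
};

// Case 1: an error path returns early and skips unreference().
// The count never drops to zero, so the object leaks; nothing underflows.
unsigned missingUnreference() {
    ToyCounted obj;                 // count = 1
    // ... early return on error, release skipped ...
    return obj.count();             // still 1
}

// Case 2: the same reference is released twice.
// The second release finds the count at zero and reports an underflow.
int extraUnreference() {
    ToyCounted obj;                       // count = 1
    int first = obj.unreference();        // 1 -> 0, succeeds
    int second = obj.unreference();       // already 0, fails
    return (first == 0) ? second : first; // status of the second release
}
```

In this toy model, missingUnreference() returns 1 and extraUnreference() returns -1. If gdd behaves the same way, the missing call in the diff above would account for a leak, and the underflow in the log would point to a double release on some other path.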


The gateway log file looks like this:
#### LOG FILE #####################################
13_20:41:18:CAC: Undecipherable TCP message ( bad response type 8257 ) from ioc-las-srv06.pcdsn:5064
13_20:41:18:Oct 13 20:41:18 Warning: Virtual circuit disconnect ioc-las-srv06.pcdsn:5064

Oct 13 20:41:18 !!! Errlog message received (message is above)
13_22:07:10:gdd reference count underflow!!
13_22:07:10:filename="../../../../src/cas/generic/casStrmClient.cc" line number=895
No conversion between src & dest types no conversion between event app type=0 and DBR type=20 Element count=1
13_22:07:10:gdd reference count underflow!!

Oct 13 22:07:10 !!! Errlog message received (message is above)

A call to 'assert(! status)'
    by thread '_main_' failed in ../../../../include/smartGDDPointer.h line 106.

Oct 13 22:07:10 !!! Errlog message received (message is above)
EPICS Release EPICS R3.14.12.3-0.1.0 $Date: Mon 2012-12-17 14:11:47 -0600$.

Oct 13 22:07:10 !!! Errlog message received (message is above)
Local time is 2013-10-13 22:07:10.662222000 PDT
13_22:07:10:
Oct 13 22:07:10 !!! Errlog message received (message is above)
Please E-mail this message to the author or to tech-talk@aps.anl.gov

#### STACK TRACE  ################################################
(gdb) bt
#0  0x0000003c7680aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b2756c40b31 in condWait (pevent=0x13b0d230) at ../../../src/libCom/osi/os/posix/osdEvent.c:75
#2  epicsEventWait (pevent=0x13b0d230) at ../../../src/libCom/osi/os/posix/osdEvent.c:137
#3  0x00002b2756c3f66b in epicsThreadSuspendSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:568
#4  0x00002b2756c3d7aa in epicsAssert (pFile=<value optimized out>, line=106, pExp=0x2b27567ea4ef "! status", pAuthorName=0x2b2756c4e517 "the author")
    at ../../../src/libCom/osi/os/default/osdAssert.c:51
#5  0x00002b27567dfa4c in ~smartGDDPointerTemplate (this=0x14111270, __in_chrg=<value optimized out>) at ../../../../include/smartGDDPointer.h:106
#6  casMonEvent::~casMonEvent (this=0x14111270, __in_chrg=<value optimized out>) at ../../../../src/cas/generic/casMonEvent.cc:50
#7  0x00002b27567ddc5c in casEventSys::casMonEventDestroy (this=0x13c4c628, ev=..., guard=...) at ../../../../src/cas/generic/casEventSys.cc:379
#8  0x00002b27567df16f in casMonEventDestroy (this=0x2aaab070d9f8, client=..., ev=..., value=<value optimized out>, clientGuard=..., evGuard=...)
    at ../../../../src/cas/generic/casCoreClient.h:247
#9  casMonitor::executeEvent (this=0x2aaab070d9f8, client=..., ev=..., value=<value optimized out>, clientGuard=..., evGuard=...) at ../../../../src/cas/gen
#10 0x00002b27567ddd69 in casEventSys::process (this=0x13c4c628, casClientGuard=...) at ../../../../src/cas/generic/casEventSys.cc:135
#11 0x00002b27567e5179 in eventSysProcess (this=0x13c4c848) at ../../../../src/cas/generic/casCoreClient.h:178
#12 casStreamEvWakeup::expire (this=0x13c4c848) at ../../../../src/cas/generic/st/casStreamOS.cc:151
#13 0x00002b2756c466b2 in timerQueue::process (this=0x13b37538, currentTime=...) at ../../../src/libCom/timer/timerQueue.cpp:140
#14 0x00002b2756c2ca6b in fdManager::process (this=0x636640, delay=0.01) at ../../../src/libCom/fdmgr/fdManager.cpp:105
#15 0x000000000041ac86 in gateServer::mainLoop (this=0x13b38c30) at ../gateServer.cc:276
#16 0x000000000040751a in startEverything (prefix=0x7fff1cbbc437 "NET:CAG:LAS") at ../gateway.cc:657
#17 0x0000000000409ab0 in main (argc=11, argv=0x7fff1cbbb118) at ../gateway.cc:1300

