EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Segmentation fault when Gateway calls ca_clear_channel: removing node from empty createReqPend linked list
From: "Paduan Donadio, Marcio via Tech-talk" <tech-talk at aps.anl.gov>
To: EPICS tech-talk <tech-talk at aps.anl.gov>
Date: Fri, 2 Jul 2021 21:22:38 +0000

We’ve been seeing occasional crashes on our EPICS gateways at SLAC due to segmentation faults. For details, including the IOC shell messages and some gdb data, please see the issue description here: https://github.com/epics-extensions/ca-gateway/issues/1

 

I’m studying the case and it seems to me that the problem is in the Channel Access client code, not in the gateway. When the gateway calls ca_clear_channel(), one of the internal steps is to remove the channel from the createReqPend linked list.

 

File ca/src/client/tcpiiu.cpp:

 

tcpiiu::uninstallChan (…) {

(…)

  this->createReqPend.remove ( chan );

(…) }

 

When you inspect the core dump in gdb, in the frame executing tsDLList.h, you see that createReqPend->pLast = 0 and createReqPend->pFirst = 0. It looks to me like the list is empty. “item” and “theNode” both correspond to the item that is to be removed (i.e. tsDLNode<T> &theNode = item), and “this” is the pointer to the linked list.

 

(gdb) info args

item = @0x757cdd0

this = 0x7f911c0077e0

(gdb) p prevNode

$1 = (tsDLNode<nciu> &) @0x0: <error reading variable>

(gdb) p theNode

$2 = (tsDLNode<nciu> &) @0x757cdf0: {pNext = 0x757d250, pPrev = 0x0}

(gdb) p this->pLast

$3 = (nciu *) 0x0

(gdb) p this->pFirst

$4 = (nciu *) 0x0
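A minimal sketch of why unlinking a node from an empty list can fault, assuming a remove() that trusts its argument to be a member and reaches the neighbours through pPrev and pNext. The Node and List types and the removeChecked() helper are hypothetical stand-ins written for illustration, not the tsDLList.h source. The stale node in the usage comment mirrors the gdb values above: pNext set, pPrev = 0, list empty.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an intrusive list node (illustration only).
struct Node {
    Node *pNext = nullptr;
    Node *pPrev = nullptr;
};

// Hypothetical stand-in for an intrusive doubly linked list (illustration only).
struct List {
    Node *pFirst = nullptr;
    Node *pLast = nullptr;
    std::size_t itemCount = 0;

    // Append a node at the tail.
    void add(Node &n) {
        assert(&n != pLast);  // guard against adding the tail twice
        n.pNext = nullptr;
        n.pPrev = pLast;
        if (pLast) pLast->pNext = &n; else pFirst = &n;
        pLast = &n;
        ++itemCount;
    }

    // Unlink a node, but refuse (return false) in the cases where an
    // unchecked unlink would have to dereference a null neighbour pointer.
    bool removeChecked(Node &n) {
        if (itemCount == 0) return false;                      // nothing to remove
        if (pLast != &n && n.pNext == nullptr) return false;   // not tail, no next
        if (pFirst != &n && n.pPrev == nullptr) return false;  // not head, no prev
        if (pLast == &n) pLast = n.pPrev; else n.pNext->pPrev = n.pPrev;
        if (pFirst == &n) pFirst = n.pNext; else n.pPrev->pNext = n.pNext;
        n.pNext = n.pPrev = nullptr;
        --itemCount;
        return true;
    }
};

// Usage (mirrors the core dump: empty list, node with pNext set, pPrev null):
//   List list; Node stale, next;
//   stale.pNext = &next;
//   bool ok = list.removeChecked(stale);  // refused instead of faulting
```

With the values from the core dump, an unchecked unlink of this shape would have to follow pPrev = 0 to reach the previous node, which would be consistent with prevNode showing up as a reference to @0x0. Whether the real code takes exactly this path is an assumption here.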

 

There are 5 functions that can be used to empty a tsDLList: remove(), removeAll(), get(), pop(), and the constructor itself that calls clear().

 

In addition to tcpiiu::uninstallChan(…), which was in the call chain that triggered this seg fault, I found these other places that may call createReqPend.get(): tcpiiu::disconnectAllChannels, tcpiiu::unlinkAllChannels, and tcpSendThread::run().

 

This last one calls createReqPend.get() inside the while(true) as part of the ordinary tasks:

while ( nciu * pChan = this->iiu.createReqPend.get () )

 

Additionally, we have this in cac.cpp: destroyIIU calls disconnectAllChannels and ~cac calls unlinkAllChannels.

 

Finally, in tcpiiu.cpp, tcpRecvThread::run() calls

 

if (! connectSuccess) {

(…)

destroyIIU

(…) }

 

and tcpSendThread::run() calls this->iiu.cacRef.destroyIIU ( this->iiu ) after leaving the while(true) loop.

 

 

Do you see a condition where tcpSendThread::run calls createReqPend.get(), or tcpRecvThread::run calls destroyIIU(), “at the same time” as the gateway calls ca_clear_channel(), with this condition not protected by a mutex? Do you have any insight that would help me proceed with the investigation?
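To make the suspected interleaving concrete, here is a minimal sketch of one thread draining a pending list with get() while another thread removes individual channels. It assumes both paths take the same mutex and that each channel records whether it is still on the list. Chan, PendingQueue, and raceDemo are hypothetical names for illustration and are not taken from the CA client sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical channel: remembers whether it is still on the pending list.
struct Chan { bool onPending = false; };

// Hypothetical pending list; every operation takes the same mutex.
class PendingQueue {
public:
    void add(Chan &c) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(&c);
        c.onPending = true;
    }
    // Take the first item, as a drain loop would; nullptr when empty.
    Chan *get() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (pending_.empty()) return nullptr;
        Chan *c = pending_.front();
        pending_.pop_front();
        c->onPending = false;
        return c;
    }
    // Unlink one item; returns true only if this call did the unlinking.
    bool remove(Chan &c) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!c.onPending) return false;  // already taken by get()
        pending_.erase(std::find(pending_.begin(), pending_.end(), &c));
        c.onPending = false;
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Chan *> pending_;
};

// Run a drainer and a remover concurrently over n channels and
// return how many unlinks happened in total.
std::size_t raceDemo(std::size_t n) {
    std::vector<Chan> chans(n);
    PendingQueue q;
    for (Chan &c : chans) q.add(c);
    std::size_t drained = 0, removed = 0;
    std::thread drainer([&] { while (q.get()) ++drained; });
    std::thread remover([&] {
        for (Chan &c : chans) if (q.remove(c)) ++removed;
    });
    drainer.join();
    remover.join();
    assert(q.get() == nullptr);  // everything was unlinked by one side or the other
    return drained + removed;
}
```

In this sketch every channel is unlinked exactly once, whichever thread gets there first. If either the shared mutex or the membership flag were missing, remove() could be asked to unlink a channel that get() had already taken, which is the kind of condition the question above is asking about.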

 

Thanks a lot,

 

-- 



Márcio Paduan Donadio | Control Systems Engineer

Advanced Control Systems Department

SLAC National Accelerator Laboratory | Menlo Park, CA

p: 650.926.5007 | w: slac.stanford.edu

 

 

