EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: IOC segmentation fault related to CA security
From: Michael Davidsaver <[email protected]>
To: [email protected]
Date: Sat, 16 Apr 2016 13:43:52 -0400
Hi Matt,

I've created https://bugs.launchpad.net/epics-base/+bug/1571224 for this
bug.

I can recall seeing "dbCa:exceptionCallback" associated with ACF
disconnects before, but don't recall any crashes.  This might just be
chance.  The 'channel "unknown"' certainly stands out.

Would it be easy for you to take a packet capture of the traffic between
these two IOCs when this crash occurs?  This might give some clues about
the specific sequence of events which triggers the crash.

Also, is the communication between these two IOCs direct?  You don't
mention any CA gateway.

Michael


On 04/15/2016 05:27 PM, Pearson, Matthew R. wrote:
> Hi,
>
> I’ve had a few instances of one of my soft IOCs crashing with a segmentation fault when I shut down another IOC that hosts a PV used in the CA security logic of the IOC that crashes.
>
> There is sometimes (but not always) this message printed out before the crash:
>
> dbCa:exceptionCallback stat "Virtual circuit disconnect" channel "unknown" context "cg1d-dassrv1.ornl.gov:5064"
> nativeType DBR_invalid requestType DBR_invalid nativeCount 0 requestCount 0 noReadAccess noWriteAccess
>
> Printing the stack trace:
>
> Core was generated by `../../bin/linux-x86_64/cg1d-parker1 ./st.cmd'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f5321218599 in ellDelete (pList=0x7f52bc000920, pNode=0x7f52ac008ec0) at ../../../src/libCom/ellLib/ellLib.c:87
> 87              pNode->previous->next = pNode->next;
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 readline-6.0-4.el6.x86_64
> (gdb) bt
> #0  0x00007f5321218599 in ellDelete (pList=0x7f52bc000920, pNode=0x7f52ac008ec0) at ../../../src/libCom/ellLib/ellLib.c:87
> #1  0x00007f532217c289 in casAccessRightsCB (ascpvt=0x7f52b8000db8, type=asClientCOAR) at ../camessage.c:1111
> #2  0x00007f5321d64122 in asComputePvt (asClientPvt=0x7f52b8000db8) at ../asLibRoutines.c:1014
> #3  0x00007f5321d63ea0 in asComputeAsgPvt (pasg=0x1f91ee0) at ../asLibRoutines.c:940
> #4  0x00007f5321d62419 in asComputeAsg (pasg=0x1f91ee0) at ../asLibRoutines.c:455
> #5  0x00007f5321d60482 in connectCallback (arg=...) at ../asCa.c:99
> #6  0x00007f53214c1301 in oldChannelNotify::disconnectNotify (this=0x7f52d0000d10, guard=...) at ../oldChannelNotify.cpp:112
> #7  0x00007f53214abe30 in nciu::unresponsiveCircuitNotify (this=0x7f5323714010, cbGuard=..., guard=...) at ../nciu.cpp:171
> #8  0x00007f53214b7c38 in tcpiiu::disconnectAllChannels (this=0x7f52d40008c0, cbGuard=..., guard=..., discIIU=...) at ../tcpiiu.cpp:1834
> #9  0x00007f532149a042 in cac::destroyIIU (this=0x7f52d000ed20, iiu=...) at ../cac.cpp:1227
> #10 0x00007f53214b27e3 in tcpSendThread::run (this=0x7f52d4000a00) at ../tcpiiu.cpp:229
> #11 0x00007f532122bc51 in epicsThreadCallEntryPoint (pPvt=0x7f52d4000a08) at ../../../src/libCom/osi/epicsThread.cpp:85
> #12 0x00007f53212333ce in start_routine (arg=0x7f52d400a250) at ../../../src/libCom/osi/os/posix/osdThread.c:385
> #13 0x00007f53204a29d1 in start_thread () from /lib64/libpthread.so.0
> #14 0x00007f53207a08fd in clone () from /lib64/libc.so.6
>
>
> The crashed IOC has a CA access security rule that looks like:
>
> ASG(DEFAULT)
> {
>     INPA("$(P):Scan:Active")
>     RULE(1, READ)
>     RULE(1, WRITE)
>     {
>         CALC("A=0")
>     }
>     RULE(1, WRITE)
>     {
>         UAG(epics, beamline, detector)
>         HAG(beamline)
>         CALC("A=1")
>     }
> }
>
> where $(P):Scan:Active is hosted by the IOC that I’m shutting down.
>
> In addition, there is a Channel Access link (a CP link) between the two IOCs.
>
> I can’t reliably reproduce it, but it’s happened a few times today as I was testing it (stopping and starting the IOC hosting $(P):Scan:Active perhaps 20 times).
>
> Anybody have ideas about this? 
>
> Our base version is 3.14.12.4 running on RHEL6.
>
> Cheers,
> Matt
>
>
> Data Acquisition and Control Engineer
> Spallation Neutron Source
> Oak Ridge National Lab
>
>
>
>
>
>
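
For readers following the backtrace quoted above: the faulting statement, `pNode->previous->next = pNode->next;`, dereferences the node's `previous` pointer. The sketch below uses hypothetical `node_t`/`list_t` types, loosely modelled on ELLNODE/ELLLIST and not copied from libCom, to show one way such a statement can fault: deleting a node that is no longer on the list (removed twice, or already freed). This is only an illustration of that failure mode, not a diagnosis of this bug. `list_delete_checked` is an invented helper that refuses to unlink a node the list does not contain.

```c
#include <stddef.h>

/* Hypothetical minimal types; node.next is the first element,
 * node.previous is the last, and count is the list length. */
typedef struct node { struct node *next; struct node *previous; } node_t;
typedef struct { node_t node; int count; } list_t;

void list_init(list_t *l)
{
    l->node.next = NULL;
    l->node.previous = NULL;
    l->count = 0;
}

/* Append n at the tail. */
void list_add(list_t *l, node_t *n)
{
    n->next = NULL;
    n->previous = l->node.previous;
    if (l->count)
        l->node.previous->next = n;
    else
        l->node.next = n;
    l->node.previous = n;
    l->count++;
}

/* Unchecked delete. If n is not on the list and its pointers are NULL
 * or stale, one of the dereferences below touches invalid memory. */
void list_delete(list_t *l, node_t *n)
{
    if (l->node.previous == n)
        l->node.previous = n->previous;
    else
        n->next->previous = n->previous;
    if (l->node.next == n)
        l->node.next = n->next;
    else
        n->previous->next = n->next;   /* same shape as the faulting line */
    n->next = NULL;
    n->previous = NULL;
    l->count--;
}

/* Linear membership test; O(n), intended for debugging only. */
static int list_contains(const list_t *l, const node_t *n)
{
    const node_t *p;
    for (p = l->node.next; p != NULL; p = p->next)
        if (p == n)
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if n is not currently on the list. */
int list_delete_checked(list_t *l, node_t *n)
{
    if (!list_contains(l, n))
        return -1;
    list_delete(l, n);
    return 0;
}
```

With this sketch, a second `list_delete_checked` on the same node returns -1 instead of dereferencing a cleared pointer. A membership walk like this costs O(n) per delete, so it suits a debugging build rather than production code.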


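The ASG(DEFAULT) rule quoted above can be read as a small decision function, which may help explain why access rights are recomputed when the IOC hosting `$(P):Scan:Active` goes away. The following is a simplified model, not code from asLib: the names `access_t`, `input_t` and `default_asg_access` are invented here, access security levels are ignored, and it assumes that a CALC rule whose input is disconnected evaluates as false.

```c
#include <stddef.h>

typedef enum { ACCESS_NONE = 0, ACCESS_READ = 1, ACCESS_WRITE = 2 } access_t;

/* Hypothetical snapshot of INPA, i.e. $(P):Scan:Active. */
typedef struct {
    int connected;   /* nonzero while the CA channel is connected */
    double value;    /* last value received for A */
} input_t;

/* Model of the quoted ASG(DEFAULT): the highest access granted by any
 * matching rule wins. Assumption: a rule with a CALC does not apply
 * while its input is disconnected. */
access_t default_asg_access(const input_t *a, int user_in_uag, int host_in_hag)
{
    access_t granted = ACCESS_READ;          /* RULE(1, READ): unconditional */

    if (a->connected && a->value == 0.0)     /* RULE(1, WRITE) { CALC("A=0") } */
        granted = ACCESS_WRITE;

    if (a->connected && a->value == 1.0      /* RULE(1, WRITE) with UAG, HAG, */
        && user_in_uag && host_in_hag)       /* and CALC("A=1")               */
        granted = ACCESS_WRITE;

    return granted;
}
```

Under these assumptions, every client falls back to read-only access while the input is disconnected, so a disconnect of the PV's host changes the computed rights for all clients in the group.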
Replies:
Re: IOC segmentation fault related to CA security Kasemir, Kay
References:
IOC segmentation fault related to CA security Pearson, Matthew R.

ANJ, 15 Jul 2016