Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Crash when using access security
From: "Hill, Jeff" <johill@lanl.gov>
To: Rod Nussbaumer <bomr@triumf.ca>, epics Techtalk <tech-talk@aps.anl.gov>
Date: Mon, 22 Apr 2013 21:17:19 +0000
Hi Rod,

It appears to be executing in, or just entering into, the function "epicsSingleton < TYPE > :: _build ()" which looks something like this:

template < class TYPE >
void * epicsSingleton < TYPE > :: _build ()
{
    return new TYPE ();
}

This is a static member function of a C++ class template, which is almost identical in its call mechanism to an ordinary C function. There isn't much that can go wrong here, except perhaps that an exception is thrown when new is called, or that the C++ runtime library somehow isn't properly initialized before main is called. Based purely on what the debugger indicates, I don't see an obvious potential problem.
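To illustrate why a builder like `_build` behaves much like a plain C function, here is a minimal sketch in standard C++. It is not EPICS code: `Widget`, `Builder`, and `buildViaPointer` are hypothetical names, and the `PBuild` typedef only assumes the same shape as the builder pointer seen in the backtrace. It shows that the builder's address is an ordinary function pointer, and that the only things that can fail inside it are the constructor itself or `new` throwing `std::bad_alloc`.

```cpp
#include <cassert>
#include <new>

// Hypothetical type standing in for localHostName.
struct Widget {
    int value;
    Widget () : value ( 42 ) {}
};

// Same shape as epicsSingleton < TYPE > :: _build (): a static member
// function of a class template that allocates a TYPE and returns void *.
template < class TYPE >
struct Builder {
    static void * build () { return new TYPE (); }
};

// A static member function's address is a plain function pointer,
// just like a C function's (assumed to match the PBuild parameter).
typedef void * ( * PBuild ) ();

// Calls the builder through the pointer; an allocation failure is
// reported as a null result instead of letting std::bad_alloc escape.
inline void * buildViaPointer ( PBuild pBuild )
{
    try {
        return ( * pBuild ) ();
    }
    catch ( const std::bad_alloc & ) {
        return 0;
    }
}
```

A caller would take `& Builder < Widget > :: build`, pass it to `buildViaPointer`, and cast the result back to `Widget *`.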

Does it fail at this precise location every time? I would type "disassemble 0x0008b640" in gdb to see what the code looks like there; hopefully that memory location hasn't been clobbered by other threads. If it fails there every time, the next step is to start the IOC in the debugger, set a breakpoint in a function further up the stack (such as cac.cpp:150, in cac::cac), and single step (or even single step at the instruction level) until the bug happens, so you can see what's going on.

However, I must also add that a file scope C++ object is involved here, and whenever a file scope object is involved in C++ we should suspect file scope constructor ordering issues. There are detailed rules that determine what the compiler must do, but the quick summary is that conventional wisdom dictates wrapping access to such objects in a function call, to make certain that the object is always initialized before it's used. When the behavior changes on switching architectures, we are doubly suspicious of this type of problem.
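A minimal sketch of the "wrap access in a function call" advice (often called construct-on-first-use), assuming a plain `std::string` stands in for the file scope object; `configName` and `Reader` are hypothetical names. Because the object lives inside the function, it is constructed the first time the function is called, so a file scope constructor that calls it cannot observe it half-built, even if the two were defined in different source files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Construct-on-first-use: the string is built on the first call to
// configName (), never before, regardless of file scope init order.
std::string & configName ()
{
    static std::string name ( "default" );
    return name;
}

// A file scope object whose constructor reads the other object.
// Going through configName () instead of a file scope std::string
// guarantees the string is constructed before it is read.
struct Reader {
    std::size_t length;
    Reader () : length ( configName ().size () ) {}
};

Reader fileScopeReader;
```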

There are two possible file scope object constructor ordering situations that one must consider:
1) situations where one file scope object's constructor references another file scope object before it's fully constructed.
2) situations where another thread references a file scope object before it's fully constructed.
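Situation (2) is the one newer standards address for function-local statics: C++11 requires that such a static be initialized exactly once even when several threads reach it at the same time, with the other threads waiting until construction finishes. A minimal sketch, assuming a C++11 compiler with thread support; `Expensive`, `sharedInstance`, and `raceForInstance` are hypothetical names. Note that this guarantee covers function-local statics only, not file scope objects, and a pre-C++11 compiler is not required to provide it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many times the constructor runs.
std::atomic < int > constructions ( 0 );

struct Expensive {
    Expensive () { ++constructions; }
};

// C++11: initialization of this function-local static is thread safe.
// If several threads arrive at once, one runs the constructor and the
// others block until it has finished.
Expensive & sharedInstance ()
{
    static Expensive instance;
    return instance;
}

// Starts nThreads threads that all race to obtain the instance and
// returns the number of constructor calls observed afterwards.
int raceForInstance ( int nThreads )
{
    std::vector < std::thread > threads;
    for ( int i = 0; i < nThreads; i++ ) {
        threads.emplace_back ( [] { sharedInstance (); } );
    }
    for ( std::thread & t : threads ) {
        t.join ();
    }
    return constructions.load ();
}
```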

I don't see any evidence of situation (1) on the stack. Situation (2) is only addressed by very recent versions of the C++ standard (C++11 requires thread-safe initialization of function-local statics). However, as I recall, protecting against problem (2) was precisely the intent of the "SingletonUntyped :: incrRefCount" code as follows, which we can clearly see is in play on the stack when this happens. The guard in this code ensures that only one thread at a time can create the file scope object.

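void SingletonUntyped :: incrRefCount ( PBuild pBuild )
{
    // create the shared singleton mutex exactly once, even if threads race here
    epicsThreadOnce ( & epicsSigletonOnceFlag, SingletonMutexOnce, 0 );
    // hold the mutex so that only one thread at a time can build the object
    epicsGuard < epicsMutex > 
        guard ( *pEPICSSigletonMutex );
    assert ( _refCount < SIZE_MAX );
    if ( _refCount == 0 ) {
        // first reference: construct the object via the type-specific builder
        _pInstance = ( * pBuild ) ();
    }
    _refCount++;
}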
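The same pattern can be sketched in standard C++11 to show why the guard should prevent a double build: `std::call_once` plays the role of `epicsThreadOnce` and `std::lock_guard < std::mutex >` the role of `epicsGuard < epicsMutex >`. This is a hypothetical analog, not the EPICS implementation; `SingletonHolder`, `singletonMutexOnce`, `buildInt`, and `referenceFromThreads` are illustrative names, and the matching decrement of the reference count is omitted for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Same shape as the PBuild builder pointer passed to incrRefCount.
typedef void * ( * PBuild ) ();

namespace {
    // Hypothetical stand-ins for the once flag and the singleton mutex.
    std::once_flag singletonOnceFlag;
    std::mutex * pSingletonMutex = 0;

    // Runs exactly once, however many threads call std::call_once.
    void singletonMutexOnce ()
    {
        pSingletonMutex = new std::mutex;
    }
}

class SingletonHolder {
public:
    SingletonHolder () : _pInstance ( 0 ), _refCount ( 0 ) {}

    // Builds the instance on the first reference only; the mutex makes
    // sure two threads cannot both see _refCount == 0 and build twice.
    void * incrRefCount ( PBuild pBuild )
    {
        std::call_once ( singletonOnceFlag, singletonMutexOnce );
        std::lock_guard < std::mutex > guard ( * pSingletonMutex );
        assert ( _refCount < SIZE_MAX );
        if ( _refCount == 0 ) {
            _pInstance = ( * pBuild ) ();
        }
        _refCount++;
        return _pInstance;
    }

    std::size_t refCount ()
    {
        std::call_once ( singletonOnceFlag, singletonMutexOnce );
        std::lock_guard < std::mutex > guard ( * pSingletonMutex );
        return _refCount;
    }

private:
    void * _pInstance;
    std::size_t _refCount;
};

// Hypothetical builder that records how many times it is invoked.
std::atomic < int > buildCount ( 0 );

void * buildInt ()
{
    ++buildCount;
    return new int ( 7 );
}

// Takes nThreads references concurrently, all using the same builder.
void referenceFromThreads ( SingletonHolder & holder, int nThreads )
{
    std::vector < std::thread > threads;
    for ( int i = 0; i < nThreads; i++ ) {
        threads.emplace_back ( [ & holder ] { holder.incrRefCount ( buildInt ); } );
    }
    for ( std::thread & t : threads ) {
        t.join ();
    }
}
```

If the builder still crashes under a design like this, the guard itself is unlikely to be the culprit, which is consistent with the doubt expressed about explanation (2).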

So, if this _is_ in fact your problem (although, BTW, this explanation appears to conflict with what the debugger and the protections in the code indicate), then the simplified fix in localHostName.cpp would be to replace line 26 as follows, and then replace all uses of the variable "localHostNameCache" with "myLocalHostNameCache()" in the CA client library:

BEFORE:
epicsSingleton < localHostName > localHostNameCache;

AFTER:
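epicsSingleton < localHostName > & myLocalHostNameCache()
{
  // built on the first call, so no caller can see it before construction;
  // intentionally never deleted, so it also outlives file scope destructors
  static epicsSingleton < localHostName > * p =
      new epicsSingleton < localHostName >;
  return *p;
}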

You could give me a call in my office if you would like to discuss this further.

Jeff

> -----Original Message-----
> From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov] On Behalf Of Rod Nussbaumer
> Sent: Monday, April 22, 2013 1:24 PM
> To: epics Techtalk
> Subject: Crash when using access security
> 
> Hi all.
> 
> Below is pasted the debug output from an IOC running under gdb. The IOC
> repeatably segfaults whenever asSetFilename is called with an
> access-security file that works on other IOCs.
> The IOC in question is running on Linux:
> 
> > Linux icarm178 2.6.21-ts #4 PREEMPT Wed Jan 27 17:12:25 MST 2010 armv4tl GNU/Linux
> 
> on an ARM that otherwise runs IOCs reliably.
> 
> The segmentation fault occurs at some point during iocInit. I cannot
> really tell where or how to look more closely. The 'asInit...' messages
> seen here are from some printf()s I put into the asInitCommon function
> in asDbLib.c while trying to sort out the problem.
> 
> Looking for any pointers on where or how to focus further diagnostics.
> 
> Thanks.
> 
> Rod Nussbaumer
> ISAC Controls, TRIUMF
> Vancouver, Canada
> 
> 
> 
> 
> > iocInit
> > [New Thread 32769 (LWP 2488)]
> > [New Thread 16386 (LWP 2489)]
> > Starting iocInit
> >
> > ############################################################################
> > ## EPICS R3.14.12.2 $Date: Mon 2011-12-12 14:09:32 -0600$
> > ## EPICS Base built Aug 31 2012
> >
> > ############################################################################
> > [New Thread 32771 (LWP 2490)]
> > [New Thread 49156 (LWP 2491)]
> > [New Thread 65541 (LWP 2492)]
> > [New Thread 81926 (LWP 2493)]
> > [New Thread 98311 (LWP 2494)]
> > [New Thread 114696 (LWP 2495)]
> > [New Thread 131081 (LWP 2496)]
> > [New Thread 147466 (LWP 2497)]
> > [New Thread 163851 (LWP 2498)]
> > [New Thread 180236 (LWP 2499)]
> > [New Thread 196621 (LWP 2500)]
> > [New Thread 213006 (LWP 2501)]
> > [New Thread 229391 (LWP 2502)]
> > [New Thread 245776 (LWP 2503)]
> > asInit: entered
> > asInitCommon: entered
> > asInitCommon: calling asInitFile
> > asInitCommon: returned from asInitFile
> > asInitCommon: calling asCaStart
> > [New Thread 262161 (LWP 2504)]
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 262161 (LWP 2504)]
> > 0x0008b640 in epicsSingleton<localHostName>::_build () at ../../../include/epicsSingleton.h:187
> > 187     ../../../include/epicsSingleton.h: No such file or directory.
> >         in ../../../include/epicsSingleton.h
> > Current language:  auto; currently c++
> > (gdb) bt
> > #0  0x0008b640 in epicsSingleton<localHostName>::_build () at ../../../include/epicsSingleton.h:187
> > #1  0x000f80d0 in SingletonUntyped::incrRefCount (this=0x14ed64, pBuild=0x8b624 <epicsSingleton<localHostName>::_build()>)
> >     at ../../../src/libCom/cxxTemplates/epicsSingletonMutex.cpp:48
> > #2  0x0008b5c8 in reference (this=0x1a94bc, es=@0x14ed64) at ../../../include/epicsSingleton.h:85
> > #3  0x0008b614 in epicsSingleton<localHostName>::getReference (this=0x14ed64) at ../../../include/epicsSingleton.h:205
> > #4  0x000b65bc in cac (this=0x1a94b0, mutualExclusionIn=@0x1a92c8, callbackControlIn=@0x1a92cc, notifyIn=@0x1a9258) at ../cac.cpp:150
> > #5  0x00097330 in ca_client_context::createNetworkContext (this=dwarf2_read_address: Corrupted DWARF expression.
> > ) at ../ca_client_context.cpp:730
> > #6  0x0007148c in dbContext::createChannel (this=0x1a9420, guard=@0x7d9ff9bc, pName=0x1aca70 "ITW:ACCESS:CONSOLE1", notifyIn=@0x1acdc0, priority=0) at ../dbContext.cpp:104
> > #7  0x0009768c in ca_client_context::createChannel (this=0x1a9258, guard=@0x7d9ff9bc, pChannelName=0x1aca70 "ITW:ACCESS:CONSOLE1", chan=@0x1acdc0, pri=0)
> >     at ../ca_client_context.cpp:668
> > #8  0x000a3c04 in oldChannelNotify (this=0x1acdc0, guard=@0x7d9ff9bc, cacIn=@0x1a9258, pName=0x1aca70 "ITW:ACCESS:CONSOLE1", pConnCallBackIn=0x4fa70 <connectCallback>,
> >     pPrivateIn=0x1aca58, priority=0) at ../oldChannelNotify.cpp:54
> > #9  0x0008cf3c in ca_create_channel (name_str=0x1aca70 "ITW:ACCESS:CONSOLE1", conn_func=0x4fa70 <connectCallback>, puser=0x1aca58, priority=0, chanptr=0x1a94a8)
> >     at ../access.cpp:333
> > #10 0x0008d2a4 in ca_search_and_connect (name_str=0x1aca70 "ITW:ACCESS:CONSOLE1", chanptr=0x1a94a8, conn_func=0x4fa70 <connectCallback>, puser=0x1aca58) at ../access.cpp:297
> > #11 0x0004f620 in asCaTask () at ../asCa.c:184
> > #12 0x001149d4 in start_routine (arg=<value optimized out>) at ../../../src/libCom/osi/os/posix/osdThread.c:385
> > #13 0x2aacefb0 in pthread_start_thread () from /lib/libpthread.so.0
> > #14 0x2aacefb0 in pthread_start_thread () from /lib/libpthread.so.0
> > #15 0x2aacefb0 in pthread_start_thread () from /lib/libpthread.so.0


