After further testing on newer
Linux versions, where the debugger actually seems to function correctly, I now
strongly suspect that this issue is caused by the change below. The symptom is that the two
threads that CA creates to manage TCP circuits never shut down. They typically need to
shut down when the TCP circuit disconnects or when the last channel on
the circuit is deleted. The bug was introduced in R3.14.9. See Mantis 363.
cvs diff -r 1.2 -r 1.2.2.1 -wb -- systemCallIntMech.cpp systemCallIntMech.cpp (in directory C:\hill\R3.14.dll_hell_fix\epics\base\src\libCom\osi\os\posix\)
Index: systemCallIntMech.cpp
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/libCom/osi/os/posix/systemCallIntMech.cpp,v
retrieving revision 1.2
retrieving revision 1.2.2.1
diff -u -b -w -b -r1.2 -r1.2.2.1
--- systemCallIntMech.cpp 1 May 2003 22:11:42 -0000 1.2
+++ systemCallIntMech.cpp 14 May 2004 13:36:01 -0000 1.2.2.1
@@ -8,7 +8,7 @@
* and higher are distributed subject to a Software License Agreement found
* in file LICENSE that is included with this distribution.
\*************************************************************************/
-/* $Id: systemCallIntMech.cpp,v 1.2 2003/05/01 22:11:42 jhill Exp $ */
+/* $Id: systemCallIntMech.cpp,v 1.2.2.1 2004/05/14 13:36:01 norume Exp $ */
/*
* Author: Jeff Hill
*/
@@ -18,5 +18,5 @@
enum epicsSocketSystemCallInterruptMechanismQueryInfo
epicsSocketSystemCallInterruptMechanismQuery ()
{
- return esscimqi_socketSigAlarmRequired;
+ return esscimqi_socketBothShutdownRequired;
}
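For readers unfamiliar with what this function selects: judging only by the enumerator names, the old value asks for a blocked socket call to be interrupted with a signal, while the new value asks for shutdown() to be called on the socket. The following is a minimal sketch of the shutdown-based approach on a POSIX system, not the EPICS implementation. BlockedReader, readerEntry and wakeReaderWithShutdown are hypothetical names, and a local socketpair() stands in for a TCP circuit.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <pthread.h>
#include <cassert>

// Hypothetical illustration only; none of these names exist in EPICS.
struct BlockedReader {
    int fd;          // socket the thread blocks on
    ssize_t result;  // value recv() returned once the thread woke up
};

extern "C" void* readerEntry(void* arg) {
    BlockedReader* reader = static_cast<BlockedReader*>(arg);
    char buf[16];
    // Blocks here: nothing is ever written to the peer socket.
    reader->result = recv(reader->fd, buf, sizeof buf, 0);
    return nullptr;
}

// Starts a thread blocked in recv(), then wakes it with shutdown().
// Returns the value recv() produced in the woken thread.
ssize_t wakeReaderWithShutdown() {
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    BlockedReader reader = { sv[0], -2 };
    pthread_t tid;
    rc = pthread_create(&tid, nullptr, readerEntry, &reader);
    assert(rc == 0);
    (void)rc;

    // Crude delay so the thread is likely blocked in recv() already.
    // A sketch only; this is not real synchronization.
    usleep(100 * 1000);

    // The interrupt mechanism under discussion: shut down both directions.
    shutdown(sv[0], SHUT_RDWR);

    pthread_join(tid, nullptr);
    close(sv[0]);
    close(sv[1]);
    return reader.result;
}
```

On a platform where the blocked thread does not return from recv() after the shutdown() call, a thread like this would never exit, which would look much like the symptom described above.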
Jeff
______________________________________________________
Jeffrey O. Hill
Email [email protected]
LANL MS H820
Voice 505 665 1831
Los Alamos NM 87545 USA
FAX 505 665 5107
Message content: TSPA
From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Hill
Sent: Tuesday, August 25, 2009 12:31 PM
To: 'Core-Talk'
Subject: proper thread cleanup on Linux?
All,
On our Linux (2.4.21-52.ELsmp) test system with the latest R3.14,
I see very strange behavior when I run my CA client-side regression
test, acctst. The symptom is an accumulating number of threads with odd stack
traces (see below). The test doesn’t fail, but the system slows to a crawl as
it runs low on threads (resources). Since these threads are not executing
anywhere in EPICS code, the cause is a complete mystery. Does anyone recognize
this symptom, perhaps as the problem on Linux where a C++
exception handler that catches all exceptions also catches the exception that
some Linux pthreads implementations use for thread exit? I cannot reproduce this
issue on Linux (2.6.9-42.0.3.ELsmp) with R3.14.8.2.
(gdb) thread 1146
[Switching to thread 1146 (Thread -1219617872 (LWP 28422))]#0 0x0013c939 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0x0013c939 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0x00138b21 in _L_mutex_lock_949 () from /lib/tls/libpthread.so.0
#2 0x00000000 in ?? ()
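To make the catch-all hypothesis above concrete: with glibc/NPTL and g++, thread exit and cancellation unwind the stack in a way that a catch (...) handler can intercept. The following is a minimal sketch of a thread entry point that handles ordinary exceptions but passes everything else on; workerEntry and runExitingThread are hypothetical names, not EPICS code, and the behavior described is an assumption about glibc with g++ rather than something POSIX guarantees.

```cpp
#include <pthread.h>
#include <cassert>
#include <exception>

// Hypothetical thread entry point; not taken from EPICS.
extern "C" void* workerEntry(void*) {
    try {
        // Stand-in for the thread's real work. With glibc/NPTL this call
        // unwinds the stack, visiting C++ handlers along the way.
        pthread_exit(nullptr);
    }
    catch (const std::exception&) {
        // Ordinary C++ exceptions can be handled (or logged) here.
    }
    catch (...) {
        // Anything else may be the runtime's thread-exit unwind,
        // so it is passed on rather than swallowed.
        throw;
    }
    return nullptr;
}

// Starts the worker and reports whether it could be joined.
bool runExitingThread() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, workerEntry, nullptr);
    assert(rc == 0);
    (void)rc;
    return pthread_join(tid, nullptr) == 0;
}
```

If a handler like the catch (...) above did not rethrow, the thread's exit would be interfered with, which is the kind of failure being asked about here.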
This is the Linux version:
~/epicsR3.14/epics/extensions/src/gateway$ uname -a
Linux santana 2.4.21-52.ELsmp #1 SMP Tue Sep 25 15:13:04 EDT 2007 i686 i686 i386 GNU/Linux
This is the gcc version (the gdb on this system is known to have some
issues):
~/epicsR3.14/epics/extensions/src/gateway$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux
Thread model: posix
gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-59)
In the debugger I see that threads are being created and
destroyed in matched sets as they should be while the test is running (see a
sample of some of the output below).
[New Thread -1521730640 (LWP 2504)]
[New Thread -1521464400 (LWP 2506)]
[New Thread -1520669776 (LWP 2508)]
[New Thread -1522787408 (LWP 2510)]
[New Thread -1523582032 (LWP 2512)]
[Thread -1522787408 (LWP 2510) exited]
[Thread -1520669776 (LWP 2508) exited]
[Thread -1521464400 (LWP 2506) exited]
[Thread -1521730640 (LWP 2504) exited]
[Thread -1523582032 (LWP 2512) exited]
Have you seen this behavior? Is something wrong with my
version of Linux?
To reproduce this, run the following in two different windows:
excas -p myTest:
acctst myTest:bill 10
or alternatively, in gdb (you will need to build without
optimization):
gdb acctst
run myTest:bill 10
^c
info threads
thread <nnnn>
bt
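To watch for the accumulation without attaching gdb, a test program could report its own thread count. A minimal sketch, assuming a Linux kernel that exposes /proc/self/task (this may not be available on the 2.4 kernel above); countThreadsInThisProcess is a hypothetical helper, not part of EPICS.

```cpp
#include <dirent.h>
#include <cassert>

// Hypothetical helper: counts the threads of the calling process by
// listing /proc/self/task, which has one numeric entry per thread.
// Returns -1 if the directory cannot be opened.
int countThreadsInThisProcess() {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) {
        return -1;
    }
    int count = 0;
    while (const dirent* entry = readdir(dir)) {
        // Skip "." and ".."; every other entry is a thread id.
        if (entry->d_name[0] != '.') {
            ++count;
        }
    }
    closedir(dir);
    // The calling thread itself is always listed.
    assert(count >= 1);
    return count;
}
```

Printing this value before and after each pass of a test such as acctst would show whether the number of threads keeps growing.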
Jeff