Brian,
> I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.
I spoke too soon there. When you get the hang, these 2 mutexes are always locked:
epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222
I did not see an epicsMutex being created in StreamDevice, but I was wrong. When epicsMutexCreate() is called from C code it stores the file and line number where the mutex was created. However, StreamDevice creates an epicsMutex using the C++ constructor for the epicsMutex class, rather than calling epicsMutexCreate() directly. The problem is that the epicsMutex::epicsMutex constructor (in base/src/libCom/osi/epicsMutex.cpp) itself calls epicsMutexCreate(), so the file and line number stored for that mutex are those of the C++ constructor, not of the location the constructor was called from. The reported location is therefore always the same! This should be fixed in a future version of EPICS base, since it would be more useful to know where the epicsMutex constructor was called from.
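To illustrate the mechanism (this is a sketch, simplified from base's epicsMutex.h and epicsMutex.cpp, not the actual source): epicsMutexCreate() is a macro that captures __FILE__ and __LINE__ at the point where it is expanded, and for the C++ constructor that point is inside epicsMutex.cpp. One possible fix, sketched below, is a second constructor plus a macro so that each construction site records its own location:

    /* Sketch only -- names and details simplified from base. */
    #define epicsMutexCreate() epicsMutexOsiCreate(__FILE__, __LINE__)

    /* The constructor expands the macro, so every C++ epicsMutex
     * reports epicsMutex.cpp and this line: */
    epicsMutex::epicsMutex()
        : id(epicsMutexCreate())
    {
    }

    /* Possible fix: accept the caller's location and wrap construction
     * in a macro so call sites record themselves: */
    class epicsMutex {
    public:
        epicsMutex();                            /* old behavior */
        epicsMutex(const char *file, int line);  /* stores caller's site */
    private:
        epicsMutexId id;
    };
    #define newEpicsMutex new epicsMutex(__FILE__, __LINE__)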
Fortunately StreamDevice appears to only create a single epicsMutex object, and this occurs in the constructor for the Stream class.
It looks to me like there is a subtle bug in the StreamDevice + asyn combination that is creating a classic deadlock.
This could occur under the following circumstances:
Thread #1 made calls in this order:
pasynManager->lockPort();
Stream::mutex.lock();
Thread #2 made calls in this order:
Stream::mutex.lock();
pasynManager->lockPort();
The deadlock occurs when Thread #1 calls pasynManager->lockPort() and Thread #2 then calls Stream::mutex.lock() before Thread #1 can. Thread #1 is now blocked in Stream::mutex.lock() and Thread #2 is blocked in pasynManager->lockPort(); each thread is waiting for the mutex that the other thread has locked.
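To make the pattern concrete, here is a minimal standalone sketch of that lock-ordering deadlock, using plain C++11 mutexes standing in for the asyn port lock and the Stream mutex. This is illustrative only, not StreamDevice code; with unlucky timing it hangs exactly as described:

    // Two threads taking the same two locks in opposite order.  Each can
    // end up holding one lock while blocking forever on the other.
    // Compile with: g++ -std=c++11 -pthread deadlock.cpp
    #include <mutex>
    #include <thread>

    std::mutex portLock;     // stands in for pasynManager->lockPort()
    std::mutex streamMutex;  // stands in for Stream::mutex

    void thread1() {
        std::lock_guard<std::mutex> a(portLock);
        // ... if thread2 takes streamMutex right here, we deadlock ...
        std::lock_guard<std::mutex> b(streamMutex);  // blocks forever
    }

    void thread2() {
        std::lock_guard<std::mutex> a(streamMutex);
        std::lock_guard<std::mutex> b(portLock);     // blocks forever
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();  // never returns once the deadlock hits
        t2.join();
        return 0;
    }

The standard cure is to make every thread acquire the two locks in the same order.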
I don't think it's possible for the deadlock to be directly caused by code in asyn, because asyn does not know anything about the StreamDevice mutex.
My questions are:
- Why have we not seen this before if it occurs with a single I/O Intr scanned record using StreamDevice? I think that many sites are doing that.
- Does this occur in a "stripped down" IOC that is only running this single StreamDevice record and nothing else? Could it be caused by some other code in your IOC?
Mark
-----Original Message-----
From: Mark Rivers
Sent: Monday, July 08, 2013 10:01 AM
To: 'Brian McAllister'
Cc: Dirk Zimoch; [email protected]
Subject: RE: asyn+streamDevice hangs with synchronousLock:Yes
> The reference to asynManager.c is in registerPort(). It seems odd that
> this function would be called other than during iocInit(), and that it
> would get stuck trying to create what should be a new mutex.
registerPort() is not getting called again. epicsMutexShowAll reports the line at which the mutex was originally created, which in this case happened before iocInit when registerPort() was called.
I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.
It would be helpful if you could run the gdb debugger on the IOC application when it is hung up.
Do that as follows:
- Your IOC is running as a process on the Linux system. Get its process ID using the "ps" command.
For example, here I will debug the testAsynPortDriver IOC application:
corvette:asyn/iocBoot/ioctestAsynPortDriver>ps ax | grep testAsyn
26058 pts/33 Sl+ 0:00 ../../bin/linux-x86/testAsynPortDriver st.cmd
26188 pts/9 S+ 0:00 grep --color=auto testAsyn
So the IOC process ID is 26058.
- Start gdb as follows:
gdb [path_to_IOC_application] [process_ID]
For my example above:
corvette:asyn/iocBoot/ioctestAsynPortDriver>gdb ../../bin/linux-x86/testAsynPortDriver 26058
Once the debugger starts, issue the following command:
(gdb) thread apply all backtrace
For example, the beginning of the output for my example is:
Thread 22 (Thread 0xf73f7b70 (LWP 26059)):
#0 0xf775d430 in __kernel_vsyscall ()
#1 0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0xf75b81b8 in condTimedwait (time=0xf73f7218, mutexId=0x97cfa58, condId=0x97cfa70) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4 epicsEventWaitWithTimeout (pevent=0x97cfa58, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5 0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6 0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7 0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8 0xf75b5cad in start_routine (arg=0x97cfbd0) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9 0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6
Thread 21 (Thread 0xf7376b70 (LWP 26060)):
#0 0xf775d430 in __kernel_vsyscall ()
#1 0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0xf75b81b8 in condTimedwait (time=0xf7376218, mutexId=0x97cff88, condId=0x97cffa0) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4 epicsEventWaitWithTimeout (pevent=0x97cff88, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5 0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6 0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7 0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8 0xf75b5cad in start_routine (arg=0x97d0100) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9 0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6
That will give a stack trace of every thread, and we can see if there are multiple threads waiting on mutexes, and what code those threads are executing.
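If you want to capture that output to a file rather than scrolling it in the terminal, gdb's standard logging commands will do it (ioc-threads.txt is just an example file name):

    (gdb) set logging file ioc-threads.txt
    (gdb) set logging on
    (gdb) thread apply all backtrace
    (gdb) set logging off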
Mark
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Brian McAllister
Sent: Monday, July 08, 2013 9:48 AM
To: Mark Rivers
Cc: Dirk Zimoch; [email protected]
Subject: Re: asyn+streamDevice hangs with synchronousLock:Yes
>>> On 7/8/2013 at 3:26:08 -0000, Mark Rivers wrote:
> That is interesting. When it is hung and you type
> epicsMutexShowAll, 1
> repeatedly, do you always see the same thing, i.e.
>> ellCount(&mutexList) 117 ellCount(&freeList) 12
>> epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
>> epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222
Yes. Once it gets stuck, that report doesn't change.
The reference to asynManager.c is in registerPort(). It seems odd that
this function would be called other than during iocInit(), and that it
would get stuck trying to create what should be a new mutex.
- brian