Brian,
> I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.
I spoke too soon there. When you get the hang, these 2 mutexes are always locked:
epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222
I did not see an epicsMutex being created in StreamDevice, but I was wrong. When epicsMutexCreate() is called from C code it stores the file and line number where the mutex was created. However, StreamDevice creates an epicsMutex using the C++ constructor for the epicsMutex class, rather than calling epicsMutexCreate() directly. The problem is that the epicsMutex::epicsMutex constructor (in base/src/libCom/osi/epicsMutex.cpp) itself calls epicsMutexCreate(), so the file and line number stored for that mutex are those of the C++ constructor, not of the location the constructor was called from. The reported location is therefore always the same! This should be fixed in a future version of EPICS base, since it would be more useful to know where the epicsMutex constructor was called from.
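To illustrate the mechanism (this is a sketch, simplified from base's epicsMutex.h and epicsMutex.cpp, not the actual source): epicsMutexCreate() is a macro that captures __FILE__ and __LINE__ at the point where it is expanded, and for the C++ constructor that point is inside epicsMutex.cpp. One possible fix, sketched below, is a second constructor plus a macro so that each construction site records its own location:

    /* Sketch only -- names and details simplified from base. */
    #define epicsMutexCreate() epicsMutexOsiCreate(__FILE__, __LINE__)

    /* The constructor expands the macro, so every C++ epicsMutex
     * reports epicsMutex.cpp and this line: */
    epicsMutex::epicsMutex()
        : id(epicsMutexCreate())
    {
    }

    /* Possible fix: accept the caller's location and wrap construction
     * in a macro so call sites record themselves: */
    class epicsMutex {
    public:
        epicsMutex();                            /* old behavior */
        epicsMutex(const char *file, int line);  /* stores caller's site */
    private:
        epicsMutexId id;
    };
    #define newEpicsMutex new epicsMutex(__FILE__, __LINE__)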
Fortunately StreamDevice appears to only create a single epicsMutex object, and this occurs in the constructor for the Stream class.
It looks to me like there is a subtle bug in the StreamDevice + asyn combination that is creating a classic deadlock.
This could occur under the following circumstances:
Thread #1 made calls in this order:
pasynManager->lockPort();
Stream::mutex.lock();
Thread #2 made calls in this order:
Stream::mutex.lock();
pasynManager->lockPort();
The deadlock occurs when Thread #1 calls pasynManager->lockPort() and Thread #2 then calls Stream::mutex.lock() before Thread #1 can. Thread #1 is now blocked in Stream::mutex.lock() and Thread #2 is blocked in pasynManager->lockPort(); each thread is waiting for the mutex that the other thread has locked.
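To make the pattern concrete, here is a minimal standalone sketch of that lock-ordering deadlock, using plain C++11 mutexes standing in for the asyn port lock and the Stream mutex. This is illustrative only, not StreamDevice code; with unlucky timing it hangs exactly as described:

    // Two threads taking the same two locks in opposite order.  Each can
    // end up holding one lock while blocking forever on the other.
    // Compile with: g++ -std=c++11 -pthread deadlock.cpp
    #include <mutex>
    #include <thread>

    std::mutex portLock;     // stands in for pasynManager->lockPort()
    std::mutex streamMutex;  // stands in for Stream::mutex

    void thread1() {
        std::lock_guard<std::mutex> a(portLock);
        // ... if thread2 takes streamMutex right here, we deadlock ...
        std::lock_guard<std::mutex> b(streamMutex);  // blocks forever
    }

    void thread2() {
        std::lock_guard<std::mutex> a(streamMutex);
        std::lock_guard<std::mutex> b(portLock);     // blocks forever
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();  // never returns once the deadlock hits
        t2.join();
        return 0;
    }

The standard cure is to make every thread acquire the two locks in the same order.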
I don't think it's possible for the deadlock to be directly caused by code in asyn, because asyn does not know anything about the StreamDevice mutex.
My questions are:
- Why have we not seen this before if it occurs with a single I/O Intr scanned record using StreamDevice? I think that many sites are doing that.
- Does this occur in a "stripped down" IOC that is only running this single StreamDevice record and nothing else? Could it be caused by some other code in your IOC?
Mark
-----Original Message-----
From: Mark Rivers
Sent: Monday, July 08, 2013 10:01 AM
To: 'Brian McAllister'
Cc: Dirk Zimoch; [email protected]
Subject: RE: asyn+streamDevice hangs with synchronousLock:Yes
> The reference to asynManager.c is in registerPort(). It seems odd that
> this function would be called other than during iocInit(), and that it
> would get stuck trying to create what should be a new mutex.
registerPort() is not getting called again. epicsMutexShowAll reports the line at which the mutex was originally created, which in this case happened before iocInit when registerPort() was called.
I had guessed that there was a deadlock involving another mutex created by StreamDevice, but that does not seem to be the case.
It would be helpful if you could run the gdb debugger on the IOC application when it is hung up.
Do that as follows:
- Your IOC is running as a process on the Linux system. Get its process ID using the "ps" command.
For example, here I will debug the testAsynPortDriver IOC application:
corvette:asyn/iocBoot/ioctestAsynPortDriver>ps ax | grep testAsyn
26058 pts/33 Sl+ 0:00 ../../bin/linux-x86/testAsynPortDriver st.cmd
26188 pts/9 S+ 0:00 grep --color=auto testAsyn
So the IOC process ID is 26058.
- Start gdb as follows:
gdb [path_to_IOC_application] [process_ID]
For my example above:
corvette:asyn/iocBoot/ioctestAsynPortDriver>gdb ../../bin/linux-x86/testAsynPortDriver 26058
Once the debugger starts, issue the following command:
(gdb) thread apply all backtrace
For example, the beginning of the output for my example is:
Thread 22 (Thread 0xf73f7b70 (LWP 26059)):
#0 0xf775d430 in __kernel_vsyscall ()
#1 0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0xf75b81b8 in condTimedwait (time=0xf73f7218, mutexId=0x97cfa58, condId=0x97cfa70) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4 epicsEventWaitWithTimeout (pevent=0x97cfa58, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5 0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6 0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7 0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8 0xf75b5cad in start_routine (arg=0x97cfbd0) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9 0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6
Thread 21 (Thread 0xf7376b70 (LWP 26060)):
#0 0xf775d430 in __kernel_vsyscall ()
#1 0x4d3b84f4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x4d301d84 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3 0xf75b81b8 in condTimedwait (time=0xf7376218, mutexId=0x97cff88, condId=0x97cffa0) at ../../../src/libCom/osi/os/posix/osdEvent.c:65
#4 epicsEventWaitWithTimeout (pevent=0x97cff88, timeout=1.7976931348623157e+308) at ../../../src/libCom/osi/os/posix/osdEvent.c:156
#5 0xf75b0bf5 in epicsEvent::wait(double) () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#6 0xf75be805 in timerQueueActive::run() () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#7 0xf75af312 in epicsThreadCallEntryPoint () from /usr/local/epics/base-3.14.12.3/lib/linux-x86/libCom.so.3.14
#8 0xf75b5cad in start_routine (arg=0x97d0100) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#9 0x4d3b4a2e in start_thread () from /lib/libpthread.so.0
#10 0x4d2f381e in clone () from /lib/libc.so.6
That will give a stack trace of every thread, and we can see if there are multiple threads waiting on mutexes, and what code those threads are executing.
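If you want to capture that output to a file rather than scrolling it in the terminal, gdb's standard logging commands will do it (ioc-threads.txt is just an example file name):

    (gdb) set logging file ioc-threads.txt
    (gdb) set logging on
    (gdb) thread apply all backtrace
    (gdb) set logging off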
Mark
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Brian McAllister
Sent: Monday, July 08, 2013 9:48 AM
To: Mark Rivers
Cc: Dirk Zimoch; [email protected]
Subject: Re: asyn+streamDevice hangs with synchronousLock:Yes
>>> On 7/8/2013 at 3:26:08 -0000, Mark Rivers wrote:
> That is interesting. When it is hung and you type
> epicsMutexShowAll, 1
> repeatedly, do you always see the same thing, i.e.
>> ellCount(&mutexList) 117 ellCount(&freeList) 12
>> epicsMutexId 0x894b988 source ../../asyn/asynDriver/asynManager.c line 1868
>> epicsMutexId 0x89a2710 source ../../../src/libCom/osi/epicsMutex.cpp line 222
Yes. Once it gets stuck, that report doesn't change.
The reference to asynManager.c is in registerPort(). It seems odd that
this function would be called other than during iocInit(), and that it
would get stuck trying to create what should be a new mutex.
- brian