Experimental Physics and
Industrial Control System


Subject:	Possible deadlock issue with asyn layer?
From:	Mark Davis <[email protected]>
To:	"[email protected]" <[email protected]>, Mark Rivers <[email protected]>
Date:	Mon, 16 Apr 2018 11:32:57 -0400

We have been experiencing a deadlock problem with one of our custom drivers based on the asynPortDriver class, and after reviewing the code and information gained via the gdb debugger, I have a theory regarding the cause that I would like others to review and tell me if there are any flaws they can spot (either in my logic or my assumptions).



First, a summary of the driver that is exhibiting the problem:

   - Non-blocking driver based on asynPortDriver class
   - One device per driver instance
   - Communication with device handled by a thread launched by each instance of the driver
   - All asyn calls by driver thread surrounded by lock()/unlock() calls
   - info(asyn:READBACK, "1") tag used in some of the output records
   - writeXxx() calls don't do any I/O: They save the new value to be processed later (and sometimes call setXxxParam(), setParamStatus(), and callParamCallbacks() functions)

The driver thread runs periodically to update readings and process new settings as follows:

      - reads values from the controller
      - calls lock()
      - calls setXxxParam() and setParamStatus() as needed
      - calls callParamCallbacks() to update the records affected by the changes (acquiring the scan lock for each record while it is processed)
      - calls unlock()
      - sends new settings to the controller (using lock()/unlock() when calling any asyn functions or accessing driver values that are touched by the writeXxx() functions)

Concurrently with the driver thread's usual processing, writes to output records occur. The processing of each write (I assume) includes the following sequence of events:

      - The thread that did the write (a CAC-client thread, in this case) gets the scan lock for the record and begins processing the record
      - Processing of the record includes calls to asyn layer functions, which requires acquiring the asyn lock for the driver instance before calling one of its writeXxx() functions
      - The asyn lock for the driver is released
      - The scan lock for the record is released
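
The two sequences described above can be compared as lock orderings: the driver thread takes the asyn lock and then the scan lock, while a CA write takes the scan lock and then the asyn lock. The following is only a minimal sketch of that comparison in standard C++. The names LockSequence, orderPairs, and hasInversion are hypothetical helpers made up for illustration; nothing here calls into asyn or EPICS base, and each path is assumed to hold every earlier lock while acquiring the later ones.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Locks in the order one code path acquires them (hypothetical model type).
// Assumption: all earlier locks are still held when a later one is taken.
using LockSequence = std::vector<std::string>;
using LockPair = std::pair<std::string, std::string>;

// Collect every "first held while acquiring second" pair for one path.
std::set<LockPair> orderPairs(const LockSequence& seq) {
    std::set<LockPair> pairs;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        for (std::size_t j = i + 1; j < seq.size(); ++j) {
            if (seq[i] != seq[j]) {
                pairs.insert({seq[i], seq[j]});
            }
        }
    }
    return pairs;
}

// True if the two paths take some pair of locks in opposite order,
// which is the precondition for the kind of deadlock described here.
bool hasInversion(const LockSequence& a, const LockSequence& b) {
    const std::set<LockPair> pairsA = orderPairs(a);
    for (const LockPair& p : orderPairs(b)) {
        if (pairsA.count({p.second, p.first}) > 0) {
            return true;
        }
    }
    return false;
}
```

Called with {"asynLock", "scanLock"} for the driver thread and {"scanLock", "asynLock"} for the CA write, hasInversion() reports an inversion; with the same order on both paths it does not.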

What is happening:

The driver thread sometimes blocks indefinitely waiting to acquire the scan lock for an output record it needs to update.



My theory as to what can cause this:

   What we know:

      - The driver thread blocks indefinitely waiting for the scan lock for an output record that includes the info(asyn:READBACK, "1") tag
        (see extract from gdb output below and the template for the record)
      - The lockup of the driver thread is always preceded by a CA client write to an output record for one of the driver's parameters (haven't yet confirmed it is the same record that the driver is waiting for when it hangs)


   What I believe is happening:

      - Sometime between the driver thread calling lock() and it blocking on the scan lock for an output record, a CA client thread writes to the same record. The client thread successfully acquires the scan lock for the record, but then blocks waiting to acquire the asyn lock for the driver instance (which the driver thread already holds). Each thread is then waiting for a lock held by the other, so neither can proceed.
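
To make the suspected interleaving concrete, here is a rough standalone model in standard C++. It is only a sketch under stated assumptions: ModelLocks and bothThreadsBlock are hypothetical names, the two std::timed_mutex objects merely stand in for the asyn port lock and the record's scan lock, and timed waits replace the real blocking calls so the model reports the stall instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks; plain C++ mutexes,
// not the real asyn port mutex or the EPICS record lock.
struct ModelLocks {
    std::timed_mutex asynLock;  // models asynPortDriver lock()/unlock()
    std::timed_mutex scanLock;  // models dbScanLock() on the output record
};

// Runs a "driver" thread and a "CA client" thread that take the two locks
// in opposite order. Returns true if both timed out waiting for the lock
// the other one holds, i.e. the untimed version would have hung.
bool bothThreadsBlock(std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);
    ModelLocks locks;
    std::promise<void> driverHolds, clientHolds, driverTried, clientTried;
    std::future<void> driverHoldsF = driverHolds.get_future();
    std::future<void> clientHoldsF = clientHolds.get_future();
    std::future<void> driverTriedF = driverTried.get_future();
    std::future<void> clientTriedF = clientTried.get_future();
    bool driverBlocked = false;
    bool clientBlocked = false;

    std::thread driver([&] {
        locks.asynLock.lock();       // driver thread calls lock()
        driverHolds.set_value();
        clientHoldsF.wait();         // the client now owns the scan lock
        // Callback path tries to take the scan lock for the record.
        if (locks.scanLock.try_lock_for(timeout)) {
            locks.scanLock.unlock();
        } else {
            driverBlocked = true;
        }
        driverTried.set_value();
        clientTriedF.wait();         // keep holding until the client has tried
        locks.asynLock.unlock();
    });

    std::thread client([&] {
        locks.scanLock.lock();       // CA write takes the record's scan lock
        clientHolds.set_value();
        driverHoldsF.wait();         // the driver now owns the asyn lock
        // Record processing tries to take the asyn lock for the driver.
        if (locks.asynLock.try_lock_for(timeout)) {
            locks.asynLock.unlock();
        } else {
            clientBlocked = true;
        }
        clientTried.set_value();
        driverTriedF.wait();         // keep holding until the driver has tried
        locks.scanLock.unlock();
    });

    driver.join();
    client.join();
    return driverBlocked && clientBlocked;
}
```

In this model, bothThreadsBlock(std::chrono::milliseconds(50)) returns true because each thread attempts its second lock while the other still holds it. That matches the theory, but it says nothing about whether the real asyn and record-processing code paths actually interleave this way.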

Obviously, if I am correct, the easiest way to avoid the problem is to eliminate the use of the info(asyn:READBACK, "1") tag (at least for any case where this could happen). We don't actually need those tags for this database anymore, so that is something we will be trying shortly.

But can anyone point out a mistake in my reasoning or my assumptions? Is this something we need to be aware of when using the info tag?


Mark Davis
NSCL/FRIB Control Systems Software Engineer
[email protected]

========================== from gdb ===================================

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f294a554494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f294a5542f6 in __GI___pthread_mutex_lock (mutex=0x18db030)
    at ../nptl/pthread_mutex_lock.c:114
#3  0x00007f294adc17a3 in mutexLock (id=0x18db030)
    at ../../../src/libCom/osi/os/posix/osdMutex.c:46
#4  0x00007f294adc197f in epicsMutexOsdLock (pmutex=0x18db030)
    at ../../../src/libCom/osi/os/posix/osdMutex.c:130
#5  0x00007f294adb85fb in epicsMutexLock (pmutexNode=0x18db070)
    at ../../../src/libCom/osi/epicsMutex.cpp:143
#6  0x00007f294b019733 in dbScanLock (precord=0xe9d7d0)
    at ../../../src/ioc/db/dbLock.c:265
#7  0x00007f294bf8e18d in interruptCallbackOutput (drvPvt=0x1d08640, pasynUser=0x1d08f98, value=0)
    at ../../asyn/devEpics/devAsynUInt32Digital.c:500
#8  0x00007f294bf74540 in paramList::uint32Callback (this=0xb3cd50, command=358, addr=0, interruptMask=4294967295)
    at ../../asyn/asynPortDriver/asynPortDriver.cpp:628
#9  0x00007f294bf74adc in paramList::callCallbacks (this=0xb3cd50, addr=0)
    at ../../asyn/asynPortDriver/asynPortDriver.cpp:750
#10 0x00007f294bf76417 in asynPortDriver::callParamCallbacks (this=0xb3ce60, list=0, addr=0)
    at ../../asyn/asynPortDriver/asynPortDriver.cpp:1510
#11 0x00007f294bf763a7 in asynPortDriver::callParamCallbacks (this=0xb3ce60)
    at ../../asyn/asynPortDriver/asynPortDriver.cpp:1496
...
_________________________________________________________________________________
Jump to any frame in the callstack to examine arguments (frame 6, here):

(gdb) f 6
#6  0x00007f294b019733 in dbScanLock (precord=0xe9d7d0)
    at ../../../src/ioc/db/dbLock.c:265
265     ../../../src/ioc/db/dbLock.c: No such file or directory.
_________________________________________________________________________________
(gdb) p precord
$1 = (dbCommon *) 0xe9d7d0
(gdb) p *(dbCommon *)precord
$2 = {name = "MPS_FPS:MSTR_N0001:Slave11_MaskBit42-Sel", '\000' <repeats 20 times>, desc = '\000' <repeats 40 times>, asg = '\000' <repeats 28 times>, scan = 0, pini = 0, phas = 0, <snipped>


_________________________________________________________________________________

record(bo, "$(SYS)_$(SUB):$(DEV)_$(INST):Slave$(SLAVE)_MaskBit$(BIT)-Sel")
{
  field(DTYP, "asynUInt32Digital")
  field(OUT, "@asynMask($(ASYN_PORT),0,$(ASYN_MASK),$(TIMEOUT))slave$(SLAVE)Mask$(MASK_REG)_Set")
  field(ZNAM,"Mask")
  field(ONAM,"Unmask")
  info(asyn:READBACK, "1")
  info(autosaveFields, "VAL")
}
