EPICS Home

Experimental Physics and Industrial Control System


 

Subject: RE: Possible deadlock issue with asyn layer?
From: Mark Rivers <rivers@cars.uchicago.edu>
To: "davism50@msu.edu" <davism50@msu.edu>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Thu, 26 Apr 2018 13:58:35 +0000

Hi Mark,

 

> When you say "it will buffer array callbacks", I assume you are referring to the asyn module (for the record to do so doesn't make a lot of sense, and it has no knowledge that this tag even exists, so I will go from there).

 

For this discussion it is useful to think of 3 components of asyn.

 

1)      asynManager.  This is the core part of asyn.  It knows nothing about EPICS records.  In fact it is completely independent of EPICS except that it uses libCom for OS-independent things like mutexes, message queues, events, etc.  The only queuing it provides is for callback requests to communicate with asynchronous drivers (ASYN_CANBLOCK) via pasynManager->queueRequest().

 

2)      Standard asyn device support (devEpics directory).  This is the only part of asyn that knows about EPICS records and depends on EPICS components other than libCom.  It supports callbacks from the driver under 2 conditions:

a.       Input records with SCAN=I/O Intr

b.       Output records with asyn:READBACK=1.

The callback values can be placed in a ring buffer so that values are not lost if the callbacks happen faster than the record can process.  The size of the ring buffer can be controlled with the asyn:FIFO info tag.  The default is 10 for scalar records, and 0 for waveform, stringout, and stringin records.  If the ring buffer is in use then each driver callback pushes a new value into the buffer and requests that the record be processed in a separate callback thread.  The driver callbacks do not block waiting for the record to process.

 

3)      asynPortDriver.  asynPortDriver does not support queueing.  It does have a parameter library that stores the most recent value of scalar parameters.  It does not store values for array parameters.

 

 

> So what does it mean to "buffer array callbacks"?

 

It is the ring buffer in devEpics described above.

 

> If this tag is used, do the doCallbacksXxxArray() functions return an error if the queue is full?

 

If the queue is full then the oldest value in the queue is discarded and the new value is added.  This guarantees that the record will eventually have the value of the most recent callback, but it may skip some before this.  If ASYN_TRACE_WARNING is set then a warning message is printed on the IOC console such as this one:

 

devAsynInt32.c-        if (pPvt->ringBufferOverflows > 0) {

devAsynInt32.c:            asynPrint(pPvt->pasynUser, ASYN_TRACE_WARNING,

devAsynInt32.c-                "%s %s::%s warning, %d ring buffer overflows\n",

devAsynInt32.c-                                    pPvt->pr->name, driverName, functionName, pPvt->ringBufferOverflows);

devAsynInt32.c-            pPvt->ringBufferOverflows = 0;

devAsynInt32.c-        }


> Of course that issue of a driver-launched thread blocking when performing callbacks would not be unique to waveform records.

> Does this tag apply only to waveform records, or does it work for scalar values as well?

 

The asyn:FIFO tag applies to both scalar and waveform records.  If not present it defaults to 10 for scalar records and 0 for waveform records.

 

> Or does it not apply to scalar values (i.e. when ASYN_CANBLOCK is used, writes to and callbacks for scalar values are simply added to the end of the Work Queue, and the only limitation on how many values are buffered is how big the queue can get)?

The Work Queue is not involved in callbacks from the driver.  It is only involved in callbacks from pasynManager->queueRequest, i.e. callbacks that occur when device support has exclusive access to the asyn port driver and can make calls like pasynInt32->write(), pasynFloat64->read(), etc.

 

Mark

 

 

From: Mark Davis [mailto:davism50@msu.edu]
Sent: Thursday, April 26, 2018 8:23 AM
To: Mark Rivers <rivers@cars.uchicago.edu>; tech-talk@aps.anl.gov
Subject: Re: Possible deadlock issue with asyn layer?

 

Hi Mark,

Thanks for all the feedback (and your time to provide it).  It appears that using ASYN_CANBLOCK is indeed the way to go.

The one thing that I found a little confusing is your description of the asyn:FIFO tag:

> The asynPortDriver layer does not cache array values.  However, for input waveform records
> with SCAN=I/O Intr the behavior depends on the setting of the asyn:FIFO info tag.  The default
> is 0, which means no caching, but non-zero values are supported, in which case it will buffer
> array callbacks.

When you say "it will buffer array callbacks", I assume you are referring to the asyn module (for the record to do so doesn't make a lot of sense, and it has no knowledge that this tag even exists, so I will go from there).

So what does it mean to "buffer array callbacks"?  Does it mean that calls to the doCallbacksXxxArray() functions simply save a copy of the value and leave it up to the port driver thread to actually process the record with the next value in the queue?  That would make sense, as it would seem to provide a buffer against a thread launched by the driver being blocked waiting for the scan lock on a record it is updating.  If this tag is used, do the doCallbacksXxxArray() functions return an error if the queue is full?

Of course that issue of a driver-launched thread blocking when performing callbacks would not be unique to waveform records.  Does this tag apply only to waveform records, or does it work for scalar values as well?  Or does it not apply to scalar values (i.e. when ASYN_CANBLOCK is used, writes to and callbacks for scalar values are simply added to the end of the Work Queue, and the only limitation on how many values are buffered is how big the queue can get)?

Mark

On 4/25/2018 7:12 PM, Mark Rivers wrote:

  Hi Mark,

 

- The port driver thread (when processing the next entry in the work

queue):

          AM_Lock()

          read/remove entry from the Work Queue

          AM_Unlock()

          asynPortDriver::lock()

          call processCallbackOutput():

               calls one of the write functions in our driver

          asynPortDriver::unlock()

 

That is not quite correct.  It is actually:

  - The port driver thread (when processing the next entry in the work

queue):

         AM_Lock()

         read/remove entry from the Work Queue

         AM_Unlock()

         call processCallbackOutput():

         calls one of the C linkage write functions in asynPortDriver

         asynPortDriver::lock()

              calls one of the C++ write functions in our driver

        asynPortDriver::unlock()

 

> Note that this works IF (and only IF) the port driver thread releases the AM_Lock BEFORE calling asynPortDriver::lock().

 

Each asyn port has 2 mutexes created by asynManager.  One is called asynManagerLock and is used by asynManager to guard the port structure.  This lock is released when the user callback code is called.  The other is called synchronousLock and is held while the user callback code is being executed.  It is the lock used in the call to pasynManager->lockPort().  These 2 locks are never held at the same time; asynManagerLock is released before taking synchronousLock.  I don’t think the deadlock scenario you outlined can occur.  I’m pretty sure we would know about it by now if it were a problem, since areaDetector and other applications do these callbacks at 100s of Hz constantly.

 

> There are no array param equivalents to the setXxxParam() functions for scalar values, and the array value has to be provided in the doCallbacksXxxArray() functions, which would seem to imply that the asyn layer does not attempt to maintain a copy of the latest value for array parameters like it does for scalars (which makes perfect sense).

 

It is correct that asynPortDriver does not maintain a copy of the array parameters.

 

> I also assume this means that if you have a waveform record with DTYP set to "asynXxxArrayIn" and SCAN NOT set to "I/O Intr", then any time it processes it will end up calling the appropriate readXxxArray() function in our driver.

 

That is correct.  It will generate an error message if your driver did not implement readXxxArray().

 

> But for this discussion, the important question is:  How are writes to an array param handled when ASYN_CANBLOCK is used?

 

> After looking over the devAsynXxxArray.h file and a bit of asynManager.c, it looks like, unless you change the default 0 value for the # of ring buffers, it does not make a copy and just assumes that the value in the waveform record will not change until after it has been processed (although it at least checks for and complains if another interrupt callback occurs before the previous value was processed).

 

That is correct.

 

> For reads, I assume that there are 2 approaches:

> - SCAN set to "I/O Intr", which means the driver calls doCallbacksXxxArray() whenever it has a new value, which causes the new value to be posted to the record

> - Anything else for SCAN, which means the driver's readXxxArray() function gets called to get a new value for the record.

> As with the default config for writes, the asyn layer does not cache any array values for read operations

 

The asynPortDriver layer does not cache array values.  However, for input waveform records with SCAN=I/O Intr the behavior depends on the setting of the asyn:FIFO info tag.  The default is 0, which means no caching, but non-zero values are supported, in which case it will buffer array callbacks.

 

Mark

 

 

 

 

-----Original Message-----
From: Davis, Mark [mailto:mark_a_davis@comcast.net]
Sent: Friday, April 20, 2018 9:01 AM
To: Mark Rivers <rivers@cars.uchicago.edu>; Pearson, Matthew R. <pearsonmr@ornl.gov>
Cc: tech-talk@aps.anl.gov
Subject: Re: Possible deadlock issue with asyn layer?

 

Hi Mark,

 

Your assumptions are correct:

    "asyn port lock" ==>  asynPortDriver::lock()

    "asyn port thread" ==> thread created to service the Work Queue when ASYN_CANBLOCK is specified (what is referred to as the "port driver thread" in the description of Figure 1 in the asynDriver.html file).

    "processing a new setting" ==> What the "port driver thread"

executes when processing an entry from the Work Queue.

 

From your description, I believe the following is a summary of the relevant portion of the sequences and locks used by the different threads when the ASYN_CANBLOCK flag is specified.

NOTE:  I am using "AM_(Un)Lock()" to refer to the asynManager lock used to protect the Work Queue.

 

  - Application/client thread (when adding a write request to the Work

Queue)

         dbScanLock()

         AM_Lock()

         Add request to the Work Queue

         AM_Unlock()

         dbScanUnlock()

 

  - The port driver thread (when processing the next entry in the work

queue):

         AM_Lock()

         read/remove entry from the Work Queue

         AM_Unlock()

         asynPortDriver::lock()

         call processCallbackOutput():

              calls one of the write functions in our driver

         asynPortDriver::unlock()

 

  - Our driver thread (when processing new readings from the device):

         asynPortDriver::lock()

         callParamCallbacks():

              dbScanLock()

              post new value to output record

              dbScanUnlock()

        asynPortDriver::unlock()

 

Note that this works IF (and only IF) the port driver thread releases the AM_Lock BEFORE calling asynPortDriver::lock().

 

If that is NOT the case, then deadlocks are still possible.

 

For example:

   The Application thread gets the dbScanLock() and is preempted.

   The port driver thread runs and gets the AM_Lock and is preempted.

   The driver thread runs and gets the asynPortDriver::lock() and then blocks on dbScanLock().

   The Application thread resumes and blocks waiting for the AM_Lock.

   The port driver thread resumes and blocks waiting for the

asynPortDriver::lock()

 

Probably less likely than when not using ASYN_CANBLOCK, but it could still happen if the port driver thread needs both locks at the same time.

But I assume that your point is that it does NOT hold both locks at the same time, and so using ASYN_CANBLOCK seems like it would indeed avoid the possibility of a deadlock.

 

At least for scalar values.....

 

There are no array param equivalents to the setXxxParam() functions for scalar values, and the array value has to be provided in the

doCallbacksXxxArray() functions, which would seem to imply that the asyn layer does not attempt to maintain a copy of the latest value for array parameters like it does for scalars (which makes perfect sense).  I also assume this means that if you have a waveform record with DTYP set to "asynXxxArrayIn" and SCAN NOT set to "I/O Intr", then any time it processes it will end up calling the appropriate readXxxArray() function in our driver.

 

But for this discussion, the important question is:  How are writes to an array param handled when ASYN_CANBLOCK is used?

 

After looking over the devAsynXxxArray.h file and a bit of asynManager.c, it looks like, unless you change the default 0 value for the # of ring buffers, it does not make a copy and just assumes that the value in the waveform record will not change until after it has been processed (although it at least checks for and complains if another interrupt callback occurs before the previous value was processed).

 

For reads, I assume that there are 2 approaches:

    - SCAN set to "I/O Intr", which means the driver calls

doCallbacksXxxArray() whenever it has a new value, which causes the new value to be posted to the record

    - Anything else for SCAN, which means the driver's readXxxArray() function gets called to get a new value for the record.

    As with the default config for writes, the asyn layer does not cache any array values for read operations

 

Mark Davis

NSCL/FRIB

 

 

 

On 4/18/2018 3:22 PM, Mark Rivers wrote:

> Hi Mark,

>> At what points in the described sequences are locks acquired and released (the scan lock, the asyn port lock, and any other locks)?

> When you say "asyn port lock" are you referring to the lock that is done with asynPortDriver::lock()?  I assume so.  If so, that lock is part of asynPortDriver and is not known to the asynManager software.

>        > Are changes to a port's Work Queue protected by the asyn port lock, or does that use a different lock?

> That is done by asynManager, and so uses a lock that is part of asynManager, not the lock in asynPortDriver.

>        > When processing a new setting, does the asyn port thread still have to call dbScanLock()?

> I'm not sure I understand the question.

>    - What code are you referring to when you say "processing a new setting"?

>    - When you say "asyn port thread" you are now talking about a driver created with ASYN_CANBLOCK, so asynManager created a port thread?  I'll assume so.

> Let's take a specific example of an output record with asynInt32 device support.  The asyn port thread is the code that executes the processCallbackOutput() function in devAsynInt32.c.  It executes when the pasynManager->queueRequest gets to this queue element.

> static void processCallbackOutput(asynUser *pasynUser) {

>      devPvt *pPvt = (devPvt *)pasynUser->userPvt;

>      dbCommon *pr = pPvt->pr;

>      static const char *functionName="processCallbackOutput";

>      pPvt->result.status = pPvt->pint32->write(pPvt->int32Pvt, pPvt->pasynUser,pPvt->result.value);

>      pPvt->result.time = pPvt->pasynUser->timestamp;

>      pPvt->result.alarmStatus = pPvt->pasynUser->alarmStatus;

>      pPvt->result.alarmSeverity = pPvt->pasynUser->alarmSeverity;

>      if(pPvt->result.status == asynSuccess) {

>          asynPrint(pasynUser, ASYN_TRACEIO_DEVICE,

>              "%s %s::%s process value %d\n",pr->name, driverName, functionName,pPvt->result.value);

>      } else {

>         asynPrint(pasynUser, ASYN_TRACE_ERROR,

>             "%s %s::%s process error %s\n",

>             pr->name, driverName, functionName, pasynUser->errorMessage);

>      }

>      if(pr->pact)

> callbackRequestProcessCallback(&pPvt->processCallback,pr->prio,pr);

> }

> The first thing it does is call the asynInt32::write function in your driver.  When that returns it sets some variables in devPvt.  Then if the driver is asynchronous (i.e. ASYN_CANBLOCK was set) it calls callbackRequestProcessCallback() to finish record processing.  Note that it does not call dbScanLock because it is not modifying the record.

> I think that if you set ASYN_CANBLOCK then you will not have the deadlock.  As I said that it what most drivers are using, and I have not had previous reports of deadlocks in that case.

> Mark

> -----Original Message-----

> From: Davis, Mark [mailto:mark_a_davis@comcast.net]

> Sent: Wednesday, April 18, 2018 12:22 PM

> To: Mark Rivers <rivers@cars.uchicago.edu>; Pearson, Matthew R.

> <pearsonmr@ornl.gov>

> Cc: tech-talk@aps.anl.gov

> Subject: Re: Possible deadlock issue with asyn layer?

> I had not considered just how ASYN_CANBLOCK might affect when the various lock/unlock functions are called.  But now that I am looking at that, it is not clear what the effects are.

> Could you fill in the blanks a bit?  The easiest thing might be to use Figure 1 in the asynDriver.html as a reference for the following:

> At what points in the described sequences are locks acquired and released (the scan lock, the asyn port lock, and any other locks)?

>        Are changes to a port's Work Queue protected by the asyn port lock, or does that use a different lock?

>        When processing a new setting, does the asyn port thread still have to call dbScanLock()?

> Unless the dbScanLock is NOT needed by the asyn port thread, the only scenario I can think of that would eliminate the problem and would not conflict with the sequence shown in Figure 1 in the asynDriver.html document is this:  The lock that protects the Work Queue is NOT the same as the asyn port lock AND it does NOT have to be held at the same time as the dbScanLock.  This would mean that the asyn port thread did NOT need to simultaneously acquire the same two locks as either the application/client thread or the one launched by the driver to get and process new readings from the device.

> However, if the asyn port lock IS used to protect the Work Queue, then the same race condition still exists between any thread that writes to one of the output records with the READBACK tag and the driver-launched thread that periodically gets and processes new readings.

> NOTE:  If we specify ASYN_CANBLOCK and it still locks up, that will of course verify that using the blocking-mode setting doesn't help. But if it DOESN'T lock up, that might simply mean that the timing was changed enough that it is much more difficult to reproduce.  At the moment I am just trying to make sure I understand the normal processing sequences so if/when I am able to experiment a bit I will know what experiments to make and what details to look at if/when it locks up.

> Mark Davis

> NSCL/FRIB

> On 4/16/2018 4:26 PM, Mark Rivers wrote:

>> I was about to say the same thing as Matt.  If you set ASYN_CANBLOCK

>> then your flow for writes would be changed

>> 

>>> For writes, I am assuming the following sequence (all done by a CAC-client thread in our case):

>>>      - write to the output record

>>>      - call dbScanLock(pRec)

>>>      - call pRec->process(pRec)

>> pRec->process sets PACT and returns

>> dbScanUnlock(pRec)

>>>      - call to the write func specified in the Device Support Entry Table (which, in this case, calls an asyn layer function to process the new setting)

>>>      - call function in the asyn layer (which calls its own lock() func to get the asyn-layer lock for a specific instance of the driver)

>>>      - call our driver's writeXxx() function

>> See what happens if you leave the READBACK tag in the database but change your driver to ASYN_CANBLOCK.  I suspect the problem may go away.  Most drivers I and others have written use ASYN_CANBLOCK, which could explain why the deadlock has not been observed before.

>> 

>> Mark

>> 

>> 

>> 

>> -----Original Message-----

>> From: Pearson, Matthew R. [mailto:pearsonmr@ornl.gov]

>> Sent: Monday, April 16, 2018 3:19 PM

>> To: mark_a_davis@comcast.net

>> Cc: tech-talk@aps.anl.gov; Mark Rivers <rivers@cars.uchicago.edu>

>> Subject: Re: Possible deadlock issue with asyn layer?

>> 

>> Hi Mark,

>> 

>> It’s been several years since I looked into this so I’m rusty, but I think if ASYN_CANBLOCK is set then the dbScanLock() should be released after the command to the driver has been placed on the port thread queue, so that database processing can continue while the asynManager port thread is handling the call to the driver and the device IO.

>> 

>> If ASYN_CANBLOCK is not set then the dbScanLock() is held while the driver is called. Then the read and the write could be interleaved so that this sequence of events happen:

>> 

>> Driver Thread:

>> 1) Driver reads new value from device

>> 2) Driver calls lock()

>> Database Thread:

>> 3) The output record is processed and dbScanLock() is called
>> Driver Thread:

>> 4) Driver calls callParamCallbacks(), which results in a call to dbScanLock(), which blocks waiting for 3) to finish
>> Database Thread:

>> 5) The output driver function calls lock(), which blocks waiting for 2) to finish.

>> 

>> I’ve seen similar deadlocks before when ASYN_CANBLOCK is not set. However, I’m not familiar with how asyn:READBACK is implemented so I could be missing something there. Did removing the use of asyn:READBACK change the behavior?

>> 

>> Cheers,

>> Matt

>> 

>> 

>>> On Apr 16, 2018, at 2:15 PM, Mark Davis <davism50@msu.edu> wrote:

>>> 

>>> NOTE:  Mark Rivers seems to have just about confirmed my assumptions, but since I wrote all this, I am sending it anyway in case it is of use to someone else in the future who stumbles across this discussion and is looking for a bit more detail on the issue that was discussed (and why it can happen).

>>> 

>>> Mark Davis

>>> ---------------------------------------------------------------------

>>> 

>>> Hi Matt,

>>> 

>>> The ASYN_CANBLOCK flag is definitely NOT set (on purpose).

>>> 

>>> As I understand it, the purpose of that flag is to tell the asyn layer that calls to driver functions from the asyn layer (e.g. the writeXxx() functions) might block.  In general, this is used when the writeXxx() functions attempt to communicate with one or more devices directly, using some medium/interface that may cause the calling thread to block waiting for some I/O operations to complete.

>>> 

>>> When set, this causes the asyn layer to create a queue for write operations and a thread to service that queue (i.e. remove a value from the queue and call the appropriate writeXxx() function in the driver), so if one of the driver's writeXxx() functions does block, the only thread that is affected is the one servicing the queue for that driver.

>>> 

>>> Our driver's writeXxx() functions never do any I/O so they never block:  They simply update the value to be processed by the thread spawned by the driver's constructor.  And as with most (all?) calls to the driver functions, our driver's writeXxx() functions will never be called until the asyn layer has acquired the asyn lock for the driver instance (so as long as our driver always obtains the same lock before calling asyn layer functions, there are no concurrency issues).

>>> 

>>> The key assumption in my theory is when the calls to dbScanLock() occur.

>>> 

>>> For writes, I am assuming the following sequence (all done by a CAC-client thread in our case):

>>>      - write to the output record

>>>      - call dbScanLock(pRec)

>>>      - call pRec->process(pRec)

>>>      - call to the write func specified in the Device Support Entry Table (which, in this case, calls an asyn layer function to process the new setting)

>>>      - call function in the asyn layer (which calls its own lock() func to get the asyn-layer lock for a specific instance of the driver)

>>>      - call our driver's writeXxx() function

>>> 

>>> For reads:

>>>      - driver reads new values from device

>>>      - driver calls lock() to get the asyn-layer lock for itself

>>>      - driver calls setXxxParam() and/or setParamStatus() for each new parameter value

>>>      - driver calls callParamCallbacks(), which calls functions in

>>> asyn's devEpics interface, which calls dbScanLock(pRec) and posts

>>> the new value to the record

>>> 

>>> If all this is correct, then writes require getting the scan lock 1st and the asyn lock 2nd, while reads require getting the asyn lock 1st and the scan lock 2nd.  And telling the asyn layer that our driver might block wouldn't change the order the locks are acquired:  It would just create a queue and another thread to manage the calls to our driver's writeXxx() functions.

>>> 

>>> So far, I have found nothing that contradicts these assumptions, but I have not dug deep enough in to all the code to be sure all my assumptions are correct.

>>> 

>>> Mark

>>> 

>>> 

>>> 

>>> On 4/16/2018 11:57 AM, Pearson, Matthew R. wrote:

>>>> Hi,

>>>> 

>>>> Is it possible that the ASYN_CANBLOCK flag is not set when the asynPortDriver object is created?

>>>> 

>>>> Cheers,

>>>> Matt

>>>> 

>>>>> On Apr 16, 2018, at 11:32 AM, Mark Davis <davism50@msu.edu> wrote:

>>>>> 

>>>>> We have been experiencing a deadlock problem with one of our custom drivers based on the asynPortDriver class, and after reviewing the code and information gained via the gdb debugger, I have a theory regarding the cause that I would like others to review and tell me if there are any flaws they can spot (either in my logic or my assumptions).

>>>>> 

>>>>> 

>>>>> First, a summary of the driver that is exhibiting the problem:

>>>>> 

>>>>>      - Non-blocking driver based on asynPortDriver class

>>>>>      - One device per driver instance

>>>>>      - Communication with device handled by a thread launched by each instance of the driver

>>>>>      - All asyn calls by driver thread surrounded by lock()/unlock() calls

>>>>>      - info(asyn:READBACK, "1") tag used in some of the output records

>>>>>      - writeXxx() calls don't do any I/O:  They save the new value to be processed later

>>>>>          (and sometimes call setXxxParam(), setParamStatus(), and

>>>>> callParamCallbacks() functions)

>>>>> 

>>>>>      The driver thread runs periodically to update readings and process new settings as follows:

>>>>>         - reads values from the controller

>>>>>         - calls lock()

>>>>>         - calls setXxxParam() and setParamStatus() as needed

>>>>>         - calls callParamCallbacks() to update the records affected by the changes (acquiring the scan lock for each record while it is processed)

>>>>>         - calls unlock()

>>>>>         - Send new settings to the controller (using

>>>>> lock()/unlock() when calling any asyn functions or driver values

>>>>> that are touched by the writeXxx() functions)

>>>>> 

>>>>>      Concurrently with the driver thread's usual processing, writes to output records occur.

>>>>>      The processing of each write (I assume) includes the following sequence of events:

>>>>>         - The thread that did the write (a CAC-client thread, in this case) gets the scan lock for the record and begins processing the record

>>>>>         - Processing of the record includes calls to asyn layer functions, which requires acquiring the asyn lock for the driver instance before calling one of its writeXxx() functions

>>>>>         - The asyn lock for the driver is released

>>>>>         - The scan lock for the record is released

>>>>> 

>>>>> What is happening:

>>>>> 

>>>>>      The driver thread sometimes blocks indefinitely waiting to

>>>>> acquire the scan lock for an output record it needs to update

>>>>> 

>>>>> 

>>>>> My theory as to what can cause this:

>>>>> 

>>>>>      What we know:

>>>>>         - The driver thread blocks indefinitely waiting for the scan lock for an output record that includes the info(asyn:READBACK, "1") tag

>>>>>           (see extract from gdb output below and the template for the record)

>>>>>         - The lockup of the driver thread is always preceded by a CA client write to an output record for one of the driver's parameters

>>>>>            (haven't yet confirmed it is the same record that the

>>>>> driver is waiting for when it hangs)

>>>>> 

>>>>>      What I believe is happening:

>>>>>         - Sometime after the driver calls lock() and when it blocks waiting for the scan lock for an output record, a CA client thread writes to the same record.  It successfully acquires the scan lock for the record, but then blocks waiting to acquire the asyn lock for the driver instance (which the driver thread already has).

>>>>> 

>>>>> 

>>>>> Obviously, if I am correct, the easiest way to avoid the problem is to eliminate the use of the info(asyn:READBACK, "1") tag (at least for any case where this could happen).  We don't actually need those tags for this database anymore, so that is something we will be trying shortly.

>>>>> 

>>>>> But can anyone point out a mistake in my reasoning or my assumptions?  Is this something we need to be aware of when using the info tag?

>>>>> 

>>>>> Mark Davis

>>>>> NSCL/FRIB Control Systems Software Engineer davism50@msu.edu

>>>>> 

>>>>> ========================== from gdb ===================================

>>>>> 

>>>>> #0  __lll_lock_wait () at

>>>>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

>>>>> 

>>>>> #1  0x00007f294a554494 in _L_lock_952 () from

>>>>> /lib/x86_64-linux-gnu/libpthread.so.0

>>>>> 

>>>>> #2  0x00007f294a5542f6 in __GI___pthread_mutex_lock

>>>>> (mutex=0x18db030) at ../nptl/pthread_mutex_lock.c:114

>>>>> 

>>>>> #3  0x00007f294adc17a3 in mutexLock (id=0x18db030) at

>>>>> ../../../src/libCom/osi/os/posix/osdMutex.c:46

>>>>> 

>>>>> #4  0x00007f294adc197f in epicsMutexOsdLock (pmutex=0x18db030)

>>>>>       at ../../../src/libCom/osi/os/posix/osdMutex.c:130

>>>>> 

>>>>> #5  0x00007f294adb85fb in epicsMutexLock (pmutexNode=0x18db070)

>>>>>       at ../../../src/libCom/osi/epicsMutex.cpp:143

>>>>> 

>>>>> #6  0x00007f294b019733 in dbScanLock (precord=0xe9d7d0) at

>>>>> ../../../src/ioc/db/dbLock.c:265

>>>>> 

>>>>> #7  0x00007f294bf8e18d in interruptCallbackOutput (drvPvt=0x1d08640, pasynUser=0x1d08f98, value=0)

>>>>>       at ../../asyn/devEpics/devAsynUInt32Digital.c:500

>>>>> 

>>>>> #8  0x00007f294bf74540 in paramList::uint32Callback (this=0xb3cd50, command=358, addr=0,

>>>>>       interruptMask=4294967295) at

>>>>> ../../asyn/asynPortDriver/asynPortDriver.cpp:628

>>>>> 

>>>>> #9  0x00007f294bf74adc in paramList::callCallbacks (this=0xb3cd50, addr=0)

>>>>>       at ../../asyn/asynPortDriver/asynPortDriver.cpp:750

>>>>> 

>>>>> #10 0x00007f294bf76417 in asynPortDriver::callParamCallbacks (this=0xb3ce60, list=0, addr=0)

>>>>>       at ../../asyn/asynPortDriver/asynPortDriver.cpp:1510

>>>>> 

>>>>> #11 0x00007f294bf763a7 in asynPortDriver::callParamCallbacks (this=0xb3ce60)

>>>>>       at ../../asyn/asynPortDriver/asynPortDriver.cpp:1496

>>>>> 

>>>>> ...

>>>>> _____________________________________________________________________
>>>>> Jump to any frame in the callstack to examine arguments (frame 6, here):

>>>>> 

>>>>> (gdb) f 6

>>>>> #6  0x00007f294b019733 in dbScanLock (precord=0xe9d7d0) at ../../../src/ioc/db/dbLock.c:265

>>>>> 265     ../../../src/ioc/db/dbLock.c: No such file or directory.

>>>>> _____________________________________________________________________

>>>>> (gdb) p precord

>>>>> 

>>>>> $1 = (dbCommon *) 0xe9d7d0

>>>>> (gdb) p *(dbCommon *)precord

>>>>> 

>>>>> $2 = {name = "MPS_FPS:MSTR_N0001:Slave11_MaskBit42-Sel", '\000' <repeats 20 times>,

>>>>>     desc = '\000' <repeats 40 times>, asg = '\000' <repeats 28

>>>>> times>, scan = 0, pini = 0, phas = 0, <snipped>

>>>>> 

>>>>> _____________________________________________________________________

>>>>> 

>>>>> record(bo, "$(SYS)_$(SUB):$(DEV)_$(INST):Slave$(SLAVE)_MaskBit$(BIT)-Sel")

>>>>> {

>>>>>     field(DTYP, "asynUInt32Digital")

>>>>>     field(OUT,  "@asynMask($(ASYN_PORT),0,$(ASYN_MASK),$(TIMEOUT)) slave$(SLAVE)Mask$(MASK_REG)_Set")

>>>>>     field(ZNAM,"Mask")

>>>>>     field(ONAM,"Unmask")

>>>>>     info(asyn:READBACK, "1")

>>>>>     info(autosaveFields, "VAL")

>>>>> }

>>>>> 

>>>>> 

 

 

