EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: Re: Bug in epicsMessageQueue?
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: "'core-talk at aps.anl.gov'" <core-talk at aps.anl.gov>
Date: Fri, 20 Mar 2020 19:37:19 -0700
On 3/20/20 7:01 PM, Michael Davidsaver wrote:
> On 3/20/20 5:00 PM, Mark Rivers wrote:
>> Here is the code:
>>
>>  
>>
>> if (threadNode.eventSent && (threadNode.size <= size) &&
>>
>>         status == epicsEventOK)
>>
>>         return threadNode.size;
>>
>>     return -1;
>>
>>  
>>
>> So it can return -1 if the allocated size for the message is wrong, but I believe that is impossible in this case.
>>
>>  
>>
>> It can fail if status is not epicsEventOK, which means epicsEventWaitWithTimeout returned something other than epicsEventOK, i.e. timeout.
>>
>>  
>>
>> I am not sure when threadNode.evenSent would be null.
> 
> It looks like door number three.  I'm not sure how the eventSent flag is meant to be used,
> but your events are being lost when the epicsEvent is signaled and it is not set.
> 
>> $ gdb --args ./testMessageQueue 200 0.1
>> (gdb) b osdMessageQueue.cpp:358 if status==0
>> (gdb) r
> ... wait until breakpoint hit
>> (gdb) p status
>> $1 = epicsEventOK
>> (gdb) p threadNode.size <= size
>> $2 = true
>> (gdb) p threadNode.eventSent
>> $3 = false

I think I see the logic error in how the eventSent flag is handled,
specifically related to the fact that epicsEvent is a semaphore as
opposed to a condition variable.

This allows a "race" to occur if the first/only waiting receiver
times out, and epicsEventWaitWithTimeout() returns, while a sender
is in epicsMessageQueueSend() preparing to wake up a receiver.

This results in a situation where the sender has set the eventSent,
and indeed copied a message to the buffer of, a thread which has
decided to abort.

After timing out, the receiver sees the timeout and returns -1
even through eventSent has been set.  This can be trapped with:

> b osdMessageQueue.cpp:358 if status!=0 && threadNode.eventSent

So here is your lost message.

Now when epicsMessageQueueReceiveWithTimeout() is called again, no message
is waiting in the queue, so epicsEventWaitWithTimeout() is called.
Since the semaphore is already set, this returns immediately with status==0,
but this is a spurious wakeup and the eventSent flag is not set.

And here is the second "timeout".


There hasn't been any significant change to this logic since it was added
in 2003, so I think this issue has been present since day 0.

References:
Bug in epicsMessageQueue? Mark Rivers via Core-talk
Re: Bug in epicsMessageQueue? Michael Davidsaver via Core-talk
RE: Bug in epicsMessageQueue? Mark Rivers via Core-talk
Re: Bug in epicsMessageQueue? Michael Davidsaver via Core-talk

Navigate by Date:
Prev: RE: C++ question Mark Rivers via Core-talk
Next: Build completed: EPICS Base base-3.15-562 AppVeyor via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
Navigate by Thread:
Prev: Re: Bug in epicsMessageQueue? Michael Davidsaver via Core-talk
Next: Build completed: EPICS Base base-3.15-562 AppVeyor via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
ANJ, 20 Mar 2020 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·