Hi Andrew,
Thanks for your comment, I only just saw it. For some reason when other
people add comments here (e.g. Michael) I receive an e-mail about it.
But when you write a comment I do not receive an e-mail. I have seen
the same thing previously. Any idea why?
> Do you have the ability to run code on VxWorks at the moment?
Yes, I can test on vxWorks.
I just came up with a very simple patch that almost fixes the problem.
diff --git a/modules/libcom/src/osi/os/default/osdMessageQueue.cpp b/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
index c86d8cc..e0d10be 100644
--- a/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
+++ b/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
@@ -339,8 +339,10 @@ myReceive(epicsMessageQueueId pmsg, void *message, unsigned int size,
epicsMutexUnlock(pmsg->mutex);
epicsEventStatus status;
- if (timeout > 0)
+ if (timeout > 0) {
status = epicsEventWaitWithTimeout(threadNode.evp->event, timeout);
+ if (status != epicsEventWaitOK) status = epicsEventTryWait(threadNode.evp->event);
+ }
else
status = epicsEventWait(threadNode.evp->event);
This allows for the fact that there is a very brief window when the
timeout can occur but an event has actually been signaled. It just
calls epicsEventTryWait to detect this case.
Here is the output of running the attached test program 10,000 loops, delay time=0.01 before the patch.
Loops=10000, numSent=10000, numReceived=9770
So it failed 2.3% of the time.
Here is the output of running the test program 10,000 loops after the patch:
Loops=10000, numSent=10000, numReceived=9999
So it failed 0.01% of the time, or 230 times less than before the patch.
But unfortunately it did fail, so it is not a solution.
** Attachment added: "Test program"
https://bugs.launchpad.net/epics-base/+bug/1868486/+attachment/5341691/+files/testMessageQueue.c
--
You received this bug notification because you are a member of EPICS
Core Developers, which is subscribed to EPICS Base.
Matching subscriptions: epics-core-list-subscription
https://bugs.launchpad.net/bugs/1868486
Title:
epicsMessageQueue lost messages
Status in EPICS Base:
Confirmed
Status in EPICS Base 3.15 series:
New
Status in EPICS Base 7.0 series:
Confirmed
Bug description:
https://epics.anl.gov/core-talk/2020/msg00396.php
Mark Rivers observed epicsMessageQueue losing messages.
https://epics.anl.gov/core-talk/2020/msg00408.php
> I think I see the logic error in how the eventSent flag is handled,
> specifically related to the fact that epicsEvent is a semaphore as
> opposed to a condition variable.
>
> This allows a "race" to occur if the first/only waiting receiver
> times out, and epicsEventWaitWithTimeout() returns, while a sender
> is in epicsMessageQueueSend() preparing to wake up a receiver.
>
> This results in a situation where the sender has set the eventSent,
> and indeed copied a message to the buffer of, a thread which has
> decided to abort.
>
> After timing out, the receiver sees the timeout and returns -1
> even through eventSent has been set. This can be trapped with:
>
> > b osdMessageQueue.cpp:358 if status!=0 && threadNode.eventSent
>
> So here is your lost message.
>
> Now when epicsMessageQueueReceiveWithTimeout() is called again, no message
> is waiting in the queue, so epicsEventWaitWithTimeout() is called.
> Since the semaphore is already set, this returns immediately with status==0,
> but this is a spurious wakeup and the eventSent flag is not set.
>
> And here is the second "timeout".
Line numbers circa 7.0.3.1
To manage notifications about this bug go to:
https://bugs.launchpad.net/epics-base/+bug/1868486/+subscriptions
- References:
- [Bug 1868486] [NEW] epicsMessageQueue lost messages mdavidsaver via Core-talk
- Navigate by Date:
- Prev:
Re: [Merge] ~dirk.zimoch/epics-base:dbChannelForDBLinks into epics-base:7.0 Ben Franksen via Core-talk
- Next:
How to work with EPICS repo document Timo Korhonen via Core-talk
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
<2020>
2021
2022
2023
2024
2025
- Navigate by Thread:
- Prev:
[Bug 1868486] Re: epicsMessageQueue lost messages mdavidsaver via Core-talk
- Next:
[Bug 1868486] Re: epicsMessageQueue lost messages rivers via Core-talk
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
<2020>
2021
2022
2023
2024
2025
|