> Does Ben's change also eliminate the spurious timeouts?
Yes, I believe so. I know only see single timeouts, never 2 in a row.
These single timeouts are normal because for this test the sleep time of
the sender and the receive timeout are both 0.01 seconds, so there is
some probably of a real timeout. When receive size is -1 now (timeout)
the value is the previous value, it is never the next value.
2020/03/27 08:03:49.266 sent value=964
2020/03/27 08:03:49.266 received value=964
2020/03/27 08:03:49.276 sent value=965
2020/03/27 08:03:49.277 received value=965
2020/03/27 08:03:49.287 sent value=966
2020/03/27 08:03:49.287 received value=966
2020/03/27 08:03:49.297 sent value=967
2020/03/27 08:03:49.297 received value=967
2020/03/27 08:03:49.307 receive size=-1 (value=967)
2020/03/27 08:03:49.307 received value=968
2020/03/27 08:03:49.307 sent value=968
2020/03/27 08:03:49.317 receive size=-1 (value=968)
2020/03/27 08:03:49.317 sent value=969
2020/03/27 08:03:49.317 received value=969
2020/03/27 08:03:49.327 sent value=970
2020/03/27 08:03:49.327 received value=970
2020/03/27 08:03:49.337 receive size=-1 (value=970)
2020/03/27 08:03:49.337 received value=971
2020/03/27 08:03:49.337 sent value=971
2020/03/27 08:03:49.347 receive size=-1 (value=971)
2020/03/27 08:03:49.347 sent value=972
2020/03/27 08:03:49.347 received value=972
2020/03/27 08:03:49.357 receive size=-1 (value=972)
2020/03/27 08:03:49.357 received value=973
2020/03/27 08:03:49.357 sent value=973
2020/03/27 08:03:49.367 receive size=-1 (value=973)
2020/03/27 08:03:49.368 sent value=974
2020/03/27 08:03:49.368 received value=974
2020/03/27 08:03:49.378 sent value=975
2020/03/27 08:03:49.378 received value=975
2020/03/27 08:03:49.388 sent value=976
2020/03/27 08:03:49.388 received value=976
2020/03/27 08:03:49.398 sent value=977
2020/03/27 08:03:49.398 received value=977
2020/03/27 08:03:49.408 sent value=978
2020/03/27 08:03:49.408 received value=978
2020/03/27 08:03:49.418 sent value=979
2020/03/27 08:03:49.418 received value=979
2020/03/27 08:03:49.428 sent value=980
2020/03/27 08:03:49.428 received value=980
2020/03/27 08:03:49.428 receive size=-1 (value=980)
2020/03/27 08:03:49.438 sent value=981
2020/03/27 08:03:49.438 received value=981
2020/03/27 08:03:49.448 sent value=982
2020/03/27 08:03:49.448 received value=982
--
You received this bug notification because you are a member of EPICS
Core Developers, which is subscribed to EPICS Base.
Matching subscriptions: epics-core-list-subscription
https://bugs.launchpad.net/bugs/1868486
Title:
epicsMessageQueue lost messages
Status in EPICS Base:
Confirmed
Status in EPICS Base 3.15 series:
New
Status in EPICS Base 7.0 series:
Confirmed
Bug description:
https://epics.anl.gov/core-talk/2020/msg00396.php
Mark Rivers observed epicsMessageQueue losing messages.
https://epics.anl.gov/core-talk/2020/msg00408.php
> I think I see the logic error in how the eventSent flag is handled,
> specifically related to the fact that epicsEvent is a semaphore as
> opposed to a condition variable.
>
> This allows a "race" to occur if the first/only waiting receiver
> times out, and epicsEventWaitWithTimeout() returns, while a sender
> is in epicsMessageQueueSend() preparing to wake up a receiver.
>
> This results in a situation where the sender has set the eventSent,
> and indeed copied a message to the buffer of, a thread which has
> decided to abort.
>
> After timing out, the receiver sees the timeout and returns -1
> even through eventSent has been set. This can be trapped with:
>
> > b osdMessageQueue.cpp:358 if status!=0 && threadNode.eventSent
>
> So here is your lost message.
>
> Now when epicsMessageQueueReceiveWithTimeout() is called again, no message
> is waiting in the queue, so epicsEventWaitWithTimeout() is called.
> Since the semaphore is already set, this returns immediately with status==0,
> but this is a spurious wakeup and the eventSent flag is not set.
>
> And here is the second "timeout".
Line numbers circa 7.0.3.1
To manage notifications about this bug go to:
https://bugs.launchpad.net/epics-base/+bug/1868486/+subscriptions
- References:
- [Bug 1868486] [NEW] epicsMessageQueue lost messages mdavidsaver via Core-talk
- Navigate by Date:
- Prev:
Build completed: EPICS Base base-3.15-570 AppVeyor via Core-talk
- Next:
[Merge] ~bfrk/epics-base:scalar-get-optimization into epics-base:7.0 Ben Franksen via Core-talk
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
<2020>
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
[Bug 1868486] Re: epicsMessageQueue lost messages rivers via Core-talk
- Next:
[Bug 1868486] Re: epicsMessageQueue lost messages mdavidsaver via Core-talk
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
<2020>
2021
2022
2023
2024
|