EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: [Bug 1868486] Re: epicsMessageQueue lost messages
From: rivers via Core-talk <core-talk at aps.anl.gov>
To: core-talk at aps.anl.gov
Date: Fri, 27 Mar 2020 13:07:07 -0000
> Does Ben's change also eliminate the spurious timeouts?

Yes, I believe so.  I know only see single timeouts, never 2 in a row.
These single timeouts are normal because for this test the sleep time of
the sender and the receive timeout are both 0.01 seconds, so there is
some probably of a real timeout.  When receive size is -1 now (timeout)
the value is the previous value, it is never the next value.

2020/03/27 08:03:49.266     sent value=964
2020/03/27 08:03:49.266 received value=964
2020/03/27 08:03:49.276     sent value=965
2020/03/27 08:03:49.277 received value=965
2020/03/27 08:03:49.287     sent value=966
2020/03/27 08:03:49.287 received value=966
2020/03/27 08:03:49.297     sent value=967
2020/03/27 08:03:49.297 received value=967
2020/03/27 08:03:49.307 receive size=-1 (value=967)
2020/03/27 08:03:49.307 received value=968
2020/03/27 08:03:49.307     sent value=968
2020/03/27 08:03:49.317 receive size=-1 (value=968)
2020/03/27 08:03:49.317     sent value=969
2020/03/27 08:03:49.317 received value=969
2020/03/27 08:03:49.327     sent value=970
2020/03/27 08:03:49.327 received value=970
2020/03/27 08:03:49.337 receive size=-1 (value=970)
2020/03/27 08:03:49.337 received value=971
2020/03/27 08:03:49.337     sent value=971
2020/03/27 08:03:49.347 receive size=-1 (value=971)
2020/03/27 08:03:49.347     sent value=972
2020/03/27 08:03:49.347 received value=972
2020/03/27 08:03:49.357 receive size=-1 (value=972)
2020/03/27 08:03:49.357 received value=973
2020/03/27 08:03:49.357     sent value=973
2020/03/27 08:03:49.367 receive size=-1 (value=973)
2020/03/27 08:03:49.368     sent value=974
2020/03/27 08:03:49.368 received value=974
2020/03/27 08:03:49.378     sent value=975
2020/03/27 08:03:49.378 received value=975
2020/03/27 08:03:49.388     sent value=976
2020/03/27 08:03:49.388 received value=976
2020/03/27 08:03:49.398     sent value=977
2020/03/27 08:03:49.398 received value=977
2020/03/27 08:03:49.408     sent value=978
2020/03/27 08:03:49.408 received value=978
2020/03/27 08:03:49.418     sent value=979
2020/03/27 08:03:49.418 received value=979
2020/03/27 08:03:49.428     sent value=980
2020/03/27 08:03:49.428 received value=980
2020/03/27 08:03:49.428 receive size=-1 (value=980)
2020/03/27 08:03:49.438     sent value=981
2020/03/27 08:03:49.438 received value=981
2020/03/27 08:03:49.448     sent value=982
2020/03/27 08:03:49.448 received value=982

-- 
You received this bug notification because you are a member of EPICS
Core Developers, which is subscribed to EPICS Base.
Matching subscriptions: epics-core-list-subscription
https://bugs.launchpad.net/bugs/1868486

Title:
  epicsMessageQueue lost messages

Status in EPICS Base:
  Confirmed
Status in EPICS Base 3.15 series:
  New
Status in EPICS Base 7.0 series:
  Confirmed

Bug description:
  https://epics.anl.gov/core-talk/2020/msg00396.php

  Mark Rivers observed epicsMessageQueue losing messages.

  https://epics.anl.gov/core-talk/2020/msg00408.php

  > I think I see the logic error in how the eventSent flag is handled,
  > specifically related to the fact that epicsEvent is a semaphore as
  > opposed to a condition variable.
  > 
  > This allows a "race" to occur if the first/only waiting receiver
  > times out, and epicsEventWaitWithTimeout() returns, while a sender
  > is in epicsMessageQueueSend() preparing to wake up a receiver.
  > 
  > This results in a situation where the sender has set the eventSent,
  > and indeed copied a message to the buffer of, a thread which has
  > decided to abort.
  > 
  > After timing out, the receiver sees the timeout and returns -1
  > even through eventSent has been set.  This can be trapped with:
  > 
  > > b osdMessageQueue.cpp:358 if status!=0 && threadNode.eventSent
  > 
  > So here is your lost message.
  > 
  > Now when epicsMessageQueueReceiveWithTimeout() is called again, no message
  > is waiting in the queue, so epicsEventWaitWithTimeout() is called.
  > Since the semaphore is already set, this returns immediately with status==0,
  > but this is a spurious wakeup and the eventSent flag is not set.
  > 
  > And here is the second "timeout".

  Line numbers circa 7.0.3.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/epics-base/+bug/1868486/+subscriptions

References:
[Bug 1868486] [NEW] epicsMessageQueue lost messages mdavidsaver via Core-talk

Navigate by Date:
Prev: Build completed: EPICS Base base-3.15-570 AppVeyor via Core-talk
Next: [Merge] ~bfrk/epics-base:scalar-get-optimization into epics-base:7.0 Ben Franksen via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
Navigate by Thread:
Prev: [Bug 1868486] Re: epicsMessageQueue lost messages rivers via Core-talk
Next: [Bug 1868486] Re: epicsMessageQueue lost messages mdavidsaver via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
ANJ, 12 Apr 2020 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·