EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: [Bug 1868486] Re: epicsMessageQueue lost messages
From: mdavidsaver via Core-talk <core-talk at aps.anl.gov>
To: core-talk at aps.anl.gov
Date: Fri, 17 Apr 2020 02:01:15 -0000
> Ben's patch is in message #8 in this bug.

For some reason I remembered it being larger.

This patch is not a complete solution for the RX side issue as it still
allows triggered/signaled epicsEvents to be returned to the free-list.
So the spurious timeouts will continue.  It doesn't address the TX
side issues at all.  @Andrew, the TX side has the same design, and thus
the same problems.

Still, this patch is a definite improvement, and I think worth applying.

-- 
You received this bug notification because you are a member of EPICS
Core Developers, which is subscribed to EPICS Base.
Matching subscriptions: epics-core-list-subscription
https://bugs.launchpad.net/bugs/1868486

Title:
  epicsMessageQueue lost messages

Status in EPICS Base:
  In Progress
Status in EPICS Base 3.15 series:
  Fix Committed
Status in EPICS Base 7.0 series:
  In Progress

Bug description:
  https://epics.anl.gov/core-talk/2020/msg00396.php

  Mark Rivers observed epicsMessageQueue losing messages.

  https://epics.anl.gov/core-talk/2020/msg00408.php

  > I think I see the logic error in how the eventSent flag is handled,
  > specifically related to the fact that epicsEvent is a semaphore as
  > opposed to a condition variable.
  > 
  > This allows a "race" to occur if the first/only waiting receiver
  > times out, and epicsEventWaitWithTimeout() returns, while a sender
  > is in epicsMessageQueueSend() preparing to wake up a receiver.
  > 
  > This results in a situation where the sender has set the eventSent,
  > and indeed copied a message to the buffer of, a thread which has
  > decided to abort.
  > 
  > After timing out, the receiver sees the timeout and returns -1
  > even through eventSent has been set.  This can be trapped with:
  > 
  > > b osdMessageQueue.cpp:358 if status!=0 && threadNode.eventSent
  > 
  > So here is your lost message.
  > 
  > Now when epicsMessageQueueReceiveWithTimeout() is called again, no message
  > is waiting in the queue, so epicsEventWaitWithTimeout() is called.
  > Since the semaphore is already set, this returns immediately with status==0,
  > but this is a spurious wakeup and the eventSent flag is not set.
  > 
  > And here is the second "timeout".

  Line numbers circa 7.0.3.1

To manage notifications about this bug go to:
https://bugs.launchpad.net/epics-base/+bug/1868486/+subscriptions

References:
[Bug 1868486] [NEW] epicsMessageQueue lost messages mdavidsaver via Core-talk

Navigate by Date:
Prev: [Bug 1861612] Re: callbackRequestDelay not waiting for 1/60 sec on vxWorks mdavidsaver via Core-talk
Next: Build failed: epics-base base-fix-epicsFindSymbol-466 AppVeyor via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
Navigate by Thread:
Prev: [Bug 1868486] Re: epicsMessageQueue lost messages rivers via Core-talk
Next: [Bug 1868486] Re: epicsMessageQueue lost messages Andrew Johnson via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  <20202021  2022  2023  2024 
ANJ, 14 May 2020 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·