EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: [Bug 1862328] Re: Race condition on IOC start leaves rsrv unresponsive
From: Andrew Johnson via Core-talk <core-talk at aps.anl.gov>
To: core-talk at aps.anl.gov
Date: Fri, 15 May 2020 19:07:04 -0000

** Changed in: epics-base/3.15
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of EPICS
Core Developers, which is subscribed to EPICS Base.
Matching subscriptions: epics-core-list-subscription
https://bugs.launchpad.net/bugs/1862328

Title:
  Race condition on IOC start leaves rsrv unresponsive

Status in EPICS Base:
  Fix Committed
Status in EPICS Base 3.15 series:
  Fix Released
Status in EPICS Base 7.0 series:
  Fix Committed

Bug description:
  We have lately been seeing an IOC that occasionally appears to boot
  fine and does not print the usual "cas warning: Configured TCP port was
  unavailable. [...]" messages that indicate it is not the first IOC on
  the host, but then emits

  CAS: Listen error: Address already in use
  Thread CAS-TCP (0x3337a20) suspended

  and becomes unresponsive to CA.

  This is a race condition: using a script that starts two IOCs in
  parallel, we see the effect about 1 in 50 times.

  When rsrv starts, there is a window between checking that the port is
  free and getting exclusive access to it, i.e. the point after which
  further checks by other IOCs fail. It turns out that, with SO_REUSEADDR
  set on the sockets, only an actively listening socket prevents another
  bind(). From the socket(7) manpage:

  SO_REUSEADDR
  Indicates that the rules used in validating addresses supplied in a bind(2) call should allow reuse of local addresses. For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. 
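
  The quoted rule can be probed with a small script. This is only a sketch, not part of rsrv: the helper names make_reusable_socket() and second_socket_failure_stage() are made up for illustration, and the outcome is platform dependent. On Linux, which the bug report describes, the second bind() is expected to succeed and the failure to show up only at the second listen(); other systems may already reject the second bind().

```python
import socket


def make_reusable_socket():
    """Create a TCP socket with SO_REUSEADDR set (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def second_socket_failure_stage():
    """Return "bind", "listen" or None, depending on where the second
    socket is refused when it targets the same address as the first."""
    first = make_reusable_socket()
    second = make_reusable_socket()
    try:
        # Let the OS pick a free port for the first socket.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]

        # The first socket is bound but NOT yet listening.
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return "bind"

        # Only now does the first socket become an active listener.
        first.listen(5)
        try:
            second.listen(5)
        except OSError:
            return "listen"
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second socket refused at:", second_socket_failure_stage())
```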

  If the second IOC calls bind() before the first IOC has called
  listen(), the second bind() succeeds and the second IOC fails later,
  when it calls listen(). Currently it reacts by going deaf (suspending
  the receiving thread) at that point, but it really should go back to
  the bind() testing phase instead.
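
  The suggested behaviour, returning to the bind() phase when listen() fails, can be sketched as follows. This is an illustration under stated assumptions and not the fix that was committed to EPICS Base: open_listener(), its parameters and the choice to fall back to an OS-assigned port (port 0) after an "Address already in use" error are all hypothetical.

```python
import errno
import socket


def open_listener(preferred_port, host="127.0.0.1", max_attempts=5):
    """Return a listening TCP socket, retrying bind() if listen() fails.

    Hypothetical helper: tries preferred_port first and falls back to an
    OS-assigned port whenever bind() or listen() reports EADDRINUSE.
    """
    port = preferred_port
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            # listen() can still fail here if another process bound the
            # same port and started listening in the meantime.
            sock.listen(20)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise
            # Go back to the bind() phase with an OS-assigned port
            # instead of giving up.
            port = 0
    raise OSError(errno.EADDRINUSE, "no free TCP port found")
```

  A caller would pass the configured server port and read the port actually obtained from getsockname() on the returned socket. Treating bind() and listen() as one retryable step means a late "Address already in use" is handled the same way as an early one.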

  See also https://epics.anl.gov/core-talk/2020/msg00110.php

To manage notifications about this bug go to:
https://bugs.launchpad.net/epics-base/+bug/1862328/+subscriptions

