Fix prepared during the 2020 Codeathon by Bryan Tester (thanks!),
committed as 4844fbb to the 3.15 branch.
** Changed in: epics-base
    Milestone: None => 3.15.8

** Also affects: epics-base/3.15
    Importance: Undecided
    Status: New

** Also affects: epics-base/7.0
    Importance: Medium
    Status: New

** Changed in: epics-base/7.0
    Milestone: 3.15.8 => None

** Changed in: epics-base/3.15
    Milestone: None => 3.15.8

** Changed in: epics-base/3.15
    Importance: Undecided => Medium

** Changed in: epics-base/3.15
    Status: New => Fix Committed
https://bugs.launchpad.net/bugs/1862328
Title:
  Race condition on IOC start leaves rsrv unresponsive

Status in EPICS Base:
  New
Status in EPICS Base 3.15 series:
  Fix Committed
Status in EPICS Base 7.0 series:
  New
Bug description:
Lately we have been seeing an IOC that occasionally appears to boot
fine and does not print the usual "cas warning: Configured TCP port was
unavailable. [...]" messages for not being the first IOC on the host,
but then emits

    CAS: Listen error: Address already in use
    Thread CAS-TCP (0x3337a20) suspended

and stops responding to CA requests.
This is a race condition: using a script that starts two IOCs in
parallel, we can see the effect about 1 out of 50 times.
When rsrv starts, there is a window between checking whether the port
is free and obtaining exclusive access to it (the point after which
checks by other processes fail). It turns out that only an active
listening socket prevents another bind() when SO_REUSEADDR is set on
the sockets. From the socket(7) manpage:
    SO_REUSEADDR
        Indicates that the rules used in validating addresses supplied
        in a bind(2) call should allow reuse of local addresses. For
        AF_INET sockets this means that a socket may bind, except when
        there is an active listening socket bound to the address.
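
The manpage rule can be checked with a small experiment. What follows is a minimal Python sketch, not rsrv code: the helper names (new_tcp_socket, attempt, try_second_socket) are invented for illustration, and the behaviour described afterwards is what one would expect on Linux; other platforms treat SO_REUSEADDR differently.

```python
import errno
import socket


def new_tcp_socket():
    """Create a TCP socket with SO_REUSEADDR set, as in the scenario above."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


def attempt(call, *args):
    """Run a socket call; return "ok" or the symbolic errno name."""
    try:
        call(*args)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def try_second_socket(first_listens_before_bind):
    """Return (bind_result, listen_result) for the second of two sockets.

    Hypothetical helper: the first socket picks a free port; the second
    socket then tries to bind() and listen() on the same address.
    """
    first, second = new_tcp_socket(), new_tcp_socket()
    try:
        first.bind(("127.0.0.1", 0))      # let the kernel choose a port
        addr = first.getsockname()
        if first_listens_before_bind:
            first.listen(5)               # port is now exclusively owned
        bind_result = attempt(second.bind, addr)
        if bind_result != "ok":
            return bind_result, None      # conflict detected at bind()
        if not first_listens_before_bind:
            first.listen(5)               # first socket wins the race late
        return bind_result, attempt(second.listen, 5)
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("first listens before second binds:", try_second_socket(True))
    print("second binds before first listens:", try_second_socket(False))
```

On Linux, the first case should be rejected at bind(), while in the second case bind() should succeed and the conflict should only surface at listen(), which is the window described here.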
If the second IOC calls bind() before the first IOC has called
listen(), the second bind() will succeed and the second IOC will fail
later, when it calls listen(). Currently it decides to go deaf
(suspending the receiving thread) at that point, but it really should
go back to the phase of testing bind() instead.
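
As an illustration of "going back" instead of going deaf, here is a minimal Python sketch. It is not the code from commit 4844fbb and not rsrv's implementation; open_listener, its parameters, and the choice of falling back to a kernel-assigned port (port 0) are assumptions made for the example. The point is only that a failure in listen() is handled the same way as a failure in bind().

```python
import errno
import socket


def open_listener(preferred_port, host="127.0.0.1", backlog=20):
    """Return a listening TCP socket, preferring preferred_port.

    Hypothetical helper: EADDRINUSE from either bind() or listen() on the
    preferred port leads to one retry on a kernel-assigned port.
    """
    for port in (preferred_port, 0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            # Another process may have started listening since our bind();
            # treat that exactly like a failed bind().
            sock.listen(backlog)
            return sock
        except OSError as exc:
            sock.close()
            # Give up on unrelated errors, or if the fallback port failed too.
            if exc.errno != errno.EADDRINUSE or port == 0:
                raise
    raise RuntimeError("unreachable: loop either returns or raises")


if __name__ == "__main__":
    # Occupy a port with an active listener, then ask for the same port.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))
    holder.listen(1)
    busy_port = holder.getsockname()[1]

    listener = open_listener(busy_port)
    print("requested", busy_port, "got", listener.getsockname()[1])
    listener.close()
    holder.close()
```

Closing the socket and starting over with a fresh one keeps the retry path identical for both failure points, so the caller never ends up holding a bound socket that cannot listen.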
See also https://epics.anl.gov/core-talk/2020/msg00110.php