Experimental Physics and Industrial Control System
Hm,
I think that the bind() and listen() need to come after each other,
in the same loop.
One way could be to equip rsrv_grab_tcp() with an extra parameter,
and let this function do the listen() as well, if needed.
I started to look around, but couldn't find a place to set
need_to_listen to 0 or 1.
But can we use this code snippet as a starting point for a discussion?
/Torsten
diff --git a/modules/database/src/ioc/rsrv/caservertask.c b/modules/database/src/ioc/rsrv/caservertask.c
index 629b7b0b5..4e2381819 100644
--- a/modules/database/src/ioc/rsrv/caservertask.c
+++ b/modules/database/src/ioc/rsrv/caservertask.c
@@ -68,17 +68,6 @@ static void req_server (void *pParm)
IOC_sock = conf->tcp;
- /* listen and accept new connections */
- if ( listen ( IOC_sock, 20 ) < 0 ) {
- char sockErrBuf[64];
- epicsSocketConvertErrnoToString (
- sockErrBuf, sizeof ( sockErrBuf ) );
- errlogPrintf ( "CAS: Listen error: %s\n",
- sockErrBuf );
- epicsSocketDestroy (IOC_sock);
- epicsThreadSuspendSelf ();
- }
-
epicsEventSignal(castcp_startStopEvent);
while (TRUE) {
@@ -158,7 +147,7 @@ int tryBind(SOCKET sock, const osiSockAddr* addr, const char *name)
* to know this).
*/
static
-SOCKET* rsrv_grab_tcp(unsigned short *port)
+SOCKET* rsrv_grab_tcp(unsigned short *port, int need_to_listen)
{
SOCKET *socks;
osiSockAddr scratch;
@@ -198,7 +187,10 @@ SOCKET* rsrv_grab_tcp(unsigned short *port)
epicsSocketEnableAddressReuseDuringTimeWaitState ( tcpsock );
- if(bind(tcpsock, &scratch.sa, sizeof(scratch))==0) {
+ if((bind(tcpsock, &scratch.sa, sizeof(scratch))==0) &&
+ (!need_to_listen ||
+ listen (tcpsock, 20) == 0))
+ {
if(scratch.ia.sin_port==0) {
/* use first socket to pick a random port */
osiSocklen_t alen = sizeof(ifaceAddr);
@@ -583,7 +575,7 @@ void rsrv_init (void)
{
unsigned short sport = ca_server_port;
- socks = rsrv_grab_tcp(&sport);
+ socks = rsrv_grab_tcp(&sport, need_to_listen);
if ( sport != ca_server_port ) {
ca_server_port = sport;
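The idea in the patch is that a port only counts as grabbed once bind() and listen() have both succeeded in the same step. A minimal standalone sketch of that idea with plain POSIX sockets, outside EPICS: the helper names bind_and_listen() and local_port(), the loopback address, and the backlog of 20 (taken from the listen() call removed from req_server()) are assumptions for illustration only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not an EPICS function): bind a TCP socket to
 * 127.0.0.1:port and listen() on it immediately. Only a socket that
 * passes both steps is returned; otherwise it is closed and -1 is
 * returned, so the caller can move on to another port. */
static int bind_and_listen(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    /* SO_REUSEADDR, as on the rsrv sockets discussed in this thread */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0) {
        close(sock);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);               /* 0 lets the kernel pick */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* bind() and listen() back to back: no window for another process */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(sock, 20) == 0)
        return sock;

    close(sock);
    return -1;
}

/* Local port a socket is bound to, or 0 on error. */
static unsigned short local_port(int sock)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(sock, (struct sockaddr *)&addr, &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Once the first caller is listening, a second bind_and_listen() on the same port already fails at bind(), so the second process finds out immediately that it has to pick another port.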
On 07/02/20 09:35, Ralph Lange via Core-talk wrote:
I can confirm it is a race: using a script that starts two IOCs in
parallel, we see the effect about 1 out of 50 times.
When rsrv starts, there is a window between checking that the port is free
and getting the exclusive access that makes further checks fail.
As Torsten pointed out, when SO_REUSEADDR is set on the sockets, only an
actively listening socket prevents another bind(). From the socket(7) manpage:
*SO_REUSEADDR*
Indicates that the rules used in validating addresses supplied in a
*bind*(2) call should allow reuse
of local addresses. For *AF_INET* sockets this means that a socket
may bind, except when there is an active listening socket bound to
the address.
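The quoted rule can be checked directly. The sketch below is Linux-specific (BSD-derived stacks treat SO_REUSEADDR differently), and the names reuse_bind(), reuseaddr_demo() and struct reuse_result are made up for this example. It binds two SO_REUSEADDR sockets to the same loopback port while neither is listening, then lets the first one listen and tries a third bind().

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of the experiment, filled in by reuseaddr_demo(). */
struct reuse_result {
    int bind_second;   /* bind() result while nobody listens */
    int bind_third;    /* bind() result after the first socket listens */
    int errno_third;   /* errno from the third bind() */
};

/* Create a TCP socket with SO_REUSEADDR and bind it to 127.0.0.1:port.
 * The socket is returned even if bind() fails; *rc receives the bind()
 * result and *err its errno. */
static int reuse_bind(unsigned short port, int *rc, int *err)
{
    struct sockaddr_in addr;
    int one = 1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    errno = 0;
    *rc = bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    *err = errno;
    return sock;
}

/* Expected on Linux (socket(7)): the second bind() succeeds, the third
 * fails with EADDRINUSE because the first socket is now listening. */
static void reuseaddr_demo(struct reuse_result *r)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int rc, err;

    /* First socket: let the kernel choose a free port. */
    int first = reuse_bind(0, &rc, &err);
    getsockname(first, (struct sockaddr *)&addr, &len);
    unsigned short port = ntohs(addr.sin_port);

    /* Nobody listens yet, so binding the same port again is allowed. */
    int second = reuse_bind(port, &r->bind_second, &err);

    /* Once the first socket listens, a further bind() is refused. */
    listen(first, 20);
    int third = reuse_bind(port, &r->bind_third, &r->errno_third);

    close(third);
    close(second);
    close(first);
}
```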
If the second IOC calls bind() before the first IOC has called listen(), the
second bind() will succeed and the second IOC will fail later when it
calls listen(). Currently it decides to go deaf at that point (it suspends
the receiving thread), but it really should go back to the phase of
testing bind() instead.
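Going back to the bind() phase instead of suspending could look roughly like the following sketch. It is not rsrv code: reuse_bound_socket(), listen_or_rebind() and local_port() are hypothetical names, and retrying the same port once before falling back to a kernel-chosen port (port 0) is an assumption about what "testing bind()" would amount to. The race itself is reproduced with plain sockets, which relies on the Linux SO_REUSEADDR behaviour quoted above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket with SO_REUSEADDR bound to
 * 127.0.0.1:port (0 = any free port); returns -1 if bind() fails. */
static int reuse_bound_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Try to listen on an already bound socket. If another process won the
 * race and listen() fails, close the socket and repeat the bind() phase:
 * the same port first, then any free port (assumption). Returns a
 * listening socket or -1. */
static int listen_or_rebind(int sock, unsigned short port, int max_retries)
{
    if (listen(sock, 20) == 0)
        return sock;
    close(sock);    /* do not go deaf: give the socket up and start over */

    for (int i = 0; i < max_retries; i++) {
        int s = reuse_bound_socket(i == 0 ? port : 0);
        if (s < 0)
            continue;   /* port held by a listener: try the next option */
        if (listen(s, 20) == 0)
            return s;
        close(s);
    }
    return -1;
}

/* Local port a socket is bound to, or 0 on error. */
static unsigned short local_port(int sock)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(sock, (struct sockaddr *)&addr, &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Compared with suspending the receiving thread, this keeps the second server reachable, at the cost of possibly ending up on a different port than the one it asked for.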
I'll create a Launchpad bug for this.
Cheers,
~Ralph