EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Weird CAS hangup on IOC
From: Torsten Bögershausen via Core-talk <core-talk at aps.anl.gov>
To: Ralph Lange <ralph.lange at gmx.de>, EPICS Core Talk <core-talk at aps.anl.gov>
Date: Fri, 7 Feb 2020 11:20:54 +0100
Hm,

I think that bind() and listen() need to be called right after each other,
in the same loop.

One way could be to equip rsrv_grab_tcp() with an extra parameter
and let this function do the listen(), if needed.

I started to look around, but couldn't find a place to set
need_to_listen to 0 or 1.
But can we try this code snippet as a start for a discussion?

/Torsten



diff --git a/modules/database/src/ioc/rsrv/caservertask.c b/modules/database/src/ioc/rsrv/caservertask.c
index 629b7b0b5..4e2381819 100644
--- a/modules/database/src/ioc/rsrv/caservertask.c
+++ b/modules/database/src/ioc/rsrv/caservertask.c
@@ -68,17 +68,6 @@ static void req_server (void *pParm)

     IOC_sock = conf->tcp;

-    /* listen and accept new connections */
-    if ( listen ( IOC_sock, 20 ) < 0 ) {
-        char sockErrBuf[64];
-        epicsSocketConvertErrnoToString (
-            sockErrBuf, sizeof ( sockErrBuf ) );
-        errlogPrintf ( "CAS: Listen error: %s\n",
-            sockErrBuf );
-        epicsSocketDestroy (IOC_sock);
-        epicsThreadSuspendSelf ();
-    }
-
     epicsEventSignal(castcp_startStopEvent);

     while (TRUE) {
@@ -158,7 +147,7 @@ int tryBind(SOCKET sock, const osiSockAddr* addr, const char *name)
  * to know this).
  */
 static
-SOCKET* rsrv_grab_tcp(unsigned short *port)
+SOCKET* rsrv_grab_tcp(unsigned short *port, int need_to_listen)
 {
     SOCKET *socks;
     osiSockAddr scratch;
@@ -198,7 +187,10 @@ SOCKET* rsrv_grab_tcp(unsigned short *port)

             epicsSocketEnableAddressReuseDuringTimeWaitState ( tcpsock );

-            if(bind(tcpsock, &scratch.sa, sizeof(scratch))==0) {
+            if((bind(tcpsock, &scratch.sa, sizeof(scratch))==0) &&
+               (!need_to_listen ||
+                listen(tcpsock, 20) == 0))
+              {
                 if(scratch.ia.sin_port==0) {
                     /* use first socket to pick a random port */
                     osiSocklen_t alen = sizeof(ifaceAddr);
@@ -583,7 +575,7 @@ void rsrv_init (void)

     {
         unsigned short sport = ca_server_port;
-        socks = rsrv_grab_tcp(&sport);
+        socks = rsrv_grab_tcp(&sport, need_to_listen);

         if ( sport != ca_server_port ) {
             ca_server_port = sport;
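
Outside the EPICS sources, the idea of doing bind() and listen() in the same loop can be sketched with plain POSIX sockets. This is only an illustration under stated assumptions: grab_listening_port() is a hypothetical helper, not a function in rsrv, and the loopback address and the backlog of 20 (taken from the removed listen() call above) are choices made for the example.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of EPICS Base: claim the first usable
 * TCP port in [first, first + tries) on the loopback interface.
 * Returns the listening socket (port stored in *port), or -1. */
static int grab_listening_port(unsigned short first, int tries,
                               unsigned short *port)
{
    int i;

    for (i = 0; i < tries; i++) {
        struct sockaddr_in addr;
        int yes = 1;
        int sock = socket(AF_INET, SOCK_STREAM, 0);

        if (sock < 0)
            return -1;

        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((unsigned short)(first + i));

        /* Treat bind() and listen() as one step: the port only counts
         * as ours once listen() has succeeded. */
        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            listen(sock, 20) == 0) {
            *port = (unsigned short)(first + i);
            return sock;
        }

        /* Port busy, or another process won the race: try the next one. */
        close(sock);
    }
    return -1;
}
```

A caller that loses the race on one port simply ends up on the next port instead of holding a bound socket that can never listen.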



On 07/02/20 09:35, Ralph Lange via Core-talk wrote:
I can confirm it is a race: Using a script that is starting two IOCs in parallel, we can see the effect happening about 1 out of 50 times.

When rsrv starts, there is a window between checking and getting exclusive access so that further checks fail. As Torsten pointed out, only active listening sockets prevent another bind() with SO_REUSEADDR set on the sockets. From the socket(7) manpage:
    SO_REUSEADDR
        Indicates that the rules used in validating addresses supplied in a
        bind(2) <https://linux.die.net/man/2/bind> call should allow reuse
        of local addresses. For AF_INET sockets this means that a socket
        may bind, except when there is an active listening socket bound to
        the address.

If the second IOC calls bind() before the first IOC called listen(), the second bind() will succeed and the second IOC will fail later when it calls listen(). Currently it decides to go deaf (suspend the receiving thread) at that point, but it really should go back to the phase of testing bind() instead.
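
The sequence described above can be replayed inside a single process, as a rough sketch. The helper names bound_socket() and replay_race() are hypothetical and the loopback address is an assumption for the example. On Linux the function is expected to return 1, in line with the manpage text. Other systems may already refuse the second bind(), which the sketch reports as 0.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create a TCP socket with SO_REUSEADDR set and
 * bind it to 127.0.0.1:port. Returns the socket, or -1 on failure. */
static int bound_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Replays the race in one process. Returns
 *   1 if both bind() calls succeeded and the second listen() failed,
 *   0 if the second bind() was already refused (no window on this OS),
 *  -1 on any other outcome (for example, the port was already in use). */
static int replay_race(unsigned short port)
{
    int result = -1;
    int a, b;

    a = bound_socket(port);      /* "first IOC" binds */
    if (a < 0)
        return -1;

    b = bound_socket(port);      /* "second IOC" binds before a listens */
    if (b < 0) {
        result = 0;
    } else {
        /* First listener wins; the second one is expected to fail. */
        if (listen(a, 20) == 0 && listen(b, 20) != 0)
            result = 1;
        close(b);
    }
    close(a);
    return result;
}
```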

I'll create an LP bug for this.

Cheers,
~Ralph

