EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: [External] Trouble with pva:// on Phoebus with often-restarted IOCs
From: "Leblanc, Gregory via Tech-talk" <tech-talk at aps.anl.gov>
To: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Tue, 22 Aug 2023 21:02:32 +0000
> -----Original Message-----
> From: Torsten Bögershausen <torsten.bogershausen at ess.eu>
> Sent: Monday, August 21, 2023 12:20 PM
> To: Leblanc, Gregory <leblanc at ohio.edu>
> Cc: tech-talk at aps.anl.gov
> Subject: Re: [External] Trouble with pva:// on Phoebus with often-restarted IOCs
> 
> Hej Greg,
>  … Can I force it to listen on 5076…
> Typically not.
> The TCP/IP stack thinks that 5076 is “not allowed” (please forgive my
> wording, network experts). There may be different reasons:
> The port is “in use”, and that’s the end of the story.
> The lsof command should report it.

That's what I was assuming.  For some reason it thinks something is already listening on 5076, so it can't grab that port.  I thought maybe it could be configured to quit with an error instead of trying a dynamic port, but looking at blockingTCPAcceptor.cpp, I don't see anything in there that would do what I want.
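To make the behaviour I was hoping for concrete, here is a minimal sketch in Python. It is not how blockingTCPAcceptor.cpp is written; the function name `bind_fixed_port` and the `allow_fallback` flag are made up for illustration. It tries the requested port and, unless fallback is explicitly allowed, fails loudly instead of silently moving to a dynamic port.

```python
import socket


def bind_fixed_port(port, host="127.0.0.1", allow_fallback=False):
    """Return a listening TCP socket on `port` (hypothetical helper).

    With allow_fallback=False the bind error is re-raised, so the caller
    can quit with an error. With allow_fallback=True the OS picks a
    dynamic port instead, which is the behaviour described above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        if not allow_fallback:
            sock.close()
            raise
        # Port 0 asks the kernel for any free (dynamic) port.
        sock.bind((host, 0))
    sock.listen()
    return sock
```

A caller that wants the strict behaviour would wrap `bind_fixed_port(5075)` in a try/except and exit on `OSError`; `sock.getsockname()[1]` reports which port was actually bound.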

> Or, it has been in use, and now it is in the “time wait” state.
> This state should be released after typically 120 seconds or so.
> So please be aware that things may change while you are debugging.
> What does
> netstat -a -n -t | grep 5076
> give you ?

Hmm, I didn't realize that the timeout was as long as 2 minutes.  netstat -a -n -t | grep 5076 always returns nothing.  netstat -a -n -t | grep 5075 returns the following when my IOC is listening on a non-random port:
tcp        0      0 0.0.0.0:5075            0.0.0.0:*               LISTEN

netstat -lut |grep 5076 returns the following under the same circumstances (but returns nothing on 5075):
udp        0      0 224.0.0.128:5076        0.0.0.0:*
udp        0      0 10.0.255.255:5076       0.0.0.0:*
udp        0      0 epics2:5076             0.0.0.0:*
udp        0      0 224.0.0.128:5076        0.0.0.0:*
udp        0      0 10.0.255.255:5076       0.0.0.0:*
udp        0      0 epics2:5076             0.0.0.0:* 

I also used lsof -i :507x and got the following:
$ lsof -i :5076
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
st.cmd  76083  eal    8u  IPv4 372679      0t0  UDP epics2:5076
st.cmd  76083  eal    9u  IPv4 372680      0t0  UDP 10.0.255.255:5076
st.cmd  76083  eal   10u  IPv4 372681      0t0  UDP 224.0.0.128:5076
st.cmd  76083  eal   17u  IPv4 372689      0t0  UDP epics2:5076
st.cmd  76083  eal   18u  IPv4 372690      0t0  UDP 10.0.255.255:5076
st.cmd  76083  eal   19u  IPv4 372691      0t0  UDP 224.0.0.128:5076
$ lsof -i :5075
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
st.cmd  76083  eal   14u  IPv4 372687      0t0  TCP *:pvaccess (LISTEN)
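For anyone who wants to script the same check, the TCP state of a port can also be read from /proc/net/tcp on Linux, which makes a TIME_WAIT entry easy to spot. The following is only a sketch: the helper name `port_states` is made up, it looks at IPv4 TCP only (not /proc/net/tcp6 or UDP), and it assumes the usual column layout where the second field is the local address in hex and the fourth field is the state code.

```python
# Linux TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def port_states(port, table_text):
    """Return the state names of all sockets whose local port is `port`.

    `table_text` is the full text of /proc/net/tcp (hypothetical helper;
    the first line of that file is a header and is skipped).
    """
    states = []
    for line in table_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        # Local address looks like "00000000:13D3" (hex IP, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            states.append(TCP_STATES.get(fields[3].upper(), fields[3]))
    return states
```

Typical use would be `port_states(5075, open("/proc/net/tcp").read())`; an empty list means no IPv4 TCP socket currently has that local port.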


From some further reading, it looks like setting SO_REUSEPORT will make the problem disappear, at least on Linux kernels 3.9 and newer, and on the BSDs.  Making it work on Windows as well looks like a tangled mess of portability trouble between that and SO_REUSEADDR, and I have no idea what RTEMS and VxWorks implement.  For me it's not worth barking up that tree.  I think I've at least temporarily fixed it by running `sudo sysctl -w net.ipv4.tcp_tw_reuse=1` and `sudo sysctl -w net.ipv4.tcp_fin_timeout=1`, which I think sets the timeout to 1 second.  This solution doesn't seem very robust, though, and if I can fix it by waiting 2 minutes, I can probably live with that as well.
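For reference, this is roughly what the socket-option approach looks like, sketched in Python rather than in the pvAccess C++ code. It says nothing about what pvAccess actually sets; `listen_reusable` is a made-up name. The sketch uses SO_REUSEADDR, which on Linux lets a listener bind a port that still has old connections in TIME_WAIT. SO_REUSEPORT is a different option that lets several sockets share one port, and the semantics of both options differ on Windows.

```python
import socket


def listen_reusable(port, host="127.0.0.1", backlog=5):
    """Return a listening TCP socket with SO_REUSEADDR set (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    sock.listen(backlog)
    return sock
```

With this in place, a server that closes its connections and restarts should be able to rebind the same port right away on Linux, without waiting for TIME_WAIT to expire and without touching system-wide sysctl settings.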
     Greg

--
Gregory Leblanc
Accelerator Engineer
Edwards Accelerator Lab - Ohio University
123 University Terrace
Athens, OH 45701 USA
leblanc at ohio.edu
M: (401) 52-OUAL1 or (401) 526-8251




