EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Possible problem with base 7.0.4 on Windows (regression)
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: "Rivers, Mark L." <rivers at cars.uchicago.edu>
Cc: EPICS core-talk <core-talk at aps.anl.gov>
Date: Thu, 18 Jun 2020 23:04:22 -0700
Hi Mark,

I suppose it's fitting for you to report this regression which
has its roots in my attempt to fix another issue you reported.

https://epics.anl.gov/core-talk/2020/msg00050.php

db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6 was one of a series
of commits affecting this code and can't simply be reverted
in full.  Copying in the original WIN32/osdSockAddrReuse.cpp
should work as a quick workaround.

https://github.com/epics-base/epics-base/blob/5064931aa6e54481832951b4f27a982c5003233d/modules/libcom/src/osi/os/WIN32/osdSockAddrReuse.cpp


The changes which caused this were made during the Feb. codeathon
at DLS.  At the time I added a new unittest to osiSockTest which
checks the behavior of bind(), but doesn't actually send/receive
any traffic.
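For illustration, here is a minimal sketch of the kind of bind() check described above. It is not the actual osiSockTest code: the helper names are hypothetical, it uses POSIX sockets only, and it assumes Linux semantics (on the BSDs, sharing an identical unicast UDP address also needs SO_REUSEPORT). It binds only to 127.0.0.1, so no traffic leaves the host. A second TCP listener on the same port should be refused, while two UDP sockets that both set SO_REUSEADDR may share a port, which is how several IOCs on one host can all listen for CA searches on UDP 5064.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Open a socket of the given type (SOCK_STREAM or SOCK_DGRAM) with
 * SO_REUSEADDR set and bind it to 127.0.0.1:port. Port 0 lets the kernel
 * choose an ephemeral port. Returns the descriptor, or -1 with errno set. */
static int bind_loopback(int type, unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1, err;
    int s = socket(AF_INET, type, 0);
    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0 ||
        bind(s, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        err = errno;
        close(s);
        errno = err; /* keep the setsockopt()/bind() error for the caller */
        return -1;
    }
    return s;
}

/* Port number the kernel actually assigned to a bound socket. */
static unsigned short bound_port(int s)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(s, (struct sockaddr *)&addr, &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Returns 0 when a second TCP listener on the same port is refused with
 * EADDRINUSE and a second UDP socket may share the port, else nonzero. */
int check_bind_semantics(void)
{
    int result = 0, t1, t2, u1, u2;

    /* TCP: the first socket listens; a second bind() must fail. */
    t1 = bind_loopback(SOCK_STREAM, 0);
    if (t1 < 0)
        return -1;
    if (listen(t1, 1) != 0) {
        close(t1);
        return -1;
    }
    t2 = bind_loopback(SOCK_STREAM, bound_port(t1));
    if (t2 >= 0) {
        result |= 1;          /* second listener allowed: the regression */
        close(t2);
    } else if (errno != EADDRINUSE) {
        result |= 2;          /* refused, but for an unexpected reason */
    }
    close(t1);

    /* UDP: both sockets set SO_REUSEADDR, so sharing the port is allowed. */
    u1 = bind_loopback(SOCK_DGRAM, 0);
    if (u1 < 0)
        return -1;
    u2 = bind_loopback(SOCK_DGRAM, bound_port(u1));
    if (u2 < 0)
        result |= 4;
    else
        close(u2);
    close(u1);
    return result;
}
```

On a host where the TCP half of this check fails, a second IOC would presumably report no port conflict at all, which matches the symptom described in this thread.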

I would have liked to test this as well.  The problem is that I
don't know of a portable way to send broadcast traffic through
the loopback.  Using other interfaces would be a violation of
the usual practice of unittest isolation, as other hosts could
see this traffic.

Right now, though, I'm inclined towards doing this, with a test
which sends legitimate CA search broadcasts on port 5064 through all
interfaces using a random/invalid PV name.
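A rough sketch of the datagram such a test might send, based on my reading of the CA protocol specification: a CA_PROTO_VERSION header followed by a CA_PROTO_SEARCH header carrying the DONT_REPLY flag, so servers which don't host the PV stay silent. The constants are written out by hand rather than taken from EPICS headers, the function names are hypothetical, and the PV name in the usage check is a placeholder. Only the message is built here; the broadcast send is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Channel Access constants used below (values per the CA protocol spec). */
#define CA_PROTO_VERSION  0u
#define CA_PROTO_SEARCH   6u
#define CA_DONT_REPLY     5u   /* servers without the PV stay silent */
#define CA_MINOR_VERSION 13u   /* matches "Channel Access Server V4.13" */
#define CA_HEADER_SIZE   16u

/* Write v big-endian (network byte order). */
static void put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xffu);
}

static void put_u32(unsigned char *p, uint32_t v)
{
    put_u16(p, (uint16_t)(v >> 16));
    put_u16(p + 2, (uint16_t)(v & 0xffffu));
}

/* Write one 16-byte CA header and return a pointer just past it. */
static unsigned char *put_header(unsigned char *p, uint16_t cmd, uint16_t size,
                                 uint16_t type, uint16_t count,
                                 uint32_t p1, uint32_t p2)
{
    put_u16(p, cmd);
    put_u16(p + 2, size);
    put_u16(p + 4, type);
    put_u16(p + 6, count);
    put_u32(p + 8, p1);
    put_u32(p + 12, p2);
    return p + CA_HEADER_SIZE;
}

/* Build a VERSION + SEARCH datagram for pvname into buf.
 * Returns the datagram length, or 0 if buf is too small. */
size_t build_ca_search(unsigned char *buf, size_t buflen,
                       const char *pvname, uint32_t search_id)
{
    size_t namelen, padded, total;
    unsigned char *p = buf;

    assert(buf != NULL && pvname != NULL);
    namelen = strlen(pvname) + 1;            /* include the NUL */
    padded = (namelen + 7u) & ~(size_t)7u;   /* payload is 8-byte aligned */
    total = 2 * CA_HEADER_SIZE + padded;
    if (total > buflen || padded > 0xffffu)
        return 0;

    memset(buf, 0, total);                   /* zero the padding bytes */
    p = put_header(p, CA_PROTO_VERSION, 0, 0, CA_MINOR_VERSION, 0, 0);
    p = put_header(p, CA_PROTO_SEARCH, (uint16_t)padded, CA_DONT_REPLY,
                   CA_MINOR_VERSION, search_id, search_id);
    memcpy(p, pvname, namelen);
    return total;
}
```

To actually send it, a test would presumably enable SO_BROADCAST on a UDP socket and sendto() the buffer to each interface's broadcast address on port 5064; that part is not shown.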

I think the risk of adverse external effects would be minimal,
and the benefit of avoiding future regressions sufficient
to justify it.


On 6/18/20 8:37 PM, Johnson, Andrew N. via Core-talk wrote:
> Hi Mark,
> 
> On Jun 18, 2020, at 9:39 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:
>>
>> I am writing to report a possible problem with base 7.0.4 on Windows.
>>
>> Today I updated EPICS on a Windows machine from base 7.0.3.1 to 7.0.4.  This machine runs 3 EPICS IOCs; 2 are areaDetector camera IOCs and the third runs various serial and TCP devices.
>>
>> The behavior I observed was strange.
>>
>> - The first 2 IOCs to start worked fine; I could connect to them from Channel Access clients.
>>
>> - The third IOC started fine, and dbl showed all the PVs.  However, I could not connect to them from any Channel Access client (medm, caget, etc.)
>>
>> When I changed the order in which I started the IOCs, it appeared that the last IOC to start was the one that could not connect with Channel Access.
>>
>> I rolled back to the 7.0.3.1 versions of the IOCs and they all connect fine.
>>
>> I cannot say that I have conclusively proved the above by starting the IOCs in all different orders, etc.  But it looks like a problem.
> 
> I just started 2 IOCs on windows-x64. I can connect to the first, but not to the second:
> 
>> tux% caget 1:BaseVersion
>> 1:BaseVersion                  7.0.4.1-DEV
>> tux% caget 2:BaseVersion
>> Channel connect timed out: '2:BaseVersion' not found.
> 
> I tested this first on my Mac laptop and was able to connect to 6 IOCs with no problems, so this looks like it’s a Windows issue.
> 
> It is often useful to see the output of running ‘casr 1’, which shows how the IOC has configured its network connections. This was the casr output on my first IOC:
> 
>> epics> casr 1                                                                   
>> Channel Access Server V4.13                                                     
>> No clients connected.                                                           
>> CAS-TCP server on 0.0.0.0:5064 with                                             
>>     CAS-UDP name server on 0.0.0.0:5064                                         
>> Sending CAS-beacons to 1 address:                                               
>>     164.54.11.255:5065                                                         
> 
> This was the complete startup of my second IOC (while the first was still running):
> 
>> C:\epics\base-7.0\bin\windows-x64>softIoc -x 2                              
>> dbLoadDatabase("C:\epics\base-7.0\bin\windows-x64\..\..\dbd\softIoc.dbd")                   
>> softIoc_registerRecordDeviceDriver(pdbbase)                                                 
>> iocInit()                                                                                   
>> Starting iocInit                                                                            
>> ############################################################################                
>> ## EPICS R7.0.4.1-DEV                                                                       
>> ## Rev. R7.0.4-11-g786c4c2ca29f750bc49c-dirty                                               
>> ############################################################################                
>> iocRun: All initialization complete                                                         
>> epics> dbl                                                                                  
>> 2:BaseVersion                                                                               
>> 2:exit                                                                                      
>> epics> casr                                                                                 
>> Channel Access Server V4.13                                                                 
>> No clients connected.                                                                       
>> epics> casr 1                                                                               
>> Channel Access Server V4.13                                                                 
>> No clients connected.                                                                       
>> CAS-TCP server on 0.0.0.0:5064 with                                                         
>>     CAS-UDP name server on 0.0.0.0:5064                                                     
>> Sending CAS-beacons to 1 address:                                                           
>>     164.54.11.255:5065
> 
> Note that the output from iocInit there does not contain the usual warning about the configured TCP port being unavailable, and both CAS-TCP servers claim to have TCP port 5064 open. That is a big tell that we have a bug.
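To make the missing warning concrete, here is a hypothetical sketch of the fallback a CA server is expected to perform: try the configured TCP port and, if bind() reports EADDRINUSE, warn and let the kernel pick a port instead. The function names and warning text are invented for illustration (they are not rsrv's), it binds to 127.0.0.1 rather than 0.0.0.0 so it can be tried without exposing a port, and it assumes POSIX semantics. On the Windows build discussed here the second bind() apparently succeeded, so neither the fallback nor the warning happened.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a TCP socket with SO_REUSEADDR, bind it to 127.0.0.1:port and
 * listen. Returns the descriptor, or -1 with errno set. */
static int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1, err;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0 ||
        bind(s, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(s, 10) != 0) {
        err = errno;
        close(s);
        errno = err;
        return -1;
    }
    return s;
}

/* Try the configured port first; if another server already holds it,
 * print a warning and fall back to a kernel-assigned port.
 * Stores the port actually used in *actual. Returns the socket or -1. */
int listen_with_fallback(unsigned short configured, unsigned short *actual)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int s;

    assert(actual != NULL);
    s = open_listener(configured);
    if (s < 0 && errno == EADDRINUSE) {
        fprintf(stderr, "WARNING: TCP port %u is in use, using a dynamic port\n",
                (unsigned)configured);
        s = open_listener(0);
    }
    if (s < 0)
        return -1;
    if (getsockname(s, (struct sockaddr *)&addr, &len) != 0) {
        close(s);
        return -1;
    }
    *actual = ntohs(addr.sin_port);
    return s;
}
```

With correct socket semantics, a second server asking for the first server's port gets a different port plus the warning, which is the behavior missing from the second IOC's startup above.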
> 
> 
>> Were there any changes in 7.0.4 that might have caused such a problem?
> 
> Yes, although I don’t know the details myself, and unfortunately nothing appears in the Release Notes to describe those changes. One commit which looks particularly interesting was this:
> 
>> commit db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6
>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>> Date:   Wed Feb 5 10:30:58 2020 -0800
>>
>>     use one osdSockAddrReuse impl for all targets
>>
>>     
>>
>>     drop win32 specialization of osdSockAddrReuse
>>
> 
> A fairly prominent comment in the deleted files said:
> 
>> -/*
>> - * Note: WINSOCK appears to assign a different functionality for 
>> - * SO_REUSEADDR compared to other OS. With WINSOCK SO_REUSEADDR indicates
>> - * that simultaneously servers can bind to the same TCP port on the same host!
>> - * Also, servers are always enabled to reuse a port immediately after 
>> - * they exit ( even if SO_REUSEADDR isnt set ).
>> - */
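In code, the distinction that comment draws might look like the sketch below. In libCom the relevant entry point is epicsSocketEnableAddressReuseDuringTimeWaitState(); the function here is a hypothetical standalone stand-in, and its Windows branch simply follows what the deleted comment implies (don't set SO_REUSEADDR on a TCP listener under Winsock) rather than reproducing the original file.

```c
#include <assert.h>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#endif

/* Hypothetical helper: prepare a socket that will become a TCP listener
 * so a restarted server can rebind promptly, without letting a second
 * server bind the same port. Returns 0 on success. */
int enable_reuse_for_tcp_listener(sock_t s)
{
#ifdef _WIN32
    /* Per the deleted comment, Winsock SO_REUSEADDR would let another
     * server bind this TCP port, and immediate reuse after exit works
     * without it, so leave the option unset. */
    (void)s;
    return 0;
#else
    /* POSIX: SO_REUSEADDR only allows rebinding while old connections are
     * in TIME_WAIT; an active listener still blocks a second bind(). */
    int on = 1;
    return setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
#endif
}
```

Winsock also offers SO_EXCLUSIVEADDRUSE, which explicitly stops other sockets from binding the port; whether that suits EPICS is a separate question.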
> 
> I just tried reverting that commit and had to fix up the symbol decoration macros in the WIN32 file, but with that done, when I start two IOCs the second one shows the expected TCP warning and I can connect to both with no problem.
> 
> I think that means we’ll be able to provide a fix fairly quickly, but I will leave it to Michael to decide exactly what changes to make.
> 
> Thanks for the report,
> 
> - Andrew
> 
> -- 
> Complexity comes for free, simplicity you have to work for.
> 


Replies:
Re: Possible problem with base 7.0.4 on Windows (regression) Mark Rivers via Core-talk
References:
Possible problem with base 7.0.4 on Windows Mark Rivers via Core-talk
Re: Possible problem with base 7.0.4 on Windows Johnson, Andrew N. via Core-talk
