EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Possible problem with base 7.0.4 on Windows (regression)
From: Mark Rivers via Core-talk <core-talk at aps.anl.gov>
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: EPICS core-talk <core-talk at aps.anl.gov>
Date: Fri, 19 Jun 2020 20:58:14 +0000

Hi Michael,


I tested your suggestion of resurrecting the original WIN32/osdSockAddrReuse.cpp.  I only had to adjust the symbol decorations, and it fixed the problem.


I just posted to tech-talk so there is a known workaround until there is a patch release.


Thanks,

Mark




________________________________
From: Michael Davidsaver <mdavidsaver at gmail.com>
Sent: Friday, June 19, 2020 1:04 AM
To: Mark Rivers
Cc: Johnson, Andrew N.; EPICS core-talk
Subject: Re: Possible problem with base 7.0.4 on Windows (regression)

Hi Mark,

I suppose it's fitting for you to report this regression which
has its roots in my attempt to fix another issue you reported.

https://epics.anl.gov/core-talk/2020/msg00050.php

db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6 was one of a series
of commits affecting this code and can't simply be reverted
in full.  Copying in the original WIN32/osdSockAddrReuse.cpp
should do as a quick workaround.

https://github.com/epics-base/epics-base/blob/5064931aa6e54481832951b4f27a982c5003233d/modules/libcom/src/osi/os/WIN32/osdSockAddrReuse.cpp


The changes which caused this were made during the Feb. codeathon
at DLS.  At the time I added a new unittest to osiSockTest which
checks the behavior of bind(), but doesn't actually send/receive
any traffic.

I would have liked to test this as well.  The problem is that I
don't know of a portable way to send broadcast traffic through
the loopback.  Using other interfaces would be a violation of
the usual practice of unittest isolation, as other hosts could
see this traffic.

Right now, though, I'm inclined towards doing this, with a test
which sends legitimate CA search broadcasts on port 5064 through
all interfaces with a random/invalid PV name.

I think the risk of adverse external effects would be minimal,
and the benefit of avoiding future regressions sufficient
to justify it.
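
A minimal sketch of what such a search datagram could look like, in Python rather than the C++ of osiSockTest. The command codes, the DONT_REPLY flag value and the header layout are assumptions taken from the published Channel Access protocol specification and should be checked against it; random_pv_name(), build_search_datagram() and send_search() are hypothetical helper names, not part of EPICS Base.

```python
import socket
import struct
import uuid

# Values assumed from the Channel Access protocol specification.
CA_PROTO_VERSION = 0   # command code of the VERSION message
CA_PROTO_SEARCH = 6    # command code of the SEARCH message
CA_DONT_REPLY = 5      # servers that do not host the PV stay silent
CA_MINOR_VERSION = 13  # the "13" in "Channel Access Server V4.13"


def random_pv_name() -> str:
    """Return a PV name that no real IOC should be serving (hypothetical scheme)."""
    return "unittest:no-such-pv:" + uuid.uuid4().hex


def build_search_datagram(pv_name: str, cid: int = 1) -> bytes:
    """Build a VERSION header followed by one SEARCH request for pv_name."""
    # The payload is the NUL-terminated name, padded to a multiple of 8 bytes.
    name = pv_name.encode("ascii") + b"\0"
    name += b"\0" * (-len(name) % 8)
    # Each CA header is 16 bytes, big-endian:
    # command, payload size, data type, data count, parameter 1, parameter 2.
    version = struct.pack(">HHHHII", CA_PROTO_VERSION, 0, 0, CA_MINOR_VERSION, 0, 0)
    search = struct.pack(">HHHHII", CA_PROTO_SEARCH, len(name),
                         CA_DONT_REPLY, CA_MINOR_VERSION, cid, cid)
    return version + search + name


def send_search(address: str, port: int = 5064) -> int:
    """Send one search for a random PV name to address; return the bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        return sock.sendto(build_search_datagram(random_pv_name()), (address, port))
    finally:
        sock.close()
```

Calling send_search() with an interface's broadcast address (164.54.11.255 in the casr output quoted below) would put the packet on the wire, which is exactly the isolation trade-off discussed above. A real test would also need each server under test to report that it received the search.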


On 6/18/20 8:37 PM, Johnson, Andrew N. via Core-talk wrote:
> Hi Mark,
>
> On Jun 18, 2020, at 9:39 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:
>>
>> I am writing to report a possible problem with base 7.0.4 on Windows.
>>
>> Today I updated EPICS on a Windows machine from base 7.0.3.1 to 7.0.4.  This machine runs 3 EPICS IOCs; 2 are areaDetector camera IOCs and the third runs various serial and TCP devices.
>>
>> The behavior I observed was strange.
>>
>> - The first 2 IOCs to start worked fine, I could connect to them from Channel Access clients.
>>
>> - The third IOC started fine, and dbl showed all the PVs.  However, I could not connect to them from any Channel Access client (medm, caget, etc.)
>>
>> If I changed the order in which I started the IOCs it appears that the last IOC to start is the one that cannot connect with Channel Access.
>>
>> I rolled back to the 7.0.3.1 versions of the IOCs and they all connect fine.
>>
>> I cannot say that I have conclusively proved the above by starting the IOCs in all different orders, etc.  But it is looking like a problem.
>
> I just started 2 IOCs on windows-x64. I can connect to the first, but not to the second:
>
>> tux% caget 1:BaseVersion
>> 1:BaseVersion                  7.0.4.1-DEV
>> tux% caget 2:BaseVersion
>> Channel connect timed out: '2:BaseVersion' not found.
>
> I tested this first on my Mac laptop and was able to connect to 6 IOCs with no problems, so this looks like it’s a Windows issue.
>
> It is often useful to see the output of running ‘casr 1’ which shows how the IOC has configured its network connections. This was casr on my first IOC:
>
>> epics> casr 1
>> Channel Access Server V4.13
>> No clients connected.
>> CAS-TCP server on 0.0.0.0:5064 with
>>     CAS-UDP name server on 0.0.0.0:5064
>> Sending CAS-beacons to 1 address:
>>     164.54.11.255:5065
>
> This was the complete startup of my second IOC (while the first was still running):
>
>> C:\epics\base-7.0\bin\windows-x64>softIoc -x 2
>> dbLoadDatabase("C:\epics\base-7.0\bin\windows-x64\..\..\dbd\softIoc.dbd")
>> softIoc_registerRecordDeviceDriver(pdbbase)
>> iocInit()
>> Starting iocInit
>> ############################################################################
>> ## EPICS R7.0.4.1-DEV
>> ## Rev. R7.0.4-11-g786c4c2ca29f750bc49c-dirty
>> ############################################################################
>> iocRun: All initialization complete
>> epics> dbl
>> 2:BaseVersion
>> 2:exit
>> epics> casr
>> Channel Access Server V4.13
>> No clients connected.
>> epics> casr 1
>> Channel Access Server V4.13
>> No clients connected.
>> CAS-TCP server on 0.0.0.0:5064 with
>>     CAS-UDP name server on 0.0.0.0:5064
>> Sending CAS-beacons to 1 address:
>>     164.54.11.255:5065
>
> Note that the output from iocInit there does not contain the usual warning about the configured TCP port being unavailable, and both CAS-TCP servers claim to have TCP port 5064 open. That is a big tell that we have a bug.
>
>
>> Were there any changes in 7.0.4 that might have caused such a problem?
>
> Yes, although I don’t know the details myself, and unfortunately nothing appears in the Release Notes to describe those changes. One commit which looks particularly interesting was this:
>
>> commit db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6
>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>> Date:   Wed Feb 5 10:30:58 2020 -0800
>>
>>     use one osdSockAddrReuse impl for all targets
>>
>>     drop win32 specialization of osdSockAddrReuse
>>
>
> A fairly prominent comment in the deleted files said:
>
>> -/*
>> - * Note: WINSOCK appears to assign a different functionality for
>> - * SO_REUSEADDR compared to other OS. With WINSOCK SO_REUSEADDR indicates
>> - * that simultaneously servers can bind to the same TCP port on the same host!
>> - * Also, servers are always enabled to reuse a port immediately after
>> - * they exit ( even if SO_REUSEADDR isnt set ).
>> - */
>
> I just tried reverting that commit and had to fix up the symbol decoration macros in the WIN32 file, but with that done when I start two IOCs, the second one shows the expected TCP warning and I can connect to both with no problem.
>
> I think that means we’ll be able to provide a fix fairly quickly, but I will leave it to Michael to decide exactly what changes to make.
>
> Thanks for the report,
>
> - Andrew
>
> --
> Complexity comes for free, simplicity you have to work for.
>
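
The bind() behaviour described in the deleted comment can be probed with a short script. This is a sketch in Python rather than the C++ of osdSockAddrReuse.cpp, and it is not the EPICS implementation: second_listener_can_bind() and tcp_reuseaddr_wanted() are hypothetical names, and the policy in the second function is only one plausible reading of what the WIN32 specialization was for.

```python
import socket
import sys


def second_listener_can_bind(use_reuseaddr: bool) -> bool:
    """Report whether a second TCP socket can bind a port that already has a listener."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if use_reuseaddr:
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return False  # the OS refused to share the port
        return True
    finally:
        first.close()
        second.close()


def tcp_reuseaddr_wanted(platform: str = sys.platform) -> bool:
    """Hypothetical policy: do not set SO_REUSEADDR on TCP servers under Windows."""
    return not platform.startswith("win")


if __name__ == "__main__":
    print("without SO_REUSEADDR:", second_listener_can_bind(False))
    print("with SO_REUSEADDR:   ", second_listener_can_bind(True))
```

If the deleted comment is accurate, the second line should report True on Windows. That would match the casr output above, where both IOCs claim TCP port 5064 and the second one prints no warning.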


