EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Possible problem with base 7.0.4 on Windows (regression)
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: EPICS core-talk <core-talk at aps.anl.gov>
Date: Fri, 19 Jun 2020 20:39:48 -0700
I've pushed 19146a597b42bc5f03aed1d97ccef56b4c4d0fac, which I think fixes
this issue.  I've filed the issue as:

https://bugs.launchpad.net/epics-base/+bug/1884339

I'm still working on unittest coverage of the two functions in osdSockAddrReuse.cpp.
There is a PR for this change as I'd like to get comments on the UDP fanout test
before merging.

https://github.com/epics-base/epics-base/pull/79

The test passes on Windows/appveyor with this fix, and fails without it.
So I think I've fixed the bug.  However, one travis-ci configuration
is still failing.  I think this is due to VM network configuration.
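
For anyone commenting on the PR, the bind-only part of a UDP fanout check can be sketched roughly as below. This is an illustration in Python, not the code in the PR or in osdSockAddrReuse.cpp; the helper names `enable_datagram_fanout` and `bind_two_udp_listeners` are made up for this example. It only verifies that two UDP sockets can share one port once the reuse options are set, and sends no traffic.

```python
import socket


def enable_datagram_fanout(sock: socket.socket) -> None:
    """Set the options that let several UDP sockets share one port (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # SO_REUSEPORT is not defined on every platform (e.g. Windows),
    # and some kernels reject it, so treat it as optional.
    if hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass


def bind_two_udp_listeners(host: str = "127.0.0.1"):
    """Bind two UDP sockets to the same port and return both port numbers."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        enable_datagram_fanout(a)
        a.bind((host, 0))            # let the OS pick a free port
        port = a.getsockname()[1]
        enable_datagram_fanout(b)
        b.bind((host, port))         # raises OSError if sharing is refused
        return port, b.getsockname()[1]
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(bind_two_udp_listeners())
```

A real fanout test would also have to confirm that a datagram sent to the shared port is delivered to both sockets, which is the part that needs broadcast traffic.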


On 6/19/20 1:58 PM, Mark Rivers wrote:
> Hi Michael,
> 
> 
> I tested your suggestion of resurrecting the original WIN32/osdSockAddrReuse.cpp.  I just had to tweak the decorations.  It fixed the problem.
> 
> 
> I just posted to tech-talk so there is a known workaround until there is a patch release.
> 
> 
> Thanks,
> 
> Mark
> 
> 
> 
> 
> ________________________________
> From: Michael Davidsaver <mdavidsaver at gmail.com>
> Sent: Friday, June 19, 2020 1:04 AM
> To: Mark Rivers
> Cc: Johnson, Andrew N.; EPICS core-talk
> Subject: Re: Possible problem with base 7.0.4 on Windows (regression)
> 
> Hi Mark,
> 
> I suppose it's fitting for you to report this regression which
> has its roots in my attempt to fix another issue you reported.
> 
> https://epics.anl.gov/core-talk/2020/msg00050.php
> 
> db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6 was one of a series
> of commits affecting this code and can't simply be reverted
> in full.  Copying in the original WIN32/osdSockAddrReuse.cpp
> should do as a quick workaround.
> 
> https://github.com/epics-base/epics-base/blob/5064931aa6e54481832951b4f27a982c5003233d/modules/libcom/src/osi/os/WIN32/osdSockAddrReuse.cpp
> 
> 
> The changes which cause this were made during the Feb. codeathon
> at DLS.  At the time I added a new unittest to osiSockTest which
> checks the behavior of bind(), but doesn't actually send/receive
> any traffic.
> 
> I would have liked to test this as well.  The problem is that I
> don't know of a portable way to send broadcast traffic through
> the loopback.  Using other interfaces would be a violation of
> the usual practice of unittest isolation, as other hosts could
> see this traffic.
> 
> Right now, I'm inclined towards doing this anyway, with a test
> which sends legitimate CA search broadcasts on port 5064 through all
> interfaces with a random/invalid PV name.
> 
> I think the risk of adverse external effects would be minimal,
> and the benefit of avoiding future regressions sufficient
> to justify it.
> 
> 
> On 6/18/20 8:37 PM, Johnson, Andrew N. via Core-talk wrote:
>> Hi Mark,
>>
>> On Jun 18, 2020, at 9:39 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:
>>>
>>> I am writing to report a possible problem with base 7.0.4 on Windows.
>>>
>>> Today I updated EPICS on a Windows machine from base 7.0.3.1 to 7.0.4.  This machine runs 3 EPICS IOCs; 2 are areaDetector camera IOCs and the third runs various serial and TCP devices.
>>>
>>> The behavior I observed was strange.
>>>
>>> - The first 2 IOCs to start worked fine, I could connect to them from Channel Access clients.
>>>
>>> - The third IOC started fine, and dbl showed all the PVs.  However, I could not connect to them from any Channel Access client (medm, caget, etc.)
>>>
>>> If I changed the order in which I started the IOCs it appears that the last IOC to start is the one that cannot connect with Channel Access.
>>>
>>> I rolled back to the 7.0.3.1 versions of the IOCs and they all connect fine.
>>>
>>> I cannot say that I have conclusively proved the above by starting the IOCs in all different orders, etc.  But it is looking like a problem.
>>
>> I just started 2 IOCs on windows-x64. I can connect to the first, but not to the second:
>>
>>> tux% caget 1:BaseVersion
>>> 1:BaseVersion                  7.0.4.1-DEV
>>> tux% caget 2:BaseVersion
>>> Channel connect timed out: '2:BaseVersion' not found.
>>
>> I tested this first on my Mac laptop and was able to connect to 6 IOCs with no problems, so this looks like it’s a Windows issue.
>>
>> It is often useful to see the output of running ‘casr 1’ which shows how the IOC has configured its network connections. This was casr on my first IOC:
>>
>>> epics> casr 1
>>> Channel Access Server V4.13
>>> No clients connected.
>>> CAS-TCP server on 0.0.0.0:5064 with
>>>     CAS-UDP name server on 0.0.0.0:5064
>>> Sending CAS-beacons to 1 address:
>>>     164.54.11.255:5065
>>
>> This was the complete startup of my second IOC (while the first was still running):
>>
>>> C:\epics\base-7.0\bin\windows-x64>softIoc -x 2
>>> dbLoadDatabase("C:\epics\base-7.0\bin\windows-x64\..\..\dbd\softIoc.dbd")
>>> softIoc_registerRecordDeviceDriver(pdbbase)
>>> iocInit()
>>> Starting iocInit
>>> ############################################################################
>>> ## EPICS R7.0.4.1-DEV
>>> ## Rev. R7.0.4-11-g786c4c2ca29f750bc49c-dirty
>>> ############################################################################
>>> iocRun: All initialization complete
>>> epics> dbl
>>> 2:BaseVersion
>>> 2:exit
>>> epics> casr
>>> Channel Access Server V4.13
>>> No clients connected.
>>> epics> casr 1
>>> Channel Access Server V4.13
>>> No clients connected.
>>> CAS-TCP server on 0.0.0.0:5064 with
>>>     CAS-UDP name server on 0.0.0.0:5064
>>> Sending CAS-beacons to 1 address:
>>>     164.54.11.255:5065
>>
>> Note that the output from iocInit there does not contain the usual warning about the configured TCP port being unavailable, and both CAS-TCP servers claim to have TCP port 5064 open. That is a big tell that we have a bug.
>>
>>
>>> Were there any changes in 7.0.4 that might have caused such a problem?
>>
>> Yes, although I don’t know the details myself, and unfortunately nothing appears in the Release Notes to describe those changes. One commit which looks particularly interesting was this:
>>
>>> commit db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6
>>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>>> Date:   Wed Feb 5 10:30:58 2020 -0800
>>>
>>>     use one osdSockAddrReuse impl for all targets
>>>
>>>     drop win32 specialization of osdSockAddrReuse
>>>
>>
>> A fairly prominent comment in the deleted files said:
>>
>>> -/*
>>> - * Note: WINSOCK appears to assign a different functionality for
>>> - * SO_REUSEADDR compared to other OS. With WINSOCK SO_REUSEADDR indicates
>>> - * that simultaneously servers can bind to the same TCP port on the same host!
>>> - * Also, servers are always enabled to reuse a port immediately after
>>> - * they exit ( even if SO_REUSEADDR isnt set ).
>>> - */
>>
>> I just tried reverting that commit and had to fix up the symbol decoration macros in the WIN32 file. With that done, when I start two IOCs the second one shows the expected TCP warning and I can connect to both with no problem.
>>
>> I think that means we’ll be able to provide a fix fairly quickly, but I will leave it to Michael to decide exactly what changes to make.
>>
>> Thanks for the report,
>>
>> - Andrew
>>
>> --
>> Complexity comes for free, simplicity you have to work for.
>>
> 
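
The WINSOCK note quoted above can be checked with a small probe like the one below. This is a sketch in Python rather than EPICS code; `second_tcp_bind_succeeds` is a made-up name for this example. It opens a listening TCP socket on a free loopback port, then reports whether a second socket can bind the same port, with or without SO_REUSEADDR set on both.

```python
import socket


def second_tcp_bind_succeeds(reuse_addr: bool, host: str = "127.0.0.1") -> bool:
    """Return True if a second TCP socket can bind a port that is already listening."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Set the option on both sockets, as two servers sharing code would.
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((host, 0))        # let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))
        except OSError:
            return False             # the OS refused the duplicate bind
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    for flag in (False, True):
        print("SO_REUSEADDR", flag, "->", second_tcp_bind_succeeds(flag))
```

Without SO_REUSEADDR the second bind should be refused. With it set, the quoted note implies Windows would accept the second bind, which would match both IOCs reporting a CAS-TCP server on 0.0.0.0:5064. The sketch prints the result instead of assuming it.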


