Hi Mark,
I suppose it's fitting for you to report this regression which
has its roots in my attempt to fix another issue you reported.
https://epics.anl.gov/core-talk/2020/msg00050.php
db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6 was one of a series
of commits affecting this code, so it can't simply be reverted
in full. Copying in the original WIN32/osdSockAddrReuse.cpp
should do as a quick workaround.
https://github.com/epics-base/epics-base/blob/5064931aa6e54481832951b4f27a982c5003233d/modules/libcom/src/osi/os/WIN32/osdSockAddrReuse.cpp
The changes which caused this were made during the Feb. codeathon
at DLS. At the time I added a new unit test to osiSockTest which
checks the behavior of bind(), but doesn't actually send/receive
any traffic.
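For illustration only, here is a minimal sketch of the kind of bind() check described above. It is written in Python rather than taken from osiSockTest, and the function name second_bind_succeeds is hypothetical. It asks whether a second TCP socket can bind a port that already has a listener, with and without SO_REUSEADDR.

```python
import socket


def second_bind_succeeds(reuse_addr: bool) -> bool:
    """Return True if a second TCP socket can bind a port that already
    has a listening socket on it (hypothetical helper, not EPICS code)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Set the option on both sockets before bind().
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Port 0 lets the OS pick a free port, so the check does not
        # collide with a real server on 5064.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("without SO_REUSEADDR:", second_bind_succeeds(False))
    print("with SO_REUSEADDR:   ", second_bind_succeeds(True))
```

Without SO_REUSEADDR the second bind should be refused. Going by the WINSOCK comment quoted further down, the call with SO_REUSEADDR set would be expected to succeed on Windows, which matches the symptom of two IOCs both claiming TCP port 5064. Like the existing test, this only exercises bind() and sends no traffic.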
I would have liked to test actual traffic as well. The problem is
that I don't know of a portable way to send broadcast traffic through
the loopback. Using other interfaces would be a violation of
the usual practice of unit test isolation, as other hosts could
see this traffic.
Right now, though, I'm inclined towards doing this anyway, with a
test which sends legitimate CA search broadcasts on port 5064 through
all interfaces using a random/invalid PV name.
I think the risk of adverse external effects would be minimal,
and the benefit of avoiding future regressions sufficient
to justify it.
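As a rough sketch of what such a test might send (not code from Base), the Python below builds a CA search datagram for a random PV name. The message layout is an assumption taken from the published Channel Access protocol specification and should be checked against it: 16-byte big-endian headers, a version message (command 0) followed by a search request (command 6) with the "don't reply" flag (5), and a null-terminated name padded to a multiple of 8 bytes. The helper names, the PV name prefix, and the broadcast_addr parameter are hypothetical.

```python
import socket
import struct
import uuid

# Assumed values from the CA protocol specification (verify before use).
CA_HEADER = ">HHHHII"   # command, payload size, data type, data count, p1, p2
CA_PROTO_VERSION = 0
CA_PROTO_SEARCH = 6
CA_DONT_REPLY = 5       # servers stay silent when they do not host the PV
CA_MINOR_VERSION = 13   # matches "Channel Access Server V4.13" in casr output


def random_pv_name() -> str:
    """Return a PV name that should not exist on any IOC."""
    return "unittest:nonexistent:" + uuid.uuid4().hex


def pad8(data: bytes) -> bytes:
    """Pad with null bytes up to a multiple of 8 bytes."""
    return data + b"\0" * (-len(data) % 8)


def build_search_datagram(pv_name: str, cid: int) -> bytes:
    """Build a version message followed by one search request."""
    payload = pad8(pv_name.encode("ascii") + b"\0")
    version = struct.pack(CA_HEADER, CA_PROTO_VERSION, 0, 0,
                          CA_MINOR_VERSION, 0, 0)
    search = struct.pack(CA_HEADER, CA_PROTO_SEARCH, len(payload),
                         CA_DONT_REPLY, CA_MINOR_VERSION, cid, cid)
    return version + search + payload


def send_search(datagram: bytes, broadcast_addr: str, port: int = 5064) -> None:
    """Broadcast the datagram; the caller supplies the interface's
    broadcast address, since the standard library cannot list them portably."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(datagram, (broadcast_addr, port))
```

Since the name is random and the request carries the "don't reply" flag, other hosts on the subnet would see the packet but should have no reason to answer it, which is the basis for judging the external risk to be small.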
On 6/18/20 8:37 PM, Johnson, Andrew N. via Core-talk wrote:
> Hi Mark,
>
> On Jun 18, 2020, at 9:39 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:
>>
>> I am writing to report a possible problem with base 7.0.4 on Windows.
>>
>> Today I updated EPICS on a Windows machine from base 7.0.3.1 to 7.0.4. This machine runs 3 EPICS IOCs; 2 are areaDetector camera IOCs and the third runs various serial and TCP devices.
>>
>> The behavior I observed was strange.
>>
>> - The first 2 IOCs to start worked fine, I could connect to them from Channel Access clients.
>>
>> - The third IOC started fine, and dbl showed all the PVs. However, I could not connect to them from any Channel Access client (medm, caget, etc.)
>>
>> If I changed the order in which I started the IOCs it appears that the last IOC to start is the one that cannot connect with Channel Access.
>>
>> I rolled back to the 7.0.3.1 versions of the IOCs and they all connect fine.
>>
>> I cannot say that I have conclusively proved the above by starting the IOCs in all different orders, etc. But it is looking like a problem.
>
> I just started 2 IOCs on windows-x64. I can connect to the first, but not to the second:
>
>> tux% caget 1:BaseVersion
>> 1:BaseVersion 7.0.4.1-DEV
>> tux% caget 2:BaseVersion
>> Channel connect timed out: '2:BaseVersion' not found.
>
> I tested this first on my Mac laptop and was able to connect to 6 IOCs with no problems, so this looks like it’s a Windows issue.
>
> It is often useful to see the output of running ‘casr 1’ which shows how the IOC has configured its network connections. This was casr on my first IOC:
>
>> epics> casr 1
>> Channel Access Server V4.13
>> No clients connected.
>> CAS-TCP server on 0.0.0.0:5064 with
>> CAS-UDP name server on 0.0.0.0:5064
>> Sending CAS-beacons to 1 address:
>> 164.54.11.255:5065
>
> This was the complete startup of my second IOC (while the first was still running):
>
>> C:\epics\base-7.0\bin\windows-x64>softIoc -x 2
>> dbLoadDatabase("C:\epics\base-7.0\bin\windows-x64\..\..\dbd\softIoc.dbd")
>> softIoc_registerRecordDeviceDriver(pdbbase)
>> iocInit()
>> Starting iocInit
>> ############################################################################
>> ## EPICS R7.0.4.1-DEV
>> ## Rev. R7.0.4-11-g786c4c2ca29f750bc49c-dirty
>> ############################################################################
>> iocRun: All initialization complete
>> epics> dbl
>> 2:BaseVersion
>> 2:exit
>> epics> casr
>> Channel Access Server V4.13
>> No clients connected.
>> epics> casr 1
>> Channel Access Server V4.13
>> No clients connected.
>> CAS-TCP server on 0.0.0.0:5064 with
>> CAS-UDP name server on 0.0.0.0:5064
>> Sending CAS-beacons to 1 address:
>> 164.54.11.255:5065
>
> Note that the output from iocInit there does not contain the usual warning about the configured TCP port being unavailable, and both CAS-TCP servers claim to have TCP port 5064 open. That is a big tell that we have a bug.
>
>
>> Were there any changes in 7.0.4 that might have caused such a problem?
>
> Yes, although I don’t know the details myself, and unfortunately nothing appears in the Release Notes to describe those changes. One commit which looks particularly interesting was this:
>
>> commit db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6
>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>> Date: Wed Feb 5 10:30:58 2020 -0800
>>
>> use one osdSockAddrReuse impl for all targets
>>
>> drop win32 specialization of osdSockAddrReuse
>>
>
> A fairly prominent comment in the deleted files said:
>
>> -/*
>> - * Note: WINSOCK appears to assign a different functionality for
>> - * SO_REUSEADDR compared to other OS. With WINSOCK SO_REUSEADDR indicates
>> - * that simultaneously servers can bind to the same TCP port on the same host!
>> - * Also, servers are always enabled to reuse a port immediately after
>> - * they exit ( even if SO_REUSEADDR isnt set ).
>> - */
>
> I just tried reverting that commit and had to fix up the symbol decoration macros in the WIN32 file, but with that done when I start two IOCs, the second one shows the expected TCP warning and I can connect to both with no problem.
>
> I think that means we’ll be able to provide a fix fairly quickly, but I will leave it to Michael to decide exactly what changes to make.
>
> Thanks for the report,
>
> - Andrew
>
> --
> Complexity comes for free, simplicity you have to work for.
>