EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Possible problem with base 7.0.4 on Windows
From: "Johnson, Andrew N. via Core-talk" <core-talk at aps.anl.gov>
To: "Rivers, Mark L." <rivers at cars.uchicago.edu>
Cc: EPICS core-talk <core-talk at aps.anl.gov>
Date: Fri, 19 Jun 2020 03:37:35 +0000
Hi Mark,

On Jun 18, 2020, at 9:39 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:

I am writing to report a possible problem with base 7.0.4 on Windows.

Today I updated EPICS on a Windows machine from base 7.0.3.1 to 7.0.4.  This machine runs 3 EPICS IOCs; 2 are areaDetector camera IOCs and the third runs various serial and TCP devices.

The behavior I observed was strange.

- The first 2 IOCs to start worked fine; I could connect to them from Channel Access clients.

- The third IOC started fine, and dbl showed all the PVs.  However, I could not connect to them from any Channel Access client (medm, caget, etc.).

When I changed the order in which I started the IOCs, it appeared that the last IOC to start was always the one that Channel Access clients could not connect to.

I rolled back to the 7.0.3.1 versions of the IOCs and they all connect fine.

I cannot say that I have conclusively proved the above by starting the IOCs in all different orders, etc.  But it is looking like a problem.

I just started 2 IOCs on windows-x64. I can connect to the first, but not to the second:

tux% caget 1:BaseVersion
1:BaseVersion                  7.0.4.1-DEV
tux% caget 2:BaseVersion
Channel connect timed out: '2:BaseVersion' not found.

I tested this first on my Mac laptop and was able to connect to 6 IOCs with no problems, so this looks like it’s a Windows issue.

It is often useful to see the output of running ‘casr 1’ which shows how the IOC has configured its network connections. This was casr on my first IOC:

epics> casr 1                                                                   
Channel Access Server V4.13                                                     
No clients connected.                                                           
CAS-TCP server on 0.0.0.0:5064 with                                             
    CAS-UDP name server on 0.0.0.0:5064                                         
Sending CAS-beacons to 1 address:                                               
    164.54.11.255:5065                                                          

This was the complete startup of my second IOC (while the first was still running):

C:\epics\base-7.0\bin\windows-x64>softIoc -x 2                              
dbLoadDatabase("C:\epics\base-7.0\bin\windows-x64\..\..\dbd\softIoc.dbd")                   
softIoc_registerRecordDeviceDriver(pdbbase)                                                 
iocInit()                                                                                   
Starting iocInit                                                                            
############################################################################                
## EPICS R7.0.4.1-DEV                                                                       
## Rev. R7.0.4-11-g786c4c2ca29f750bc49c-dirty                                               
############################################################################                
iocRun: All initialization complete                                                         
epics> dbl                                                                                  
2:BaseVersion                                                                               
2:exit                                                                                      
epics> casr                                                                                 
Channel Access Server V4.13                                                                 
No clients connected.                                                                       
epics> casr 1                                                                               
Channel Access Server V4.13                                                                 
No clients connected.                                                                       
CAS-TCP server on 0.0.0.0:5064 with                                                         
    CAS-UDP name server on 0.0.0.0:5064                                                     
Sending CAS-beacons to 1 address:                                                           
    164.54.11.255:5065

Note that the output from iocInit there does not contain the usual warning about the configured TCP port being unavailable, and both CAS-TCP servers claim to have TCP port 5064 open. That is a big tell that we have a bug.
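
To illustrate the behavior that is expected here, the following is a minimal Python sketch, not the EPICS server code itself. It assumes a server that tries its configured TCP port first and, if the bind fails because another process already holds the port, warns and falls back to a port assigned by the operating system. The function name bind_with_fallback and the use of 127.0.0.1 are hypothetical choices for the example.

```python
import socket


def bind_with_fallback(host, port):
    """Bind a listening TCP socket to (host, port).

    If the configured port is unavailable, fall back to an OS-assigned
    port. Returns (socket, actual_port, fell_back).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR is deliberately NOT set, so a busy port is an error.
        sock.bind((host, port))
        fell_back = False
    except OSError:
        # Configured port is taken: ask the OS for any free port instead.
        sock.bind((host, 0))
        fell_back = True
    sock.listen(1)
    return sock, sock.getsockname()[1], fell_back


if __name__ == "__main__":
    # The first "server" takes a free port; the second asks for the same one.
    first, port1, _ = bind_with_fallback("127.0.0.1", 0)
    second, port2, fell_back = bind_with_fallback("127.0.0.1", port1)
    if fell_back:
        print(f"warning: configured TCP port {port1} was unavailable, "
              f"using dynamically assigned port {port2}")
    first.close()
    second.close()
```

In the casr output above, the second IOC never took the fallback path: its bind to port 5064 succeeded even though the first IOC already held that port.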


Were there any changes in 7.0.4 that might have caused such a problem?

Yes, although I don’t know the details myself, and unfortunately nothing appears in the Release Notes to describe those changes. One commit which looks particularly interesting was this:

commit db6e7c7a22b73f70a8b93e2aa4b6fa505e0218a6
Author: Michael Davidsaver <mdavidsaver at gmail.com>
Date:   Wed Feb 5 10:30:58 2020 -0800

    use one osdSockAddrReuse impl for all targets

    drop win32 specialization of osdSockAddrReuse


A fairly prominent comment in the deleted files said:

-/*
- * Note: WINSOCK appears to assign a different functionality for 
- * SO_REUSEADDR compared to other OS. With WINSOCK SO_REUSEADDR indicates
- * that simultaneously servers can bind to the same TCP port on the same host!
- * Also, servers are always enabled to reuse a port immediately after 
- * they exit ( even if SO_REUSEADDR isnt set ).
- */

I just tried reverting that commit, which also required fixing up the symbol decoration macros in the WIN32 file. With that done, when I start two IOCs the second one shows the expected TCP warning, and I can connect to both with no problem.
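
The platform difference described in that comment can be probed with a small experiment. The sketch below is a hedged Python illustration, not code from EPICS base: the helper name second_bind_succeeds is hypothetical. It opens a listening TCP socket and then checks whether a second socket can bind to the same address and port, with and without SO_REUSEADDR set on both sockets.

```python
import socket
import sys


def second_bind_succeeds(reuse_addr):
    """Return True if a second socket can bind to a TCP port that
    another socket is already listening on."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Port 0 lets the OS pick a free port for the first server.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError:
            return False
        return True
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("platform:", sys.platform)
    print("second bind without SO_REUSEADDR:", second_bind_succeeds(False))
    print("second bind with SO_REUSEADDR:   ", second_bind_succeeds(True))
```

Going by the deleted comment, the second bind would be expected to succeed on Windows when SO_REUSEADDR is set, which matches both IOCs reporting port 5064. On Linux the second bind should fail in both cases because the first socket is already listening.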

I think that means we’ll be able to provide a fix fairly quickly, but I will leave it to Michael to decide exactly what changes to make.

Thanks for the report,

- Andrew

-- 
Complexity comes for free, simplicity you have to work for.


Replies:
Re: Possible problem with base 7.0.4 on Windows (regression) Michael Davidsaver via Core-talk
References:
Possible problem with base 7.0.4 on Windows Mark Rivers via Core-talk
