EPICS Controls Argonne National Laboratory

Experimental Physics and Industrial Control System


Subject: EPICS Base 3.15.8: Name resolution requests directed to non-existing hosts block CA Server
From: Goetz Pfeiffer via Core-talk <core-talk at aps.anl.gov>
To: <core-talk at aps.anl.gov>
Date: Fri, 4 Jun 2021 10:50:31 +0200

Hello,

at the Helmholtz-Zentrum Berlin we have encountered a problem with EPICS Base 3.15.8
built on recent versions of Linux (Debian 9, Fedora 33). The problem may also affect newer
versions of EPICS Base.

Under certain conditions, programs using EPICS Base, especially the channel access gateway,
respond to requests only after long delays.

The conditions that cause the problem

The following conditions must be met:

  • An application must contain a channel access server and a channel access client.
  • A channel access client must be configured via EPICS_CA_ADDR_LIST to send its name resolution (search) requests directly to a number of IP addresses in the local network.
  • Many of the hosts listed in EPICS_CA_ADDR_LIST are not up or do not exist.
  • The channel access client must continuously try to resolve many PVs by sending search requests to the hosts listed in EPICS_CA_ADDR_LIST.

The symptoms of the problem

The channel access server answers client requests only after large delays. Establishing a new
connection takes 2 seconds or more instead of the usual 0.05 seconds.

Monitor events are not posted immediately. Instead they stall for a few seconds, and then all
pending monitor updates are posted almost at the same time.

A setup to reproduce the problem

I have assembled some scripts to reproduce the problem on a Linux system.

The scripts, together with a README.rst file that describes how to run the test, can be downloaded here:

  https://www-csr.bessy.de/tmp/gatewaytest.tar.gz

Causes of the problem

The problem can be tracked down to code in the file src/ca/client/udpiiu.cpp in EPICS Base. There, the function
"sendto" is used to send UDP unicasts. If this function is called many times with destination IP addresses
on the local network that do not exist, some UDP send buffer in Linux fills up and sendto eventually blocks for
several seconds. The buffer is then (probably) flushed, and the next few calls to sendto no longer block. After
some time, however, the buffer fills up again and sendto blocks again.
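The effect can be made visible with a minimal diagnostic sketch, assuming a Linux/POSIX system. It sends dummy UDP datagrams (not real CA search messages) to a list of addresses and reports the longest time a single sendto() call took. The helper names, the struct SendTiming, and the use of port 5064 (the default CA server port) are assumptions for illustration only. To provoke the blocking described above, the host list would have to contain unused addresses on the local subnet; the flags argument can be set to 0 or MSG_DONTWAIT to compare both variants.

```cpp
#include <arpa/inet.h>   // inet_pton, htons
#include <netinet/in.h>  // sockaddr_in
#include <sys/socket.h>  // socket, sendto, MSG_DONTWAIT
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Outcome of one timed sendto() call (hypothetical helper type).
struct SendTiming {
    double millis;  // wall-clock time spent inside sendto()
    int status;     // return value of sendto()
    int error;      // errno if status < 0, otherwise 0
};

// Fill an IPv4 socket address; returns false if `ip` is not a dotted quad.
bool makeIPv4Addr(const char* ip, std::uint16_t port, sockaddr_in& out) {
    out = sockaddr_in{};
    out.sin_family = AF_INET;
    out.sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out.sin_addr) == 1;
}

// Send one datagram and measure how long the sendto() call itself takes.
SendTiming timedSendto(int sock, const sockaddr_in& dest,
                       const void* buf, std::size_t len, int flags) {
    auto start = std::chrono::steady_clock::now();
    int status = static_cast<int>(sendto(sock, buf, len, flags,
        reinterpret_cast<const sockaddr*>(&dest), sizeof dest));
    int err = (status < 0) ? errno : 0;  // capture errno right after the call
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return SendTiming{ms, status, err};
}

// Send `rounds` dummy datagrams to every host in `hosts` and return the
// longest single sendto() duration in milliseconds. Unparsable entries are skipped.
double worstSendMillis(int sock, const std::vector<std::string>& hosts,
                       std::uint16_t port, int rounds, int flags) {
    const char payload[16] = {};  // dummy bytes, not a real CA search message
    double worst = 0.0;
    for (int r = 0; r < rounds; ++r) {
        for (const std::string& host : hosts) {
            sockaddr_in dest;
            if (!makeIPv4Addr(host.c_str(), port, dest)) continue;
            SendTiming t = timedSendto(sock, dest, payload, sizeof payload, flags);
            if (t.millis > worst) worst = t.millis;
        }
    }
    return worst;
}
```

If the buffer behaves as described, repeated runs with flags set to 0 should occasionally report maxima of several seconds, whereas with MSG_DONTWAIT every call should return quickly, possibly with an error instead.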

However, this part of EPICS Base does not expect sendto to block. When it does, the channel
access server code cannot answer requests.

Resolution of the problem

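When EPICS Base is compiled for Linux, we pass the "MSG_DONTWAIT" flag to the "sendto" call. With this flag,
sendto never blocks, even if internal UDP buffers are full; instead it returns immediately with the error
EAGAIN (EWOULDBLOCK).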
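The effect of the flag can be illustrated outside EPICS Base with a small C++ sketch, assuming a Linux system (the patch only enables MSG_DONTWAIT there). The function sendDatagramNoWait and the SendResult type are hypothetical names, not part of EPICS Base; they only show how a caller has to handle the EAGAIN/EWOULDBLOCK error instead of waiting inside sendto.

```cpp
#include <arpa/inet.h>   // htonl, for callers building addresses
#include <netinet/in.h>  // sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>  // sendto, MSG_DONTWAIT
#include <sys/types.h>   // ssize_t
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical result type for one non-blocking send attempt.
enum class SendResult { Sent, WouldBlock, Failed };

// Send one UDP datagram without ever blocking the calling thread.
// With MSG_DONTWAIT, a full send buffer makes sendto() fail immediately
// with EAGAIN/EWOULDBLOCK instead of sleeping until space is available.
SendResult sendDatagramNoWait(int sock, const sockaddr_in& dest,
                              const void* buf, std::size_t len) {
    ssize_t n = sendto(sock, buf, len, MSG_DONTWAIT,
                       reinterpret_cast<const sockaddr*>(&dest), sizeof dest);
    if (n == static_cast<ssize_t>(len)) {
        return SendResult::Sent;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        // Buffer full: this datagram is not sent. A periodically repeated
        // request, such as a name search, can simply be sent again later.
        return SendResult::WouldBlock;
    }
    return SendResult::Failed;  // any other error, or an unexpected short send
}
```

With this design, a full buffer costs one lost datagram rather than a stall of several seconds. Whether dropping is acceptable is up to the caller, which is why the sketch reports WouldBlock separately from other errors.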

I have attached a patch for this to this e-mail.

Finally...

Could you please check whether you have this problem, too?
Could you perhaps add my patch to EPICS Base?

Greetings

  Goetz Pfeiffer (control system department, Helmholtz-Zentrum Berlin)

diff -r ebdbc82f5ca0 src/ca/client/udpiiu.cpp
--- a/src/ca/client/udpiiu.cpp	Fri Jun 04 10:38:49 2021 +0200
+++ b/src/ca/client/udpiiu.cpp	Fri Jun 04 10:44:08 2021 +0200
@@ -942,7 +942,23 @@
     int bufSizeAsInt = static_cast < int > ( bufSize );
     while ( true ) {
         // This const_cast is needed for vxWorks:
-        int status = sendto ( _udpiiu.sock, const_cast<char *>(pBuf), bufSizeAsInt, 0, 
+        int status = sendto ( _udpiiu.sock, const_cast<char *>(pBuf), bufSizeAsInt, 
+#ifndef __linux__
+                0, 
+#else
+                /* On modern Linux systems, when sendto() is used to do UDP
+                 * unicasts, it blocks if the destination host is down and the
+                 * internal UDP send buffer is filled. This lasts up to 2
+                 * seconds.
+                 * However, EPICS Base doesn't expect this call to block. If
+                 * this happens, other things like answering name resolution
+                 * requests or firing up monitors are also delayed. 
+                 * In order to avoid these problems, we provide the
+                 * MSG_DONTWAIT flag, when EPICS Base is compiled for Linux.
+                 * This means that sendto() always returns immediately.
+                 */
+                MSG_DONTWAIT,
+#endif
                 & _destAddr.sa, sizeof ( _destAddr.sa ) );
         if ( status == bufSizeAsInt ) {
             break;



Replies:
Re: EPICS Base 3.15.8: Name resolution requests directed to non-existing hosts block CA Server Ralph Lange via Core-talk
Re: EPICS Base 3.15.8: Name resolution requests directed to non-existing hosts block CA Server Michael Davidsaver via Core-talk
