Subject: | EPICS Base 3.15.8: Name resolution requests directed to non-existing hosts block CA Server |
From: | Goetz Pfeiffer via Core-talk <core-talk at aps.anl.gov> |
To: | <core-talk at aps.anl.gov> |
Date: | Fri, 4 Jun 2021 10:50:31 +0200 |
Hello,
at the Helmholtz-Zentrum Berlin we have encountered a problem with
EPICS Base 3.15.8 built on recent versions of Linux (Debian 9,
Fedora 33). It may also be a problem with newer versions of EPICS Base.
Under certain conditions, programs using EPICS Base, especially the
channel access gateway, are delayed and take a long time to answer
requests.
The following conditions must be met:
The channel access server has large delays when answering client
requests. Establishing a new connection takes 2 seconds or even more
instead of the usual 0.05 seconds.
Monitor events are not posted immediately but stop for a few seconds,
then all missing monitors are posted almost at the same time.
I have assembled some scripts to reproduce the problem on a Linux system.
The scripts, together with a README.rst file that describes how to run the test, can be downloaded here:
https://www-csr.bessy.de/tmp/gatewaytest.tar.gz
The problem can be tracked down to code in the file
src/ca/client/udpiiu.cpp in EPICS Base. There, the function "sendto"
is used to send UDP unicasts. If this function is called many times
with destination IP addresses on the local network that do not exist,
some UDP buffer in Linux fills up and eventually sendto blocks for
some seconds. Then the buffer is (probably) flushed and the next few
calls to sendto no longer block. But after some time the buffer fills
up again and sendto blocks again.
However, in this part of EPICS Base sendto is not expected to block.
While it blocks, the channel access server code cannot answer requests.
Our fix is to pass the "MSG_DONTWAIT" flag to the "sendto" call. With
this flag, sendto never blocks, even if the internal UDP buffers are
full.
I have attached a patch file for this to this e-mail.
Could you please check whether you have this problem, too?
Maybe you could add my patch to EPICS Base?
Greetings
Goetz Pfeiffer (control system department, Helmholtz-Zentrum Berlin)
diff -r ebdbc82f5ca0 src/ca/client/udpiiu.cpp
--- a/src/ca/client/udpiiu.cpp Fri Jun 04 10:38:49 2021 +0200
+++ b/src/ca/client/udpiiu.cpp Fri Jun 04 10:44:08 2021 +0200
@@ -942,7 +942,23 @@
     int bufSizeAsInt = static_cast < int > ( bufSize );
     while ( true ) {
         // This const_cast is needed for vxWorks:
-        int status = sendto ( _udpiiu.sock, const_cast<char *>(pBuf), bufSizeAsInt, 0,
+        int status = sendto ( _udpiiu.sock, const_cast<char *>(pBuf), bufSizeAsInt,
+#ifndef __linux__
+            0,
+#else
+            /* On modern Linux systems, when sendto() is used to do UDP
+             * unicasts, it blocks if the destination host is down and the
+             * internal UDP send buffer is filled. This lasts up to 2
+             * seconds.
+             * However, EPICS Base doesn't expect this call to block. If
+             * this happens, other things like answering name resolution
+             * requests or firing up monitors are also delayed.
+             * In order to avoid these problems, we provide the
+             * MSG_DONTWAIT flag, when EPICS Base is compiled for Linux.
+             * This means that sendto() always returns immediately.
+             */
+            MSG_DONTWAIT,
+#endif
             & _destAddr.sa, sizeof ( _destAddr.sa ) );
         if ( status == bufSizeAsInt ) {
             break;