Experimental Physics and
Industrial Control System

1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 <2011> 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025	Index	1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 <2011> 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025
<== Date ==>		<== Thread ==>

Subject:	RE: EZCA and ezcaIDL improvements?
From:	"Mark Rivers" <[email protected]>
To:	<[email protected]>
Date:	Fri, 28 Oct 2011 16:37:13 -0500

Folks,

I just found one of our vxWorks IOCs had drained its 256-byte mbufs. inetstatShow gave me IP address of the client that had filled up the sendQ on the server. It turns out that machine was NOT running IDL. Rather medm had crashed due to X errors, but the medm process was still running. That led to the same problem as IDL, i.e. the client (medm) had requested callbacks, but was no longer processing them. Killing the medm process fixed the problem and the mbufs recovered.

This was on a Windows 7 64-bit machine where we are having occasional problems with Exceed errors with medm. But I think the same thing could happen on Linux or other systems as well if the application generates an error but does not actually exit and close all sockets.

Mark

________________________________________

From: Mark Rivers

Sent: Friday, October 21, 2011 4:28 PM

To: '[email protected]'

Subject: RE: EZCA and ezcaIDL improvements?

Folks,

I have verified that indeed previous versions of the ezca library used by IDL, Matlab, and other scripting languages could easily cause vxWorks IOCs to die because of mbuf depletion. I have appended the output of the vxWorks IOC that demonstrates this. It happens when IDL enables monitors on some PVs and then does not periodically call ca_pend_event. This probably most often happens when one launches a GUI like scanSee or the MCA display program, then closes the GUI without closing IDL. The monitor messages initially queue up on the client, but eventually queue up on the vxWorks IOC. A single IDL client doing this waa not enough to kill the IOC, but 2 or 3 IDL clients were, and it happened within a few minutes.

The fix is to use a new version of ezca that is built to use preemptive callbacks with Channel Access 3.14. Till Straumann had already made the needed changes to ezca.c for use with his LabCA module. I have taken Till’s version and put it into the EPICS Extensions CVS repository at the APS. Using this new version of ezca I have built new versions of the ezcaIDL shareable library for 32-bit and 64-bit architectures on both Linux and Windows.

The new DLLs can be found (along with other binaries from EPICS base 3-14.12.1) here:

http://cars9.uchicago.edu/software/pub/EPICS_linux-x86_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_linux-x86_64_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_win32-x86_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_windows-x64_binaries.zip

Users who want to build the source code should be able to obtain ezca, EzcaScan and ezcaIDL from the CVS snapshots here:

http://www.aps.anl.gov/epics/download/extensions/index.php#cvs

However, it appears they are not being updated, so we will work on that.

There is a problem with generating the required undecorated function names in ezcaIDL.dll consistently for 32-bit and 64-bit Windows systems. For now there will be lots of warning messages when building for 64-bits, but it does work fine.

I have verified that when using these new versions the problem of mbuf depletion does not occur because the preemptive callbacks on the client continue to process the monitor events in the background even when IDL is not doing anything.

Till added a new “count” argument to ezcaSetMonitor. I have added that to ezcaIDL.c and ezcaIDL.pro. It defaults to 0, so existing code will work as it previously did.

I have also changed the way that ezcaIDL.pro finds the shareable library. This is from ezcaIDL/README. Previously only mechanism 1) below was supported.

1) First, it looks for the environment variable EZCA_IDL_SHARE. If this envionment

variable exists it must point to the complete path to ezcaIDL.dll (Windows) or

libezcaIDL.so (all other architectures). For example:

setenv EZCA_IDL_SHARE /usr/local/lib/libezcaIDL.so

2) If the shareable library is not found from the EZCA_IDL_SHARE environment variable

then it is searched for in the !IDL_PATH. The file name searched for is the

following:

base_!VERSION.OS_!VERSION.ARCH

base is either ezcaIDL.dll (Windows) or libezcaIDL.so (all other architectures).

!VERSION.OS can be "linux", "Win32", etc. !VERSION.ARCH can be "x86", "x86_64", etc.

Do "help, /structure, !VERSION" in IDL to see the values for your system.

This mechanism can be more convenient than using EZCA_IDL_SHARE because it allows

easy switching between 32 and 64 bit versions of IDL, and does not require setting an

environment variable. It does require either copying and renaming the files to a

location in the !IDL_PATH, or creating soft links in the !IDL_PATH to point to the

locations of the actual shareable libraries.

On our system I created soft links in idl_user/epics/, which is in the !IDL_PATH:

lrwxrwxrwx 1 epics epics 53 Oct 21 14:32 ezcaIDL.dll_Win32_x86 -> /usr/local/epics/extensions/bin/win32-x86/ezcaIDL.dll

lrwxrwxrwx 1 epics epics 55 Oct 21 14:32 ezcaIDL.dll_Win32_x86_64 -> /usr/local/epics/extensions/bin/windows-x64/ezcaIDL.dll

lrwxrwxrwx 1 epics epics 55 Oct 21 14:35 libezcaIDL.so_linux_x86 -> /usr/local/epics/extensions/lib/linux-x86/libezcaIDL.so

lrwxrwxrwx 1 epics epics 58 Oct 21 14:36 libezcaIDL.so_linux_x86_64 -> /usr/local/epics/extensions/lib/linux-x86_64/libezcaID

I suggest that anyone who is using IDL to communicate with EPICS PVs in vxWorks IOCs upgrade to the new version of these DLLs.

Cheers,

Mark

Here is a demonstration of the problem with the previous version. This is a vxWorks IOC before setting the monitor callbacks in IDL. 164.54.160.82 is the IP address of the Linux client where 3 IDL sessions are running.

ioc13lab> mbufShow

type number

--------- ------

FREE : 788

DATA : 5

HEADER : 7

SOCKET : 0

PCB : 0

RTABLE : 0

HTABLE : 0

ATABLE : 0

SONAME : 0

ZOMBIE : 0

SOOPTS : 0

FTABLE : 0

RIGHTS : 0

IFADDR : 0

CONTROL : 0

OOBDATA : 0

IPMOPTS : 0

IPMADDR : 0

IFMADDR : 0

MRTABLE : 0

TOTAL : 800

number of mbufs: 800

number of times failed to find space: 0

number of times waited for space: 0

number of times drained protocols for space: 0

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size clusters free usage

-------------------------------------------------------------------------------

64 125 117 450510

128 400 400 4990010

256 50 44 589498

512 25 25 6297

1024 25 25 7374

2048 25 25 146726

-------------------------------------------------------------------------------

value = 80 = 0x50 = 'P'

ioc13lab> inetstatShow

Active Internet connections (including servers)

PCB Proto Recv-Q Send-Q Local Address Foreign Address (state)

-------- ----- ------ ------ ------------------ ------------------ -------

1da89d4 TCP 0 0 164.54.160.73.5064 164.54.160.82.40702

1da8950 TCP 0 0 164.54.160.73.5064 164.54.160.82.40680

1da88cc TCP 0 0 164.54.160.73.5064 164.54.160.82.40674

1da8848 TCP 0 0 164.54.160.73.5064 164.54.160.82.40662

1da87c4 TCP 0 0 164.54.160.73.5064 164.54.160.82.40650

1da8740 TCP 0 0 164.54.160.73.5064 164.54.160.74.1123

1da8428 TCP 0 0 164.54.160.73.5064 164.54.160.130.2328

1da7f84 TCP 0 0 0.0.0.0.5064 0.0.0.0.0

1da7cf0 TCP 0 0 0.0.0.0.111 0.0.0.0.0

1da86bc UDP 0 0 0.0.0.0.989 0.0.0.0.0

1da8638 UDP 0 0 164.54.160.73.1038 0.0.0.0.0

1da85b4 UDP 0 0 127.0.0.1.1037 127.0.0.1.1031

1da8530 UDP 0 0 127.0.0.1.1036 127.0.0.1.1030

1da84ac UDP 0 0 0.0.0.0.5065 0.0.0.0.0

1da83a4 UDP 0 0 0.0.0.0.991 0.0.0.0.0

1da7df8 UDP 0 0 0.0.0.0.1034 0.0.0.0.0

1da8320 UDP 0 0 0.0.0.0.1033 0.0.0.0.0

1da829c UDP 0 0 0.0.0.0.1032 0.0.0.0.0

1da8218 UDP 0 0 0.0.0.0.1031 0.0.0.0.0

1da8194 UDP 0 0 0.0.0.0.1030 0.0.0.0.0

1da8110 UDP 0 0 0.0.0.0.1029 0.0.0.0.0

1da808c UDP 0 0 0.0.0.0.5064 0.0.0.0.0

1da8008 UDP 0 0 164.54.160.73.1035 164.54.160.255.5065

1da7f00 UDP 0 0 0.0.0.0.1028 0.0.0.0.0

1da7e7c UDP 0 0 0.0.0.0.1027 0.0.0.0.0

1da7d74 UDP 0 0 0.0.0.0.1025 0.0.0.0.0

1da7ae0 UDP 0 0 0.0.0.0.111 0.0.0.0.0

value = 1 = 0x1

This is the output of netstat on the client Linux machine as it local TCP buffers fill up:

corvette:~>netstat | grep ioc13lab

tcp 2971200 0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 2947200 0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 2994400 0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>more /proc/sys/net/ipv4/tcp_rmem

4096 87380 4194304

corvette:~>netstat | grep ioc13lab

tcp 3013800 0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 2989720 0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 3037000 0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>

corvette:~>netstat | grep ioc13lab

tcp 3419400 0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 3427280 0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 3406800 0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>netstat | grep ioc13lab

tcp 3419400 0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 3427280 0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 0 0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp 3406800 0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

Note that at this point 3 connections from corvette to ioc13lab have filled their receive buffers, and they are no longer getting any larger.

At that point the sendQ buffers on the vxWorks system begin filling up.

ioc13lab> inetstatShow

Active Internet connections (including servers)

PCB Proto Recv-Q Send-Q Local Address Foreign Address (state)

-------- ----- ------ ------ ------------------ ------------------ -------

1da89d4 TCP 0 0 164.54.160.73.5064 164.54.160.82.40702

1da8950 TCP 0 8192 164.54.160.73.5064 164.54.160.82.40680

1da88cc TCP 0 2600 164.54.160.73.5064 164.54.160.82.40674

1da8848 TCP 0 7760 164.54.160.73.5064 164.54.160.82.40662

1da87c4 TCP 0 0 164.54.160.73.5064 164.54.160.82.40650

1da8740 TCP 0 0 164.54.160.73.5064 164.54.160.74.1123

1da8428 TCP 0 0 164.54.160.73.5064 164.54.160.130.2328

1da7f84 TCP 0 0 0.0.0.0.5064 0.0.0.0.0

1da7cf0 TCP 0 0 0.0.0.0.111 0.0.0.0.0

1da86bc UDP 0 0 0.0.0.0.989 0.0.0.0.0

1da8638 UDP 0 0 164.54.160.73.1038 0.0.0.0.0

1da85b4 UDP 0 0 127.0.0.1.1037 127.0.0.1.1031

1da8530 UDP 0 0 127.0.0.1.1036 127.0.0.1.1030

1da84ac UDP 0 0 0.0.0.0.5065 0.0.0.0.0

1da83a4 UDP 0 0 0.0.0.0.991 0.0.0.0.0

1da7df8 UDP 0 0 0.0.0.0.1034 0.0.0.0.0

1da8320 UDP 0 0 0.0.0.0.1033 0.0.0.0.0

1da829c UDP 0 0 0.0.0.0.1032 0.0.0.0.0

1da8218 UDP 0 0 0.0.0.0.1031 0.0.0.0.0

1da8194 UDP 0

That in turn depletes the mbufs. Note that there are no free mbufs larger than 256 bytes.

ioc13lab> mbufShow

type number

--------- ------

FREE : 592

DATA : 201

HEADER : 7

SOCKET : 0

PCB : 0

RTABLE : 0

HTABLE : 0

ATABLE : 0

SONAME : 0

ZOMBIE : 0

SOOPTS : 0

FTABLE : 0

RIGHTS : 0

IFADDR : 0

CONTROL : 0

OOBDATA : 0

IPMOPTS : 0

IPMADDR : 0

IFMADDR : 0

MRTABLE : 0

TOTAL : 800

number of mbufs: 800

number of times failed to find space: 0

number of times waited for space: 0

number of times drained protocols for space: 0

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size clusters free usage

-------------------------------------------------------------------------------

64 125 117 450874

128 400 326 5035830

256 50 0 613939

512 25 0 6708

1024 25 0 8549

2048 25 0 147650

-------------------------------------------------------------------------------

value = 80 = 0x50 = 'P'

At that point network access is dead. Save_restore begins failing, etc.

value = 80 = 0x50 = 'P'

ioc13lab> save_restore:write_it: fprintf returned -1. [111019-131137]

../save_restore.c(1369): [0x2f000b]=write_it:S_rpcLib_RPC_CANTDECODEARGS

save_restore:write_it: fclose('auto_settings.sav') returned -1

../save_restore.c(1485): [0x2f000b]=write_it:S_rpcLib_RPC_CANTDECODEARGS

save_restore:write_it: Giving up on this attempt to write 'auto_settings.sav'. [111019-131137]

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

save_restore:write_save_file: Can't write save file. [111019-131137]

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

________________________________________

From: Mark Rivers

Sent: Tuesday, October 18, 2011 6:19 PM

To: [email protected]

Subject: EZCA and ezcaIDL improvements?

Folks,

Andrew Johnson thought he remembered someone at the recent EPICS collaboration meeting talking about improvements to the EZCA extension to use 3.14 preemptive callbacks. Has someone done some work on this?

Here’s why I want to know:

Thanks to help from Andrew I just discovered that the IDL interface (and probably other scripting languages like Matlab, etc.) can cause serious problems with an EPICS IOC. The problem arises when the IDL client sets monitors on a PV (with caSetMonitor()), but then does not call ca_pend_event() frequently. ca_pend_event() is called by the “caGet”, “caPut”, “cacheckMonitor”, etc. functions in IDL. But if one calls caSetMonitor() and then just sits at the IDL prompt, ca_pend_event() is not being called. When that happens the CA server is trying to send monitor events to the client. These initially queue up on the client side, but eventually they begin to queue up on the server side. As they queue on a vxWorks server they consume mbufs. I have found that a single client in this state will consume all of the 256 byte clusters, and some of the 512 byte clusters. If a few more clients are in this state then there are no mbufs available above a certain size. When that happens no new CA connections can be made, other network services fail, etc.

I think this problem could be avoided if we changed the ezca C library that IDL calls to use preemptive callbacks, so that periodic calls to ca_pend_event are no longer needed.

Thanks,

Mark

References:: EZCA and ezcaIDL improvements? Mark Rivers; RE: EZCA and ezcaIDL improvements? Mark Rivers

Navigate by Date:: Prev: EtherIP device support for stringout record William Lu; Next: StreamDevice: Input "1" does not match format %1r Martin Konrad; Index: 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 <2011> 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025
Navigate by Thread:: Prev: Re: EZCA and ezcaIDL improvements? Andrew Johnson; Next: cagateway: building with CMake, creating Debian/RPM packages Martin Konrad; Index: 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 <2011> 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025

ANJ, 18 Nov 2013

· Home · News · About · Base · Modules · Extensions · Distributions ·
· Download · Search · IRMIS · Talk · Documents · Links · Licensing ·

Experimental Physics and Industrial Control System

Experimental Physics and
Industrial Control System