EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  <20112012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  <20112012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: RE: EZCA and ezcaIDL improvements?
From: "Mark Rivers" <[email protected]>
To: <[email protected]>
Date: Fri, 28 Oct 2011 16:37:13 -0500

Folks,

 

I just found one of our vxWorks IOCs had drained its 256-byte mbufs.  inetstatShow gave me IP address of the client that had filled up the sendQ on the server.  It turns out that machine was NOT running IDL.  Rather medm had crashed due to X errors, but the medm process was still running.  That led to the same problem as IDL, i.e. the client (medm) had requested callbacks, but was no longer processing them.  Killing the medm process fixed the problem and the mbufs recovered.

 

This was on a Windows 7 64-bit machine where we are having occasional problems with Exceed errors with medm.  But I think the same thing could happen on Linux or other systems as well if the application generates an error but does not actually exit and close all sockets.

 

Mark

 

 

________________________________________

From: Mark Rivers

Sent: Friday, October 21, 2011 4:28 PM

To: '[email protected]'

Subject: RE: EZCA and ezcaIDL improvements?

 

Folks,

 

I have verified that indeed previous versions of the ezca library used by IDL, Matlab, and other scripting languages could easily cause vxWorks IOCs to die because of mbuf depletion.  I have appended the output of the vxWorks IOC that demonstrates this.  It happens when IDL enables monitors on some PVs and then does not periodically call ca_pend_event.  This probably most often happens when one launches a GUI like scanSee or the MCA display program, then closes the GUI without closing IDL.  The monitor messages initially queue up on the client, but eventually queue up on the vxWorks IOC.  A single IDL client doing this waa not enough to kill the IOC, but 2 or 3 IDL clients were, and it happened within a few minutes.

 

The fix is to use a new version of ezca that is built to use preemptive callbacks with Channel Access 3.14.  Till Straumann had already made the needed changes to ezca.c for use with his LabCA module.  I have taken Till’s version and put it into the EPICS Extensions CVS repository at the APS.  Using this new version of ezca I have built new versions of the ezcaIDL shareable library for 32-bit and 64-bit architectures on both Linux and Windows.

 

The new DLLs can be found (along with other binaries from EPICS base 3-14.12.1) here:

 

http://cars9.uchicago.edu/software/pub/EPICS_linux-x86_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_linux-x86_64_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_win32-x86_binaries.zip

http://cars9.uchicago.edu/software/pub/EPICS_windows-x64_binaries.zip

 

Users who want to build the source code should be able to obtain ezca, EzcaScan and ezcaIDL from the CVS snapshots here:

http://www.aps.anl.gov/epics/download/extensions/index.php#cvs

 

However, it appears they are not being updated, so we will work on that.

 

There is a problem with generating the required undecorated function names in ezcaIDL.dll consistently for 32-bit and 64-bit Windows systems.  For now there will be lots of warning messages when building for 64-bits, but it does work fine.

 

I have verified that when using these new versions the problem of mbuf depletion does not occur because the preemptive callbacks on the client continue to process the monitor events in the background even when IDL is not doing anything.

 

Till added a new “count” argument to ezcaSetMonitor.  I have added that to ezcaIDL.c and ezcaIDL.pro.  It defaults to 0, so existing code will work as it previously did.

 

I have also changed the way that ezcaIDL.pro finds the shareable library.  This is from ezcaIDL/README.  Previously only mechanism 1) below was supported.

 

1) First, it looks for the environment variable EZCA_IDL_SHARE.  If this envionment

variable exists it must point to the complete path to ezcaIDL.dll (Windows) or

libezcaIDL.so (all other architectures).  For example:

 

setenv EZCA_IDL_SHARE /usr/local/lib/libezcaIDL.so

 

2) If the shareable library is not found from the EZCA_IDL_SHARE environment variable

then it is searched for in the !IDL_PATH.  The file name searched for is the

following:

 

base_!VERSION.OS_!VERSION.ARCH

 

base is either ezcaIDL.dll (Windows) or libezcaIDL.so (all other architectures).

!VERSION.OS can be "linux", "Win32", etc.  !VERSION.ARCH can be "x86", "x86_64", etc.

Do "help, /structure, !VERSION" in IDL to see the values for your system.

 

This mechanism can be more convenient than using EZCA_IDL_SHARE because it allows

easy switching between 32 and 64 bit versions of IDL, and does not require setting an

environment variable.  It does require either copying and renaming the files to a

location in the !IDL_PATH, or creating soft links in the !IDL_PATH to point to the

locations of the actual shareable libraries.

 

On our system I created soft links in idl_user/epics/, which is in the !IDL_PATH:

 

lrwxrwxrwx 1 epics epics    53 Oct 21 14:32 ezcaIDL.dll_Win32_x86 -> /usr/local/epics/extensions/bin/win32-x86/ezcaIDL.dll

lrwxrwxrwx 1 epics epics    55 Oct 21 14:32 ezcaIDL.dll_Win32_x86_64 -> /usr/local/epics/extensions/bin/windows-x64/ezcaIDL.dll

lrwxrwxrwx 1 epics epics    55 Oct 21 14:35 libezcaIDL.so_linux_x86 -> /usr/local/epics/extensions/lib/linux-x86/libezcaIDL.so

lrwxrwxrwx 1 epics epics    58 Oct 21 14:36 libezcaIDL.so_linux_x86_64 -> /usr/local/epics/extensions/lib/linux-x86_64/libezcaID

 

I suggest that anyone who is using IDL to communicate with EPICS PVs in vxWorks IOCs upgrade to the new version of these DLLs.

 

Cheers,

Mark

 

Here is a demonstration of the problem with the previous version.  This is a vxWorks IOC before setting the monitor callbacks in IDL. 164.54.160.82 is the IP address of the Linux client where 3 IDL sessions are running.

 

ioc13lab> mbufShow

type        number

---------   ------

FREE    :    788

DATA    :      5

HEADER  :      7

SOCKET  :      0

PCB     :      0

RTABLE  :      0

HTABLE  :      0

ATABLE  :      0

SONAME  :      0

ZOMBIE  :      0

SOOPTS  :      0

FTABLE  :      0

RIGHTS  :      0

IFADDR  :      0

CONTROL :      0

OOBDATA :      0

IPMOPTS :      0

IPMADDR :      0

IFMADDR :      0

MRTABLE :      0

TOTAL   :    800

number of mbufs: 800

number of times failed to find space: 0

number of times waited for space: 0

number of times drained protocols for space: 0

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size     clusters  free      usage

-------------------------------------------------------------------------------

64       125       117       450510       

128      400       400       4990010      

256      50        44        589498       

512      25        25        6297         

1024     25        25        7374         

2048     25        25        146726       

-------------------------------------------------------------------------------

value = 80 = 0x50 = 'P'

 

ioc13lab> inetstatShow

Active Internet connections (including servers)

PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)

-------- ----- ------ ------  ------------------ ------------------ -------

1da89d4  TCP        0      0  164.54.160.73.5064    164.54.160.82.40702 

1da8950  TCP        0      0  164.54.160.73.5064    164.54.160.82.40680 

1da88cc  TCP        0      0  164.54.160.73.5064    164.54.160.82.40674 

1da8848  TCP        0      0  164.54.160.73.5064    164.54.160.82.40662 

1da87c4  TCP        0      0  164.54.160.73.5064    164.54.160.82.40650 

1da8740  TCP        0      0  164.54.160.73.5064    164.54.160.74.1123  

1da8428  TCP        0      0  164.54.160.73.5064    164.54.160.130.2328 

1da7f84  TCP        0      0  0.0.0.0.5064          0.0.0.0.0           

1da7cf0  TCP        0      0  0.0.0.0.111           0.0.0.0.0           

1da86bc  UDP        0      0  0.0.0.0.989           0.0.0.0.0           

1da8638  UDP        0      0  164.54.160.73.1038    0.0.0.0.0           

1da85b4  UDP        0      0  127.0.0.1.1037        127.0.0.1.1031      

1da8530  UDP        0      0  127.0.0.1.1036        127.0.0.1.1030      

1da84ac  UDP        0      0  0.0.0.0.5065          0.0.0.0.0            

1da83a4  UDP        0      0  0.0.0.0.991           0.0.0.0.0           

1da7df8  UDP        0      0  0.0.0.0.1034          0.0.0.0.0           

1da8320  UDP        0      0  0.0.0.0.1033          0.0.0.0.0           

1da829c  UDP        0      0  0.0.0.0.1032          0.0.0.0.0           

1da8218  UDP        0      0  0.0.0.0.1031          0.0.0.0.0           

1da8194  UDP        0      0  0.0.0.0.1030          0.0.0.0.0           

1da8110  UDP        0      0  0.0.0.0.1029          0.0.0.0.0            

1da808c  UDP        0      0  0.0.0.0.5064          0.0.0.0.0           

1da8008  UDP        0      0  164.54.160.73.1035    164.54.160.255.5065 

1da7f00  UDP        0      0  0.0.0.0.1028          0.0.0.0.0           

1da7e7c  UDP        0      0  0.0.0.0.1027          0.0.0.0.0           

1da7d74  UDP        0      0  0.0.0.0.1025          0.0.0.0.0           

1da7ae0  UDP        0      0  0.0.0.0.111           0.0.0.0.0           

value = 1 = 0x1

 

 

This is the output of netstat on the client Linux machine as it local TCP buffers fill up:

 

corvette:~>netstat | grep ioc13lab

tcp   2971200      0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   2947200      0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   2994400      0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>more /proc/sys/net/ipv4/tcp_rmem

4096    87380   4194304

corvette:~>netstat | grep ioc13lab

tcp   3013800      0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   2989720      0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   3037000      0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>

corvette:~>netstat | grep ioc13lab

tcp   3419400      0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   3427280      0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   3406800      0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

corvette:~>netstat | grep ioc13lab

tcp   3419400      0 corvette.cars.aps.anl:40674 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40650 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   3427280      0 corvette.cars.aps.anl:40662 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp        0      0 corvette.cars.aps.anl:40702 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

tcp   3406800      0 corvette.cars.aps.anl:40680 ioc13lab.cars.aps.anl.:ca-1 ESTABLISHED

 

Note that at this point 3 connections from corvette to ioc13lab have filled their receive buffers, and they are no longer getting any larger.

 

At that point the sendQ buffers on the vxWorks system begin filling up.  

 

 

ioc13lab> inetstatShow

Active Internet connections (including servers)

PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)

-------- ----- ------ ------  ------------------ ------------------ -------

1da89d4  TCP        0      0  164.54.160.73.5064    164.54.160.82.40702 

1da8950  TCP        0   8192  164.54.160.73.5064    164.54.160.82.40680 

1da88cc  TCP        0   2600  164.54.160.73.5064    164.54.160.82.40674 

1da8848  TCP        0   7760  164.54.160.73.5064    164.54.160.82.40662 

1da87c4  TCP        0      0  164.54.160.73.5064    164.54.160.82.40650 

1da8740  TCP        0      0  164.54.160.73.5064    164.54.160.74.1123  

1da8428  TCP        0      0  164.54.160.73.5064    164.54.160.130.2328 

1da7f84  TCP        0      0  0.0.0.0.5064          0.0.0.0.0           

1da7cf0  TCP        0      0  0.0.0.0.111           0.0.0.0.0           

1da86bc  UDP        0      0  0.0.0.0.989           0.0.0.0.0           

1da8638  UDP        0      0  164.54.160.73.1038    0.0.0.0.0           

1da85b4  UDP        0      0  127.0.0.1.1037        127.0.0.1.1031      

1da8530  UDP        0      0  127.0.0.1.1036        127.0.0.1.1030      

1da84ac  UDP        0      0  0.0.0.0.5065          0.0.0.0.0           

1da83a4  UDP        0      0  0.0.0.0.991           0.0.0.0.0           

1da7df8  UDP        0      0  0.0.0.0.1034          0.0.0.0.0           

1da8320  UDP        0      0  0.0.0.0.1033          0.0.0.0.0           

1da829c  UDP        0      0  0.0.0.0.1032          0.0.0.0.0           

1da8218  UDP        0      0  0.0.0.0.1031          0.0.0.0.0           

1da8194  UDP        0     

 

 

That in turn depletes the mbufs.  Note that there are no free mbufs larger than 256 bytes.

 

ioc13lab> mbufShow   

type        number

---------   ------

FREE    :    592

DATA    :    201

HEADER  :      7

SOCKET  :      0

PCB     :      0

RTABLE  :      0

HTABLE  :      0

ATABLE  :      0

SONAME  :      0

ZOMBIE  :      0

SOOPTS  :      0

FTABLE  :      0

RIGHTS  :      0

IFADDR  :      0

CONTROL :      0

OOBDATA :      0

IPMOPTS :      0

IPMADDR :      0

IFMADDR :      0

MRTABLE :      0

TOTAL   :    800

number of mbufs: 800

number of times failed to find space: 0

number of times waited for space: 0

number of times drained protocols for space: 0

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size     clusters  free      usage

-------------------------------------------------------------------------------

64       125       117       450874       

128      400       326       5035830      

256      50        0         613939       

512      25        0         6708         

1024     25        0         8549         

2048     25        0         147650       

-------------------------------------------------------------------------------

value = 80 = 0x50 = 'P'

 

At that point network access is dead.  Save_restore begins failing, etc.

 

value = 80 = 0x50 = 'P'

ioc13lab> save_restore:write_it: fprintf returned -1. [111019-131137]

../save_restore.c(1369): [0x2f000b]=write_it:S_rpcLib_RPC_CANTDECODEARGS

save_restore:write_it: fclose('auto_settings.sav') returned -1

../save_restore.c(1485): [0x2f000b]=write_it:S_rpcLib_RPC_CANTDECODEARGS

save_restore:write_it: Giving up on this attempt to write 'auto_settings.sav'. [111019-131137]

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

save_restore:write_save_file: Can't write save file. [111019-131137]

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

 

 

 

 

________________________________________

From: Mark Rivers

Sent: Tuesday, October 18, 2011 6:19 PM

To: [email protected]

Subject: EZCA and ezcaIDL improvements?

 

Folks,

 

Andrew Johnson thought he remembered someone at the recent EPICS collaboration meeting talking about improvements to the EZCA extension to use 3.14 preemptive callbacks.  Has someone done some work on this?

 

Here’s why I want to know:

 

Thanks to help from Andrew I just discovered that the IDL interface (and probably other scripting languages like Matlab, etc.) can cause serious problems with an EPICS IOC.  The problem arises when the IDL client sets monitors on a PV (with caSetMonitor()), but then does not call ca_pend_event() frequently.  ca_pend_event() is called by the “caGet”, “caPut”, “cacheckMonitor”, etc. functions in IDL.  But if one calls caSetMonitor() and then just sits at the IDL prompt, ca_pend_event() is not being called.  When that happens the CA server is trying to send monitor events to the client.  These initially queue up on the client side, but eventually they begin to queue up on the server side.  As they queue on a vxWorks server they consume mbufs.  I have found that a single client in this state will consume all of the 256 byte clusters, and some of the 512 byte clusters.  If a few more clients are in this state then there are no mbufs available above a certain size.  When that happens no new CA connections can be made, other network services fail, etc.

 

I think this problem could be avoided if we changed the ezca C library that IDL calls to use preemptive callbacks, so that periodic calls to ca_pend_event are no longer needed.

 

Thanks,

 

Mark

 

 


References:
EZCA and ezcaIDL improvements? Mark Rivers
RE: EZCA and ezcaIDL improvements? Mark Rivers

Navigate by Date:
Prev: EtherIP device support for stringout record William Lu
Next: StreamDevice: Input "1" does not match format %1r Martin Konrad
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  <20112012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: Re: EZCA and ezcaIDL improvements? Andrew Johnson
Next: cagateway: building with CMake, creating Debian/RPM packages Martin Konrad
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  <20112012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
ANJ, 18 Nov 2013 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·