EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: burtrb failing on local LAN, but not through gateway
From: "Hill, Jeff" <[email protected]>
To: Keith Thorne <[email protected]>, "[email protected]" <[email protected]>
Date: Fri, 19 Oct 2012 00:34:22 +0000
> if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second

If there are lost UDP frames, the client side will back off
on the search rate to avoid overwhelming the IOC or the
LAN.
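
As an illustration of what "backing off" means in general, here is a minimal sketch of an exponential retry schedule. It is not the CA client library's actual algorithm; the function name `search_intervals` and all of the numbers are assumptions chosen for the example.

```python
def search_intervals(base=1.0, factor=2.0, cap=5.0, attempts=4):
    """Return the delay (in seconds) used before each successive search retry.

    Hypothetical illustration only: each unanswered search multiplies the
    delay by `factor`, and the delay is never allowed to exceed `cap`.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))  # clamp to the maximum interval
        delay *= factor                 # slow down after every missed reply
    return delays


if __name__ == "__main__":
    print(search_intervals())
```

With a schedule like this, a burst of lost replies stretches the time to connect every channel, which is consistent with connections completing only "if you wait long enough".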

Jeff

> -----Original Message-----
> From: Keith Thorne [mailto:[email protected]]
> Sent: Thursday, October 18, 2012 1:56 PM
> To: [email protected]
> Cc: Hill, Jeff
> Subject: Re: burtrb failing on local LAN, but not through gateway
> 
> Dear Jeff
>       Thanks for all the detailed debugging suggestions. So far I have
> found that if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second.  What is not working is
> burt's connection callback scheme.
> It should be decrementing a counter as each PV is connected.
> However, with 15,000 connections attempted, the count proceeds as follows
> at the 1-second stride (example numbers only):
> 
> 15000 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> ....
> 
> so it is not decrementing correctly.  I'll have to add in some debugging
> to figure out how to fix it.
> After that, I'll figure out what the bottleneck is.
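
For reference, a pending counter driven by connection callbacks can be kept consistent by tracking channel names in a set under a lock, rather than decrementing a bare integer. The sketch below is not burt's code and makes no claim about where burt's bug lies; `PendingTracker` and `on_connection` are hypothetical names, and the callback is assumed to receive the channel name and a connected/disconnected flag.

```python
import threading


class PendingTracker:
    """Track which channels are still waiting for a connection (hypothetical)."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._known = set(names)
        self._pending = set(names)

    def on_connection(self, name, connected):
        # May be called from a library callback thread, so guard shared state.
        with self._lock:
            if name not in self._known:
                return                       # ignore channels we never asked for
            if connected:
                self._pending.discard(name)  # repeated callbacks are harmless
            else:
                self._pending.add(name)      # a dropped channel is pending again

    def pending(self):
        with self._lock:
            return len(self._pending)
```

Because the count is derived from the set, a duplicate connect callback cannot decrement twice, and a disconnect puts the channel back in the pending total.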
> 
> 				Keith Thorne
> 					K
> On Oct 17, 2012, at 1:32 PM, Hill, Jeff wrote:
> 
> > Hi Keith,
> >
> >> The application successfully completes the ca_search_and_connect calls for
> >> all channels, but only gets successful connections for a few hundred
> >> channels.
> >
> > One possibility might be that the Ubuntu system is running low on some
> > resource such as file descriptors, threads, and/or network buffers. If this
> > were the cause you would probably also see other programs, unrelated to EPICS,
> > struggling during startup and/or circuit initiation when they run on the
> > same resource-limited host.
> >
> > To debug this situation, I would set the burt connection establishment timeouts
> > to be very large, and then run some diagnostics while burt is
> > having troubles.
> >
> > 0) Do you see that it's always the same set of IOCs that burt has troubles
> > with, or does the situation move around between different IOCs? The latter
> > points more towards a client side code or resource consumption issue, and
> > the former points more towards a server side code or resource consumption
> > issue.
> >
> > 1) Look at TCP circuits on the Linux systems where the burt client, and
> > soft IOCs, are running. I would use a command like "netstat | grep 5064"
> > to look at circuits with the CA default port in the server side circuit
> > endpoint. You will be looking to see if too many of the TCP circuits have
> > their state machines remaining in a cleanup, and/or connection startup,
> > state.
> >
> > 2) Does command line caget, running on the same or a different host from burt,
> > have troubles when connecting to the same IOC that burt is having problems
> > connecting with? Run this test at the same time that burt is having problems.
> >
> > 3) I would also have a look at casr, on the soft IOC's command line.
> > There is an interest level parameter with this command which can be used
> > to obtain more information about what is happening.
> >
> > 4) You might also try running the diagnostic dumping function in the ca
> > client API against the ca client context for the burt program. This can
> > be called from the debugger, or burt could be modified to call this function,
> > after some timeout, when the channels are not connecting. You can increase the
> > interest level to get additional details about the state of the ca client library.
> >
> > 5) Also have a look at the CPU meter on the hosts where the soft IOCs are
> > running. If the server is starved for CPU it won't connect to the client.
> >
> > 6) Watch out for too many ICMP error replies competing with the UDP search
> > reply from the IOC. This has typically been a problem only with old IP kernels
> > that incorrectly replied to a UDP broadcast with an ICMP error message. To
> > diagnose this type of problem you will need an Ethernet monitoring
> > program like tcpdump.
> >
> > 7) If a particular IOC is known to be experiencing problems with being
> > connected to, then I would attach gdb and type "thread apply all bt". The
> > output is large, but I can use this type of output to determine if there
> > is a problem in the code. The same thing can be done with the burt client process.
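
The TCP circuit check in step 1 can be made less error-prone by tallying connection states instead of reading raw netstat output. The following is a rough sketch, assuming the usual Linux `netstat -tan` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the helper name `tally_ca_states` is hypothetical.

```python
import sys
from collections import Counter


def tally_ca_states(netstat_text, port=5064):
    """Count TCP states for circuits with `port` at either endpoint."""
    counts = Counter()
    suffix = ":%d" % port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    # Example use: netstat -tan | python tally_ca_states.py
    for state, n in sorted(tally_ca_states(sys.stdin.read()).items()):
        print(state, n)
```

A large count in states such as SYN_SENT or TIME_WAIT, relative to ESTABLISHED, would correspond to circuits stuck in connection startup or cleanup.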
> >
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000) channels
> >
> > I do connect to large numbers of channels in the CA client side regression
> > tests.
> >
> > Jeff
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:tech-talk-
> [email protected]]
> >> On Behalf Of Keith Thorne
> >> Sent: Wednesday, October 17, 2012 10:50 AM
> >> To: [email protected]
> >> Subject: burtrb failing on local LAN, but not through gateway
> >>
> >> We use the burt extension heavily for backup and restoration of settings
> >> on Linux-based IOCs.
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000 channels) when run on the same LAN as the computer with the IOC.
> >> The application successfully completes the ca_search_and_connect calls for
> >> all channels, but only gets successful connections for a few hundred
> >> channels.  Interestingly, the same burtrb call works just fine when run
> >> from a computer on a different LAN where an EPICS gateway is used to
> >> provide access.
> >>
> >> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
> >>
> >> We tried increasing the number of retries, but all that does is increase
> >> the wait time (in 1 second increments) for checking that all connections
> >> are made.  This does not work.
> >>
> >> The Burt extension (as well as the casr extension) has not been updated for
> >> 3.14.x.  Are there suggestions for an alternate snapshot/restore tool that
> >> is being updated?
> >>
> >> Thanks
> >> 	Keith Thorne
> >> 	LIGO Livingston Observatory


ANJ, 18 Nov 2013