> if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second
If there are lost UDP frames, the client side will back off
on the search rate to avoid overwhelming the IOC or the
LAN.
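
As a rough illustration of that kind of back-off, here is a minimal sketch. It is a generic additive-increase, multiplicative-decrease rule, not the actual algorithm in the CA client library; the function name and the floor, ceiling, and step values are all made up for the example.

```python
def next_search_rate(rate, frames_lost, floor=50.0, ceiling=2000.0, step=100.0):
    """Return the next search rate in channels per second (illustrative only)."""
    if frames_lost:
        # Multiplicative decrease: halve the rate, but never go below the floor.
        return max(floor, rate / 2.0)
    # Additive increase: probe upward slowly, capped at the ceiling.
    return min(ceiling, rate + step)


# Hypothetical sequence of one-second intervals; True means loss was detected.
rate = 2000.0
history = []
for lost in [False, True, True, False, False]:
    rate = next_search_rate(rate, lost)
    history.append(rate)
print(history)
```

The point is only that loss makes the rate drop quickly while recovery is gradual, so a lossy LAN can stretch the total connect time well beyond what the no-loss rate would suggest.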
Jeff
> -----Original Message-----
> From: Keith Thorne [mailto:[email protected]]
> Sent: Thursday, October 18, 2012 1:56 PM
> To: [email protected]
> Cc: Hill, Jeff
> Subject: Re: burtrb failing on local LAN, but not through gateway
>
> Dear Jeff
> Thanks for all the detailed debugging suggestions. So far I have
> found that if you wait long enough, all the connections are made.
> Rate is about 1,500-2,000 channels per second. What is not working is
> burt's connection callback scheme.
> It should be decrementing a counter as each PV is connected.
> However, the count proceeds as follows (at the 1-second stride) if 15,000
> connections are attempted (example numbers only)
>
> 15000 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> 14300 pending
> ....
>
> so it is not decrementing correctly. I'll have to add in some debugging
> to figure out how to fix it.
> After that, I will figure out what the bottleneck is.
>
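One way to make a pending count robust is to track the set of unconnected channel names rather than a bare integer, so that repeated connect events for the same channel, or a disconnect followed by a reconnect, cannot make the count drift. The following is a minimal sketch of that idea only; it does not reflect how burt is actually written, and the class, method, and PV names are hypothetical. In a real CA client the events would come from the connection handler registered for each channel.

```python
import threading


class PendingTracker:
    """Track which channels are still unconnected (hypothetical helper)."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._pending = set(names)

    def on_connection(self, name, connected):
        # Called once per connection event; callbacks may arrive on another
        # thread, so all updates are made under the lock.
        with self._lock:
            if connected:
                self._pending.discard(name)  # safe even if already removed
            else:
                self._pending.add(name)      # a dropped channel is pending again

    def pending_count(self):
        with self._lock:
            return len(self._pending)

    def pending_names(self, limit=10):
        # A short sample of the stragglers is handy for debugging output.
        with self._lock:
            return sorted(self._pending)[:limit]


tracker = PendingTracker(["PV:%d" % i for i in range(5)])
tracker.on_connection("PV:0", True)
tracker.on_connection("PV:0", True)   # duplicate event, count is unchanged
tracker.on_connection("PV:1", True)
tracker.on_connection("PV:1", False)  # disconnect puts the channel back
print(tracker.pending_count(), tracker.pending_names())
```

Printing the names that are still pending at each 1-second stride would also show whether the stalled count always corresponds to the same channels.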
> Keith Thorne
> K
> On Oct 17, 2012, at 1:32 PM, Hill, Jeff wrote:
>
> > Hi Keith,
> >
> >> The application successfully completes the ca_search_and_connect calls for
> >> all channels, but only gets successful connections for a few hundred
> >> channels.
> >
> > One possibility might be that the Ubuntu system is running low on some
> > resource such as file descriptors, threads, and/or network buffers. If this
> > were the cause, you would probably see other programs, unrelated to EPICS,
> > struggling during startup and/or circuit initiation when they are also
> > running on the resource-limited host.
> >
> > To debug this situation, I would set the burt connection establishment
> > timeouts to be very large, and then run some diagnostics at the same time
> > that burt is having troubles.
> >
> > 0) Do you see that it's always the same set of IOCs that burt has troubles
> > with, or does the situation move around between different IOCs? The latter
> > points more towards a client side code or resource consumption issue, and
> > the former points more towards a server side code or resource consumption
> > issue.
> >
> > 1) Look at tcp circuits on the Linux systems where the burt client and
> > soft IOCs are running. I would use a command like "netstat | grep 5064"
> > to look at circuits with the CA default port in the server side circuit
> > endpoint. You will be looking to see if too many of the TCP circuits have
> > their state machines remaining in a cleanup and/or connection startup
> > state.
> >
> > 2) Does command line caget, running on the same or a different host from
> > burt, have troubles when connecting to the same IOC that burt is having
> > problems connecting with? Run this test at the same time that burt is
> > having problems.
> >
> > 3) I would also have a look at casr on the soft IOC's command line.
> > There is an interest level parameter with this command which can be used
> > to obtain more information about what is happening.
> >
> > 4) You might also try running the diagnostic dumping function in the ca
> > client API against the ca client context for the burt program. This can
> > be called from the debugger, or burt could be modified to call this
> > function after some timeout when the channels are not connecting. You can
> > increase the interest level to get additional details about the state of
> > the ca client library.
> >
> > 5) Also have a look at the CPU meter on the hosts where the soft IOCs are
> > running. If the server is starved for CPU it won't connect to the client.
> >
> > 6) Watch out for too many ICMP error replies competing with the UDP search
> > reply from the IOC. This has typically been a problem only with old IP
> > kernels that incorrectly replied to a UDP broadcast with an ICMP error
> > message. To diagnose this type of problem you will need an Ethernet
> > monitoring program like tcpdump.
> >
> > 7) If a particular IOC is known to be experiencing problems with being
> > connected to, then I would attach gdb and type "thread apply all bt". The
> > output is large, but I can use this type of output to determine if there
> > is a problem in the code. The same thing can be done with the burt client
> > process.
> >
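For item 1 above, the netstat output can be summarized by TCP state so that a pile-up in a startup or cleanup state stands out. This is a minimal sketch, assuming the usual Linux netstat column layout (protocol, two queue counters, local address, foreign address, state); the function name is hypothetical and the sample lines are invented for illustration, not captured from a real system.

```python
from collections import Counter

CA_PORT = "5064"  # default Channel Access server port


def tally_states(netstat_text, port=CA_PORT):
    """Count TCP states for circuits with the given port at either endpoint."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP sockets, and anything without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] == port or remote.rsplit(":", 1)[-1] == port:
            counts[state] += 1
    return counts


# Invented sample in the shape of "netstat -tn" output.
SAMPLE = """\
tcp        0      0 10.0.0.5:51234          10.0.0.9:5064           ESTABLISHED
tcp        0      0 10.0.0.5:51240          10.0.0.9:5064           SYN_SENT
tcp        0      0 10.0.0.5:51241          10.0.0.9:5064           TIME_WAIT
tcp        0      0 10.0.0.5:51242          10.0.0.9:5064           TIME_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.7:40022          ESTABLISHED
"""

counts = tally_states(SAMPLE)
for state, n in sorted(counts.items()):
    print(state, n)
```

A large share of circuits sitting in states such as SYN_SENT or TIME_WAIT while burt is stalled would point towards the connection startup or cleanup problem described in item 1.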
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000) channels
> >
> > I do connect to large numbers of channels in the CA client side regression
> > tests.
> >
> > Jeff
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:tech-talk-[email protected]]
> >> On Behalf Of Keith Thorne
> >> Sent: Wednesday, October 17, 2012 10:50 AM
> >> To: [email protected]
> >> Subject: burtrb failing on local LAN, but not through gateway
> >>
> >> We use the burt extension heavily for backup and restoration of settings
> >> on Linux-based IOCs.
> >> We are seeing the burtrb calls fail when invoked for a large list
> >> (~14,000) channels when run on the same LAN as the computer with the IOC.
> >> The application successfully completes the ca_search_and_connect calls for
> >> all channels, but only gets successful connections for a few hundred
> >> channels. Interestingly, the same burtrb call works just fine when run
> >> from a computer on a different LAN where an EPICS gateway is used to
> >> provide access.
> >>
> >> We are using Ubuntu 10.04 (64-bit) with EPICS 3.14.2 on the client.
> >>
> >> We tried increasing the number of retries, but all that does is increase
> >> the wait time (in 1 second increments) for checking that all connections
> >> are made. This does not work.
> >>
> >> The Burt extension (as well as the casr extension) has not been updated for
> >> 3.14.x. Are there suggestions for an alternate snapshot/restore tool that
> >> is being updated?
> >>
> >> Thanks
> >> Keith Thorne
> >> LIGO Livingston Observatory