EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: CA monitors...Update
From: [email protected] (Jeff Hill)
To: "Garrett D. Rinehart" <[email protected]>, <[email protected]>
Date: Fri, 10 Dec 1999 11:15:45 -0700
Garrett,

I think I understand correctly that when the stalling situation occurs
that clients on multiple hosts stop receiving monitors. If so, I think
that we can safely assume that this is a server side problem or a network
problem.

Since your situation is quite repeatable and consistent I propose that
you perform the following steps in order to help isolate the problem.

1) do whatever is necessary to maximize the severity of the problem
2) type "dbel <record name>" for the records that are stalled. Let
me know if it indicates that some of the monitor subscriptions are behind.
3) type "inetstatShow" on the IOC. Let me know if there are TCP virtual
circuits that consistently have a large number of bytes lingering in the
"Send-Q".
The UNIX equivalent of this command is "netstat", but on the UNIX side you
will be looking for bytes lingering in the receive queue.
4) type "casr <interest level>" on the stalling IOC. Look for client
connections that have a long delay since the last send when you expect
that there should be regular monitor updates. Host names can be converted
to IP addresses on UNIX with "nslookup".
5) type "i" on the IOC that is stalled and then send the output from
"tt 0x<task id>" for several of the CA event tasks. The event task id can
be determined by running "dbel <record name>".
I will be looking for tasks that remain consistently over several
calls to tt (task stack trace) in the same unusual part of the code.
Compare the tt output from normal IOCs with the tt output from the
stalling IOC. Also compare the tt output for the netTask on the stalling
IOC to one that is behaving normally.
6) Finally, I am also interested in the output from typing "mbufShow",
"ifShow", and "tcpstatShow" on the IOC.
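
For the Send-Q / receive queue check, scanning a long connection table by
eye gets tedious, so here is a minimal sketch of a filter that could be run
over captured "inetstatShow" or "netstat" output. It assumes the output has
a header row naming the "Recv-Q" and "Send-Q" columns, that only the TCP
table is pasted in, and that the columns before the queue counts are single
words. The function name and the sample rows (addresses, PCB values, byte
counts) are made up for illustration.

```python
def lingering_queues(netstat_text, column="Send-Q", threshold=0):
    """Return (bytes_queued, line) pairs for rows whose queue column exceeds threshold."""
    col = None
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if col is None:
            # Locate the header row; everything before it is ignored.
            if column in fields:
                col = fields.index(column)
            continue
        # Skip separator rows, blank lines, and rows without a numeric count.
        if len(fields) <= col or not fields[col].isdigit():
            continue
        queued = int(fields[col])
        if queued > threshold:
            hits.append((queued, line.strip()))
    # Largest backlog first.
    return sorted(hits, reverse=True)


# Hypothetical capture, not real output from any IOC.
SAMPLE = """\
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
-------- ----- ------ ------  ------------------ ------------------ -------
3c1a2b0  TCP        0   8192  10.0.0.5.5064      10.0.0.20.41234    ESTABLISHED
3c1a3f4  TCP        0      0  10.0.0.5.5064      10.0.0.21.38002    ESTABLISHED
"""

for queued, row in lingering_queues(SAMPLE, column="Send-Q"):
    print(queued, row)
```

Run it on several captures taken a few seconds apart: a circuit that shows
up every time is the kind of consistently lingering queue I am asking
about. Pass column="Recv-Q" for output captured on the UNIX workstation.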

>
> Grasping at straws, I started killing MEDM displays on the control
> workstations. One of them was an XY plot of beam profiles (~75 channels
> total @ 30Hz rate). Turning that display on and off had almost the same
> effect on the stalling problem (turned on, big problem; turned off,
> almost no problem).

How many data points are in each beam profile channel? Also,
does stopping a large number of other displays attached to the
stalling IOC while allowing the XY plot of beam profiles to
continue running have any effect?
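
The reason the number of points matters can be shown with a rough
back-of-the-envelope estimate. The sketch below takes the 75 channels and
30 Hz from your description; the points per channel and the 4 bytes per
point are assumptions for illustration only, and protocol header overhead
is ignored. The 10 Mbit/s figure is the nominal rate of a thinwire segment.

```python
def monitor_payload_bits_per_sec(channels, rate_hz, points_per_channel, bytes_per_point):
    """Raw monitor payload rate; ignores CA, TCP, IP and Ethernet overhead."""
    return channels * rate_hz * points_per_channel * bytes_per_point * 8


THINWIRE_BITS_PER_SEC = 10_000_000  # nominal thinwire (10BASE2) rate

# Points per channel are hypothetical; 4 bytes per point assumes float data.
for points in (1, 10, 100):
    bps = monitor_payload_bits_per_sec(75, 30, points, 4)
    percent = 100 * bps / THINWIRE_BITS_PER_SEC
    print(f"{points:4d} points/channel: {bps / 1e6:6.2f} Mbit/s ({percent:5.1f}% of 10 Mbit/s)")
```

Under these assumptions scalar channels would be a negligible load, while
100 points per channel would already amount to about 7.2 Mbit/s of payload
on a shared 10 Mbit/s segment.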

>
> About that time, as I was preparing to answer definitively, "the network
> is overloaded", the network sniffer was made available. Hooking it up, we
> found no errors, an incredibly small number of collisions (about 0.001%),
> and only 10% network utilization (peak).
>
> Meanwhile, my little box with the LEDs that adapts between the "thin"
> section of the network and the 10/100baseT hub is still showing
> collisions galore.

Was the sniffer attached to the thinwire segment of the network, or
was it attached to a different tap off of the switch? Due to the
nature of switches, sniffers will generally not see traffic
(broadcasts are an exception) on other parts of the switched network.

Do you see any unusual errors when you run "ifShow" and "ipstatShow" on the
IOC that appears to be stalling? Do you see any unusual errors when you
run "netstat -s" on the UNIX workstation?

>
> Oh, BTW, Marty or Jeff asked earlier if it was a switched network. Yes,
> it is. I have three workstations, a printer, and a couple of PCs running
> off a 10/100baseT hub. Then there's all the IOCs remaining on the old
> thinwire which is attached through an adapter box to the hub.
>
> 5) Won't somebody send me an aspirin?
>

ditto

Jeff



