EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Inter-IOC link problems
From: "Jeff Hill" <[email protected]>
To: "'Shepherd, EL (Emma)'" <[email protected]>, <[email protected]>
Date: Mon, 20 Oct 2008 10:24:25 -0600
Presumably, the IP stack on this IOC is operating correctly when this
happens - as verified by {telnet, ping, ifShow, ...}?

When this occurs, you might try running a small standalone CA client that
you have dynamically loaded into vxWorks. It's best to spawn this type of
client so that a CA context will not end up getting attached to the vxWorks
shell. The intent, of course, would be to distinguish between a global CA
issue and one that is confined to the CA client / DB CA Link code
combination.

I would also look very closely at the output from dbcar at higher interest
levels. As the interest level increases you should be able to see whether CA
thinks that the channel is connected or not (the output from
nciu::show()). Of particular interest would be any situations where CA
thinks the channel is connected, but the DB CA Link code does not. Also look
for situations where the DB CA Link code thinks that it's a CA link, but the
CA channel hasn't been created (yet).
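
When a dbcar report is long, a short script can help with the scanning. The sketch below is illustrative only. It handles just the beacon hash lines in the layout quoted further down in this thread (including mail quote prefixes and wrapped lines), and the function names are made up for this example. It does not attempt to parse the channel state output from nciu::show(), whose layout is not shown in this thread.

```python
import re

# One beacon hash entry, after quote prefixes are stripped and wrapped
# lines are joined. The layout is copied from the report in this thread.
ENTRY = re.compile(
    r"CA beacon hash entry for (\d+\.\d+\.\d+\.\d+):(\d+)\s+"
    r"(?:with period estimate\s+([\d.]+)|<no period estimate>)\s+"
    r"beacon number (\d+)"
)


def parse_beacon_entries(report):
    """Return one dict per beacon hash entry found in the report text."""
    # Drop leading mail quote markers ("> > ") and undo line wrapping.
    lines = [re.sub(r"^(?:>\s?)+", "", line) for line in report.splitlines()]
    joined = " ".join(lines)
    entries = []
    for match in ENTRY.finditer(joined):
        host, port, period, number = match.groups()
        entries.append({
            "host": host,
            "port": int(port),
            # None means the report said "<no period estimate>".
            "period": float(period) if period is not None else None,
            "beacon_number": int(number),
        })
    return entries


def silent_servers(entries):
    """Hosts for which no beacon period estimate has been recorded."""
    return [e["host"] for e in entries if e["period"] is None]


# Two entries copied from the report in this thread, with quote prefixes.
SAMPLE = """\
> > CA beacon hash entry for 172.23.106.32:5064 with period estimate
> > 15.000521
> >         beacon number 168436, on THU OCT 16 2008 14:27:46
> > CA beacon hash entry for 172.23.106.35:5064 <no period estimate>
> >         beacon number 0, on <undefined>
"""

if __name__ == "__main__":
    print(silent_servers(parse_beacon_entries(SAMPLE)))
```

An entry without a period estimate only says that the client has not recorded beacons from that address; as noted elsewhere in this thread, that by itself may be a red herring.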

Also, do a "tt" on the DB CA Link thread, and the satellite threads for its
CA context. Look for any situations where threads are hanging around in
unusual places which might indicate some form of deadlock. If you see
anything out of the ordinary please send the tt output and I will have a
look. In lightly loaded situations, "out of the ordinary" usually means a
thread that isn't parked in the normal place (as seen by snapshots with tt)
for an extended length of time. One of course needs to compare tt output
from when the IOC is normal to tt output from when the IOC is misbehaving.
Needless to say, a CPU starvation situation on this IOC would also cause
issues (could be the cause of your issue).
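
The comparison of tt snapshots described above can be sketched roughly as follows. This is only an illustration of the idea, not a tool that reads real tt output: it assumes you have already noted, by hand, the function at the top of each thread's stack in every snapshot. The thread and function names in the example data are placeholders, and the helper name is hypothetical.

```python
def unusual_parked_threads(normal, suspect_snapshots):
    """Flag threads that sit in the same unfamiliar place in every snapshot.

    normal            -- dict: thread name -> set of top-of-stack functions
                         seen while the IOC was healthy
    suspect_snapshots -- list of dicts: thread name -> top-of-stack function,
                         one dict per tt snapshot taken while misbehaving
    """
    flagged = {}
    # Every thread name that appears in at least one suspect snapshot.
    threads = set().union(*suspect_snapshots)
    for thread in sorted(threads):
        frames = [snap.get(thread) for snap in suspect_snapshots]
        if None in frames:
            # Thread missing from a snapshot: not enough data to judge.
            continue
        parked_in_one_place = len(set(frames)) == 1
        if parked_in_one_place and frames[0] not in normal.get(thread, set()):
            flagged[thread] = frames[0]
    return flagged


# Placeholder data: names are assumptions for illustration only.
NORMAL = {
    "dbCaLink": {"epicsEventWait"},
    "CAC-UDP": {"recvfrom"},
}
SUSPECT = [
    {"dbCaLink": "epicsMutexLock", "CAC-UDP": "recvfrom"},
    {"dbCaLink": "epicsMutexLock", "CAC-UDP": "recvfrom"},
    {"dbCaLink": "epicsMutexLock", "CAC-UDP": "recvfrom"},
]

if __name__ == "__main__":
    print(unusual_parked_threads(NORMAL, SUSPECT))
```

A thread is only flagged when it stays in the same place across all snapshots and that place was never seen while the IOC was healthy, which matches the "parked somewhere unusual for an extended length of time" criterion. Anything flagged still needs the full tt output to interpret.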

In the past, quite some years back actually, I have seen UDP issues when
too many machines on a network had the wrong subnet mask configured. I
think there used to be issues in particular with HP workstations: if their
network mask was set incorrectly they would reply with "ICMP network
unreachable", and this could cause the IOC's search response to be
discarded off the end of the finite-length UDP input queue (depending on
which response got there first and how many bogus ICMP messages were sent
in response to each search request). ICMP traffic can be seen with
Ethernet snoopers like wireshark or tcpdump. However, on modern switched
networks, it may be best to be on the same hub (not a switch) as the IOC
so that you can see the unicast traffic that a switch would send only
between the IOC and its message peers. Admittedly, this is perhaps
contraindicated by your not seeing any search traffic from the IOC in
caSnooper.

You might have a look at the output from udpstatShow (presuming that
something is wrong with UDP and not IP). Also, have a look at ifShow and
verify that the broadcast address remains correctly configured, and that
there are not high error rates.
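
As a cross-check on the broadcast address that ifShow reports, the expected value can be computed from the interface address and netmask. The sketch below uses Python's standard ipaddress module. The address 172.23.106.35 is simply one that appears in this thread, and both netmasks are assumptions chosen for illustration, since the real mask is not given here. The helper names are hypothetical.

```python
import ipaddress


def broadcast_for(address, netmask):
    """Broadcast address implied by an interface address and netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network.broadcast_address)


def broadcast_matches(address, netmask, configured_broadcast):
    """True if the configured broadcast address agrees with the netmask."""
    return broadcast_for(address, netmask) == configured_broadcast


if __name__ == "__main__":
    addr = "172.23.106.35"  # address taken from this thread
    # Assumed masks: the same host gets a different broadcast address
    # depending on which mask it was configured with.
    for mask in ("255.255.255.0", "255.255.0.0"):
        print(mask, "->", broadcast_for(addr, mask))
```

With these assumed masks the same host would broadcast to 172.23.106.255 in one case and 172.23.255.255 in the other, so a machine with the wrong mask sends (and expects) broadcasts on an address its neighbours do not use.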

Jeff

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Shepherd, EL (Emma)
> Sent: Friday, October 17, 2008 9:12 AM
> To: [email protected]
> Subject: RE: Inter-IOC link problems
> 
> I've done a little more investigation and I think that in this case the
> gateway is not to blame.  It seems that other CA links on this IOC are
> also not working, and they are not all going through the gateway (some
> are on other IOCs on the same network).
> 
> I set up caSnooper to monitor connection requests on one of the PVs my
> IOC is failing to link to.  When I change the link to a constant and
> change it back again, caSnooper does not report any new requests for the
> PV.  However when I do the same on a 'healthy' IOC which has working
> links, I see the new request on caSnooper when I put the link back.
> 
> I'm not sure what that tells me except that it looks like the IOC has
> somehow stopped broadcasting search requests..?
> 
> Emma
> 
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of Shepherd,
> > EL (Emma)
> > Sent: 17 October 2008 12:28
> > To: Ralph Lange
> > Cc: [email protected]
> > Subject: RE: Inter-IOC link problems
> >
> >
> > Hi there,
> >
> > Thanks for the replies, it seems that the 'undefined' entry might have
> > been a red herring.
> >
> > The IOC I am looking at is the client of the PV connection, and the IP
> > address listed is the server side of the CA gateway.  There are in fact
> > two gateways on this machine - one for each direction as you suggested.
> > The configuration is really very simple, it is set up to allow read
> > access for all PVs.  Do you need to know anything more specific?
> >
> > Cheers,
> >
> > Emma
> >
> > > -----Original Message-----
> > > From: Ralph Lange [mailto:[email protected]]
> > > Sent: 17 October 2008 08:52
> > > To: Shepherd, EL (Emma)
> > > Cc: [email protected]
> > > Subject: Re: Inter-IOC link problems
> > >
> > >
> > > Hi Emma,
> > >
> > > I would need a bit more information about your setup to be able to
> > > fully understand your report.
> > >
> > > You are looking at the CA client side of an IOC. When you are losing
> > > connections between IOCs, is the IOC you're looking at the server or
> > > the client of that PV connection?
> > > It seems there are no beacons coming from the CA Gateway
> > > (172.23.106.35). Is that the client side or the server side of the CA
> > > Gateway? Are two (or more) Gateway processes running on that machine
> > > (i.e. one for each direction)? What is the CA configuration for the
> > > Gateway(s) on that machine?
> > >
> > > CA configuration of a Gateway is difficult and subtle. There are a lot
> > > of environment variables for CA server and client (see the CA Manual)
> > > which influence the behaviour of a CA application. Some variables use
> > > other variables' values as defaults, which simplifies configuration of
> > > pure CA client or server apps, but may lead to unwanted behaviour for
> > > a CA Gateway (which is one of the few apps that is both CA server and
> > > client). E.g., it is quite easy to create a setup where the Gateway is
> > > sending out beacons on the wrong (i.e. client) side.
> > >
> > > Cheers,
> > > Ralph
> > >
> > >
> > > Shepherd, EL (Emma) wrote:
> > > > Hi all,
> > > >
> > > > We still seem to suffer quite a bit from problems with database links
> > > > between IOCs, particularly when a gateway is involved.  For some
> > > > reason the links can become disconnected and a reboot is usually
> > > > necessary to get them working again.  I have just had an opportunity
> > > > to do some diagnosis on one such problem and found a clue in the CA
> > > > beacon hashtable part of the dbcar report.  The entry for the gateway
> > > > (172.23.106.35) is 'undefined', although the gateway itself seems to
> > > > be working just fine and I can use caget through the gateway as
> > > > normal.
> > > >
> > > > Any ideas what could cause this to happen, or how to fix it when it
> > > > does?  None of the tasks are suspended, CPU usage is low and
> > > > everything else looks fine.
> > > >
> > > > CA beacon hash entry for 172.23.106.32:5064 with period estimate 15.000521
> > > >         beacon number 168436, on THU OCT 16 2008 14:27:46
> > > > CA beacon hash entry for 172.23.106.35:5064 <no period estimate>
> > > >         beacon number 0, on <undefined>
> > > > CA beacon hash entry for 172.23.106.97:5064 with period estimate 14.988265
> > > >         beacon number 76356, on THU OCT 16 2008 14:27:52
> > > > CA beacon hash entry for 172.23.106.96:5064 with period estimate 14.988637
> > > >         beacon number 39491, on THU OCT 16 2008 14:27:53
> > > > CA beacon hash entry for 172.23.106.98:5064 with period estimate 14.980477
> > > >         beacon number 58989, on THU OCT 16 2008 14:27:47
> > > > CA beacon hash entry for 172.23.106.102:5064 with period estimate 14.990867
> > > >         beacon number 39993, on THU OCT 16 2008 14:27:53
> > > > entries per bucket: mean = 0.011719 std dev = 0.107617 max = 1
> > > >
> > > >
> > > > Thanks in advance....
> > > >
> > > > Emma
> > > >
> > >
> > This e-mail and any
> > attachments may contain confidential, copyright and or
> > privileged material, and are for the use of the intended
> > addressee only. If you are not the intended addressee or an
> > authorised recipient of the addressee please notify us of
> > receipt by returning the e-mail and do not use, copy, retain,
> > distribute or disclose the information in or attached to the
> > e-mail. Any opinions expressed within this e-mail are those
> > of the individual and not necessarily of Diamond Light Source Ltd.
> > Diamond Light Source Ltd. cannot guarantee that this e-mail
> > or any attachments are free from viruses and we cannot accept
> > liability for any damage which you may sustain as a result of
> > software viruses which may be transmitted in or with the
> > message. Diamond Light Source Limited (company no. 4375679).
> > Registered in England and Wales with its registered office at
> > Diamond House, Harwell Science and Innovation Campus, Didcot,
> > Oxfordshire, OX11 0DE, United Kingdom
> >
> >

