EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: CA problem
From: "Jeff Hill" <[email protected]>
To: "'Pedro Gigoux'" <[email protected]>, "'EPICS Tech-Talk'" <[email protected]>
Date: Mon, 9 Sep 2002 10:37:45 -0600
Pedro,

From your message I infer the following.

1) There is a failure in your vxWorks IP kernel subsystem
or some part of your network infrastructure because ping
is no longer working.

2) A CA thread is reporting that it found that one of its
network circuits wasn't working, and therefore disconnected
by closing the network socket. Probably almost at the same
time, another CA thread that was watching for incoming network
data in the select() system call reported that one of the
socket file descriptors it was watching had been closed by
the other CA thread and was therefore invalid. It was not
possible to cause the other thread to drop out of select()
without closing the socket in this circumstance. I chose
at the time not to suppress the message because in other
circumstances these messages are useful if CA is failing or
its data structures have been corrupted. These messages do
not occur in EPICS R3.14.
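
As a rough illustration of point 2, the sketch below shows the
general POSIX behaviour behind the "invalid file descriptor"
message: calling select() on a descriptor that has already been
closed fails with a bad-descriptor error. This is not the CA client
code and it does not run on vxWorks. The function name
select_on_closed_fd is hypothetical, a pipe stands in for the
network socket, and the close happens in the same thread for
simplicity. On vxWorks the corresponding status is the
S_iosLib_INVALID_FILE_DESCRIPTOR shown in the quoted log.

```python
import errno
import os
import select


def select_on_closed_fd():
    """Call select() on a descriptor that has already been closed.

    Returns the errno reported by the operating system, or None if
    select() unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    # Stands in for "the other thread closed the socket".
    os.close(read_fd)
    try:
        # Zero timeout: poll once instead of blocking.
        select.select([read_fd], [], [], 0)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    code = select_on_closed_fd()
    print(code, errno.errorcode.get(code))
```

The error by itself only says that a watched descriptor went away.
It does not say why the circuit was judged dead in the first place,
which is why the network checks further down matter more.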

> The only
> way to bring the system back to life was to install the old version of
> the software that runs on a previous version of EPICS (3.12.2).

3) Do I infer correctly that rebooting the R3.13.4 system did not
restore proper operation? I don't ask this question to imply
that rebooting should be a normal part of system operation.
I am only trying to understand more about your symptoms. If
rebooting does not help then this might be a large clue about
why you are intermittently losing network connectivity.
Was a network switch recently upgraded? Was a network switch
reset without powering off the systems that use it? We have
seen problems with certain network gear using 10/100 Mb
auto-negotiation when the switch lost power and an IOC attached
through an auto-negotiating 10/100 LAN tap was not power-cycled
after the switch came back up.

It is difficult to make further guesses about the cause
of the problem.

When this occurs again I would run the
following commands from the serial port in order to verify
that certain EPICS, CA, and/or vxWorks data structures have
not been corrupted.

dbcar 0, 10
casr 10
dbl
iosFdShow
inetstatShow
ifShow
ipstatShow
udpstatShow
tcpstatShow
mbufShow

In particular, look carefully at the error counts reported
by ifShow and the free mbuf count reported by mbufShow.

We have also seen vxWorks IP kernel failures occurring when
there are threads in the system using the network that run
at a higher priority than tNetTask.
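
A minimal sketch of that priority check, assuming the task names
and priorities have been copied by hand from a task listing on the
target. In vxWorks a numerically lower priority value means a
higher priority, and tNetTask runs at 50 by default. The helper
name tasks_outranking and every task name other than tNetTask are
hypothetical. The helper only flags candidates; whether a flagged
task actually uses the network still has to be checked separately.

```python
def tasks_outranking(tasks, reference="tNetTask"):
    """Return the names of tasks that outrank the reference task.

    tasks maps task name -> vxWorks priority (0 is the highest
    priority, 255 the lowest), so "outranks" means a strictly
    smaller number than the reference task's priority.
    """
    ref_priority = tasks[reference]
    return sorted(
        name for name, priority in tasks.items()
        if name != reference and priority < ref_priority
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    listing = {
        "tNetTask": 50,
        "myDriverTask": 40,
        "myScanTask": 60,
        "myLoggerTask": 50,
    }
    for name in tasks_outranking(listing):
        print(name, "runs above tNetTask")
```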

Did moving back to the 3.12.2 system also involve using a
different version of vxWorks? You might check whether all
available patches for vxWorks and the network interface
driver are installed.

Jeff

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Pedro
> Gigoux
> Sent: Monday, September 09, 2002 9:10 AM
> To: EPICS Tech-Talk
> Subject: CA problem
>
> Hi,
>
> We have a system that keeps losing the connection to the DM screens
> after a few minutes of operation. The following message shows up on
> the screen after that happens:
>
> ecs-MK> dbCa:exceptionCallback stat Network connection lost channel
> unknown
>
> ecs-MK>
> CA.Client.Diagnostic..............................................
>     Message: "Network connection lost"
>     Severity: "Warning" Context: "10.2.2.104:5064"
>     Source File: ../iocinf.c Line Number: 1488
> ....................................................................
> CAC: unexpected select fail: 851971=S_iosLib_INVALID_FILE_DESCRIPTOR
>
> After this message it's not possible to ping the machine where the
> crate boots from (or any other machine). Tried changing the CPU and
> transition module but the symptom stayed the same. Also tried changing
> the ethernet cable and switch port with no luck.
>
> Except for the network problem the system seems to be running fine and
> I'm able to access the console (serial port) without problems. The only
> way to bring the system back to life was to install the old version of
> the software that runs on a previous version of EPICS (3.12.2). We used
> to run under 3.13.4. The program in the crate has been running for
> several weeks flawlessly, and runs in our twin facility with no
> problems at all.
> Any ideas?
>
> Pedro.


