
Experimental Physics and Industrial Control System



Subject: RE: casStrmClient.cc error: invalid channel identifier
From: Emma Shepherd <Emma.Shepherd@synchrotron.org.au>
To: Tim Mooney <mooney@aps.anl.gov>, Mark Rivers <rivers@cars.uchicago.edu>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>, Emmanuel Vettoor <Emmanuel.Vettoor@synchrotron.org.au>
Date: Tue, 3 Apr 2012 09:50:44 +1000
Hi everyone,

Thanks for the replies.  At the moment I'm relying on reports from the beamline scientists, so I'm not sure if it's a true 'crash'. It does look that way from the logs, though: other, unrelated asyn messages stop being generated at that point, and from what I gather the IOC is unusable afterwards until it is rebooted.

The host IOC is on 3.14.8.2 and the IOC running saveData is on 3.14.9 at the moment.  We'll see if we can reproduce it in the lab, but it's sounding like it will be worth updating to the latest versions.

Cheers,
Emma

-----Original Message-----
From: Tim Mooney [mailto:mooney@aps.anl.gov] 
Sent: Tuesday, 3 April 2012 2:15 AM
To: Mark Rivers
Cc: Emmanuel Vettoor; Emma Shepherd; tech-talk@aps.anl.gov
Subject: Re: casStrmClient.cc error: invalid channel identifier

Hi Emma,

Mark is right that saveData is not a sequence program.  It's just straight C code.

Can you say more about the nature of the ioc crash?  Does it cease to function completely, does the saveData task get suspended, do unrelated CA tasks also have problems?

I think I can narrow the scope a little bit.  If saveData has a connection through a PV gateway, I think it must be to one of what saveData calls "extra" PVs - PVs named in saveData.req, which are read once per scan.

If so, there are only two CA calls that can have occurred: ca_search() and ca_array_get_callback().  ca_search() is only called during saveData's initialization, so this is probably a ca_array_get_callback().  saveData doesn't use a connection callback, though it does check that the chid is nonzero before calling ca_array_get_callback().
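To see why a nonzero-chid check is weaker than a connection check: in Channel Access, the chid returned by ca_search() stays nonzero after the server side disconnects (for example when the IOC behind the gateway reboots); only its state changes, which ca_state() reports as cs_prev_conn rather than cs_conn. The sketch below models that difference with hypothetical stand-in types (model_channel, guard_nonzero_only, guard_connected) instead of the real cadef.h API, so it compiles without EPICS. It illustrates the two guards only and makes no claim about what caused the crash.

```c
#include <stddef.h>

/* Hypothetical stand-in for CA's channel_state enum
 * (cs_never_conn, cs_prev_conn, cs_conn, cs_closed). Not the real API. */
typedef enum { MODEL_NEVER_CONN, MODEL_PREV_CONN, MODEL_CONN, MODEL_CLOSED } model_state;

/* Hypothetical channel record; a real chid is an opaque CA pointer. */
typedef struct { model_state state; } model_channel;
typedef model_channel *model_chid;

/* Guard as described for saveData: only checks that the chid is nonzero.
 * A channel that was connected and then lost its server still passes. */
int guard_nonzero_only(model_chid ch) {
    return ch != NULL;
}

/* Stricter guard: also require the channel to be connected right now,
 * analogous to testing ca_state(chid) == cs_conn in real CA client code. */
int guard_connected(model_chid ch) {
    return ch != NULL && ch->state == MODEL_CONN;
}
```

In real client code the stricter form would look something like `if (chid && ca_state(chid) == cs_conn)` ahead of the ca_array_get_callback() call, optionally combined with a connection callback registered through ca_create_channel().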

It's pretty common at APS for saveData to connect to an "extra" PV through a gateway.  I'm not sure how common it is for the ioc hosting such a PV to be rebooted, though I think I've seen it happen personally a few tens of times over the last decade or so.

Tim


----- Original Message -----
From: "Mark Rivers" <rivers@cars.uchicago.edu>
To: "Emma Shepherd" <Emma.Shepherd@synchrotron.org.au>, tech-talk@aps.anl.gov
Cc: "Emmanuel Vettoor" <Emmanuel.Vettoor@synchrotron.org.au>
Sent: Monday, April 2, 2012 7:39:06 AM
Subject: RE: casStrmClient.cc error: invalid channel identifier

I don't think saveData is actually a sequence program, but it is a channel access client.  That may be important for tracking down the problem.

Mark

________________________________________
From: tech-talk-bounces@aps.anl.gov [tech-talk-bounces@aps.anl.gov] on behalf of Emma Shepherd [Emma.Shepherd@synchrotron.org.au]
Sent: Monday, April 02, 2012 12:01 AM
To: tech-talk@aps.anl.gov
Cc: Emmanuel Vettoor
Subject: RE: casStrmClient.cc error: invalid channel identifier

Sorry, I missed a possibly important bit of information - the CA client in the IOC is actually the sscan saveData sequence program.

-----Original Message-----
From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov] On Behalf Of Emma Shepherd
Sent: Monday, 2 April 2012 11:29 AM
To: tech-talk@aps.anl.gov
Cc: Emmanuel Vettoor
Subject: casStrmClient.cc error: invalid channel identifier

Hi,

I'm trying to track down the cause of an intermittent IOC crash, and came across Launchpad bug 730720, which seems to be related (although in our case it doesn't involve CAJ clients).

Our symptoms are that when an IOC is rebooted, another IOC that has CA links to PVs served by that IOC through a gateway dies.  I believe that this happens on reboot of several different IOCs, not just one in particular.

We get the following error messages in the gateway log:


<snip>
Mar 18 07:09:11 Warning: Virtual circuit disconnect SR00IOC10.accelerator.synchrotron.org.au:5064
CAS Request: ics on sr05id01ioc01: cmd=2 cid=31 typ=0 cnt=1 psz=0 avail=101
CAS:
Mar 18 07:09:19 !!! Errlog message received (message is above) bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 1810

Mar 18 07:09:19 !!! Errlog message received (message is above) filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=582 Bad resource identifier - unexpected problem with client's input - forcing disconnect

Mar 18 07:09:19 !!! Errlog message received (message is above)
</snip>


And the following message in the IOC that subsequently dies:


<snip>
CA.Client.Exception...............................................
    Error: "Invalid channel identifier"
    Context: "host=10.114.10.246:5064 ctx=Bad Resource ID=31 detected at ../../../../src/cas/generic/casStrmClient.cc.1810"
    Current Time: Sun Mar 18 2012 07:09:19.697135000
</snip>
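The two excerpts appear to describe the same event: at 07:09:19 the gateway rejects a request from sr05id01ioc01 with cid=31 as a bad resource id at casStrmClient.cc line 1810, and the IOC reports "Bad Resource ID=31" at the same file and line. When collecting more of these logs to match gateway and IOC entries, a small helper can pull those numbers out. The function field_after() below is a hypothetical sketch for that purpose, not part of EPICS or the gateway.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the decimal integer that immediately follows
 * `key` in `line` (e.g. key "cid=" in a CAS Request dump, or "Resource ID="
 * in a CA client exception context). Returns -1 if the key is absent or
 * is not followed by digits. */
long field_after(const char *line, const char *key) {
    const char *p = strstr(line, key);
    if (p == NULL)
        return -1;
    const char *start = p + strlen(key);
    char *end;
    long v = strtol(start, &end, 10);
    if (end == start) /* no digits after the key */
        return -1;
    return v;
}
```

Calling field_after() with "cid=" on the gateway's CAS Request line and with "Resource ID=" on the IOC's exception context should give the same number when both entries refer to the same rejected request.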

A reboot of the IOC is enough to fix it; the gateway itself doesn't need to be restarted.

We are using Gateway Version 2.0.3.0 and EPICS base 3.14.9 running on CentOS 4.4.  Has anybody been looking into this issue yet?

Thanks,
Emma


This message and any attachments may contain proprietary or confidential information. If you are not the intended recipient or you received the message in error, you must not use, copy or distribute the message. Please notify the sender immediately and destroy the original message. Thank you.


--
Tim Mooney (mooney@aps.anl.gov) (630)252-5417 Software Services Group (www.aps.anl.gov) Advanced Photon Source, Argonne National Lab




Replies:
RE: casStrmClient.cc error: invalid channel identifier Hill, Jeffrey O
References:
RE: casStrmClient.cc error: invalid channel identifier Mark Rivers
Re: casStrmClient.cc error: invalid channel identifier Tim Mooney
