1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 <2013> 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 | Index | 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 <2013> 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 |
<== Date ==> | <== Thread ==> |
---|
Subject: | Serious issue with JCA / CAJ |
From: | Ralph Lange <[email protected]> |
To: | EPICS Tech Talk <[email protected]> |
Date: | Tue, 03 Sep 2013 12:23:02 +0200 |
Hi, ITER is experiencing issues with JCA and CAJ, leading to a CSS application freezing/crashing or running happily, depending on a bugfix level change in EPICS base on the server side. Nadine Utzel and I were running a few tests, showing the questionable behaviour. Setup 1: Server: Gateway 2.0.4.0 using Base 3.14.12.2 Client: CSS using JCA 2.3.6 with JNI to Base 3.14.12.3 When opening panels, or switching tabs in BOY tabbed containers, there are situations where the gateway prints: CAS: Sep 03 10:17:35 !!! Errlog message received (message is above) CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1b7 bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203 Sep 03 10:17:35 !!! Errlog message received (message is above) filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479 Bad resource identifier - unexpected problem with client's input - forcing disconnect Sep 03 10:17:35 !!! Errlog message received (message is above) while the CSS console shows: 2013-09-03 08:17:35.185 WARNING [Thread 40] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=439 detected at ../../../../src/cas/generic/casStrmClient.cc.2203 file: null at line 0 2013-09-03 08:17:35.186 WARNING [Thread 58] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect Info: ca-gateway-util.codac.iter.org:5064 file: ../cac.cpp at line 1214 2013-09-03 08:17:35.442 WARNING [Thread 39] org.csstudio.utility.pv.epics.JCACommandThread (run) - JCACommandThread exception java.lang.IllegalStateException: Invalid channel at gov.aps.jca.jni.JNIChannel.assertState(JNIChannel.java:71) at gov.aps.jca.jni.JNIChannel.getConnectionState(JNIChannel.java:221) at org.csstudio.utility.pv.epics.EPICS_V3_PV$3.run(Unknown Source) at org.csstudio.utility.pv.epics.JCACommandThread.run(Unknown Source) [...] The last exception gets repeated many times, probably once per PV. Many channels reconnect, some stay disconnected and will never connect. Sometimes that last repeated exception on the client side does not occur. The bad id followed by server disconnect also causes exceptions to be printed to STDERR of CSS: 2013-09-03
08:24:22.335 WARNING [Thread 102]
org.csstudio.utility.pv.epics.ContextErrorHandler
(contextException) - Channel Access Exception from
gov.aps.jca.jni.ThreadSafeContext@179bf1b3:
Status: Bad event subscription (monitor) identifier Setup 2: Server: Gateway 2.0.4.0 using Base 3.14.12.2 Client: CSS using JCA 2.3.6 with CAJ 1.1.10 The gateway shows similar error messages, but always in pairs of two: CAS: Sep 03 10:44:50 !!! Errlog message received (message is above) bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203 Sep 03 10:44:50 !!! Errlog message received (message is above) CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1ca filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479 Bad resource identifier - unexpected problem with client's input - forcing disconnect Sep 03 10:44:50 !!! Errlog message received (message is above) CAS: Sep 03 10:44:51 !!! Errlog message received (message is above) CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=204 typ=17 cnt=1 psz=0 avail=31d bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203 Sep 03 10:44:51 !!! Errlog message received (message is above) filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479 Bad resource identifier - unexpected problem with client's input - forcing disconnect Sep 03 10:44:51 !!! Errlog message received (message is above) CSS freezes immediately and has to be killed manually (no access to CSS console). STDERR shows many times (once per PV?): 2013-09-03 08:47:01.439 SEVERE [Thread 41] com.cosylab.epics.caj.impl.CATransport (processRead) - java.lang.UnsupportedOperationException at java.nio.ByteBuffer.array(ByteBuffer.java:959) at com.cosylab.epics.caj.impl.handlers.ExceptionResponse.internalHandleResponse(ExceptionResponse.java:130) at com.cosylab.epics.caj.impl.handlers.AbstractCAResponseHandler.handleResponse(AbstractCAResponseHandler.java:110) at com.cosylab.epics.caj.impl.CAResponseHandler.handleResponse(CAResponseHandler.java:139) at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:530) at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:412) at com.cosylab.epics.caj.impl.CATransport.handleEvent(CATransport.java:350) at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.handleEvent(LeaderFollowersHandler.java:77) at com.cosylab.epics.caj.impl.reactor.Reactor.processInternal(Reactor.java:400) at com.cosylab.epics.caj.impl.reactor.Reactor.process(Reactor.java:284) at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.run(LeaderFollowersHandler.java:91) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) [...] followed by the really bad guy (when CSS freezes): 2013-09-03 08:49:15.736 WARNING [Thread 41] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from com.cosylab.epics.caj.CAJContext@15546ea6: Virtual circuit disconnect 2013-09-03 08:49:15.757 SEVERE [Thread 1] org.csstudio.utility.pv.epics.EPICS_V3_PV (handleConnected) - UTIL-S15-AG91:MUT3-JT1 connection handling error java.lang.IllegalStateException: transport closed at com.cosylab.epics.caj.impl.CATransport.submit(CATransport.java:827) at com.cosylab.epics.caj.impl.requests.AbstractCARequest.submit(AbstractCARequest.java:88) at com.cosylab.epics.caj.impl.requests.ReadNotifyRequest.submit(ReadNotifyRequest.java:171) at com.cosylab.epics.caj.CAJChannel.get(CAJChannel.java:952) at org.csstudio.utility.pv.epics.EPICS_V3_PV.handleConnected(Unknown Source) at org.csstudio.utility.pv.epics.EPICS_V3_PV.connect(Unknown Source) at org.csstudio.utility.pv.epics.EPICS_V3_PV.start(Unknown Source) at org.csstudio.opibuilder.editparts.PVWidgetEditpartDelegate.startPVs(Unknown Source) at org.csstudio.opibuilder.editparts.AbstractPVWidgetEditPart.activate(Unknown Source) at org.csstudio.opibuilder.widgets.editparts.TextUpdateEditPart.activate(Unknown Source) at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160) at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195) at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source) at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source) at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160) at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195) at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source) at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source) at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160) at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195) at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source) at org.csstudio.opibuilder.widgets.editparts.TabEditPart.activate(Unknown Source) at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160) at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195) at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source) at org.csstudio.opibuilder.editparts.DisplayEditpart.activate(Unknown Source) at org.eclipse.gef.editparts.AbstractEditPart.addChild(AbstractEditPart.java:215) at org.eclipse.gef.editparts.SimpleRootEditPart.setContents(SimpleRootEditPart.java:105) at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:617) at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:626) at org.csstudio.opibuilder.runmode.OPIRuntimeDelegate.init(Unknown Source) at org.csstudio.opibuilder.runmode.OPIRunner.init(Unknown Source) at org.csstudio.opibuilder.runmode.OPIRunner.setOPIInput(Unknown Source) at org.csstudio.opibuilder.runmode.RunModeService.replaceOPIRuntimeContent(Unknown Source) at org.csstudio.opibuilder.widgetActions.OpenDisplayAction.openOPI(Unknown Source) at org.csstudio.opibuilder.widgetActions.AbstractOpenOPIAction.run(Unknown Source) at org.csstudio.opibuilder.widgets.editparts.Draw2DButtonEditPartDelegate$1.actionPerformed(Unknown Source) at org.csstudio.swt.widgets.figures.ActionButtonFigure.fireActionPerformed(Unknown Source) at org.csstudio.swt.widgets.figures.ActionButtonFigure$ButtonEventHandler.mouseReleased(Unknown Source) at org.eclipse.draw2d.Figure.handleMouseReleased(Figure.java:944) at org.eclipse.draw2d.SWTEventDispatcher.dispatchMouseReleased(SWTEventDispatcher.java:267) at org.eclipse.gef.ui.parts.DomainEventDispatcher.dispatchMouseReleased(DomainEventDispatcher.java:374) at org.eclipse.draw2d.LightweightSystem$EventHandler.mouseUp(LightweightSystem.java:548) at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:219) at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84) at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258) at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3588) at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3209) at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2701) at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2665) at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2499) at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:679) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:668) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149) at org.csstudio.utility.product.Workbench.runWorkbench(Unknown Source) at org.csstudio.startup.application.Application.startApplication(Unknown Source) at org.csstudio.startup.application.Application.start(Unknown Source) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196) [...] Now for the fun part: Setup 3: Server: Gateway 2.0.4.0 using Base 3.14.12.3 Client: CSS using JCA 2.3.6 with CAJ 1.1.10 No errors whatsoever. Bottom line: Switching the CAS CA server from base 3.14.12.2 to 3.14.12.3 determines if the pure Java CA client will die a horrible death or just work fine. Considering that Channel Access is the main separation layer that enables clients and servers of control systems to be updated independently, and that C/C++ Channel Access works reliably across virtually any combination of Base between 3.13 and 3.15, I would say this is very bad behaviour and should be considered a serious bug. I know that the client is not using the latest version of CAJ, though. Has this issue been addressed? Has anyone else seen this? I assume this is connected to LP issue 730720 [1]? The only relevant change in CAS was adding support for the DBE_PROPERTY flag. Thanks a lot, ~Ralph [1] https://bugs.launchpad.net/epics-base/+bug/730720 |