I am having another problem with JCA/CAJ that I have been meaning to ask about.
The problem shows up when using the areaDetector ImageJ plugin for viewing images. The problem only occurs on Windows 7 64-bit platforms. The problem is that the image update temporarily freezes for several
seconds, and then starts updating OK again.
I think the problem is with JCA/CAJ rather than with my ImageJ client code, for 2 reasons:
1) The problem only seems to occur on Windows 7 64-bit. It does not occur on Windows XP, Windows 7 32-bit, or Linux.
2) The problem used to be significantly worse. Previously when the plugin stopped updating the only way to recover was to disconnect and reconnect the PVs, by hitting return in the widget in which one enters
the base PV name. The problem got much better in areaDetector 1-9, when I upgraded CAJ from 1.1.5 to 1.1.10, and JCA from 2.3.2 to 2.3.6. The fact that the problem got better, but not completely fixed, with a CAJ/JCA update leads me to think that is still
where the problem lies.
I have not debugged the problem to find if the problem is that the plugin is not getting monitor updates on the UniqueID PV, or if the get() operation on the waveform record is timing out, etc.
The problem is easy to reproduce by running the simDetector on any platform and running the ImageJ plugin on a Windows 64 machine.
This is a serious problem because many detectors only run on Windows, and Windows 7 64-bit is becoming the norm.
I am hoping someone else who is much more knowledgeable about Java can help to debug this.
Thanks,
Mark
From: [email protected] [mailto:[email protected]]
On Behalf Of Ralph Lange
Sent: Tuesday, September 03, 2013 5:23 AM
To: EPICS Tech Talk
Subject: Serious issue with JCA / CAJ
Hi,
ITER is experiencing issues with JCA and CAJ, leading to a CSS application freezing/crashing or running happily, depending on a bugfix level change in EPICS base on the server side.
Nadine Utzel and I were running a few tests, showing the questionable behaviour.
Setup 1:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with JNI to Base 3.14.12.3
When opening panels, or switching tabs in BOY tabbed containers, there are situations where the gateway prints:
CAS:
Sep 03 10:17:35 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1b7
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203
Sep 03 10:17:35 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect
Sep 03 10:17:35 !!! Errlog message received (message is above)
while the CSS console shows:
2013-09-03 08:17:35.185 WARNING [Thread 40] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from
gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier
Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=439 detected at ../../../../src/cas/generic/casStrmClient.cc.2203
file: null at line 0
2013-09-03 08:17:35.186 WARNING [Thread 58] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from
gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect
Info: ca-gateway-util.codac.iter.org:5064
file: ../cac.cpp at line 1214
2013-09-03 08:17:35.442 WARNING [Thread 39] org.csstudio.utility.pv.epics.JCACommandThread (run) - JCACommandThread exception
java.lang.IllegalStateException: Invalid channel
at gov.aps.jca.jni.JNIChannel.assertState(JNIChannel.java:71)
at gov.aps.jca.jni.JNIChannel.getConnectionState(JNIChannel.java:221)
at org.csstudio.utility.pv.epics.EPICS_V3_PV$3.run(Unknown Source)
at org.csstudio.utility.pv.epics.JCACommandThread.run(Unknown Source)
[...]
The last exception gets repeated many times, probably once per PV.
Many channels reconnect, some stay disconnected and will never connect.
Sometimes that last repeated exception on the client side does not occur.
The bad id followed by server disconnect also causes exceptions to be printed to STDERR of CSS:
2013-09-03 08:24:22.335 WARNING [Thread 102] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access
Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier
Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=6334 detected at ../../../../src/cas/generic/casStrmClient.cc.2203
file: null at line 0
2013-09-03 08:24:22.336 WARNING [Thread 111] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from
gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect
Info: ca-gateway-util.codac.iter.org:5064
file: ../cac.cpp at line 1214
Setup 2:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with CAJ 1.1.10
The gateway shows similar error messages, but always in pairs of two:
CAS:
Sep 03 10:44:50 !!! Errlog message received (message is above)
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203
Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1ca
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect
Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS:
Sep 03 10:44:51 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=204 typ=17 cnt=1 psz=0 avail=31d
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203
Sep 03 10:44:51 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect
Sep 03 10:44:51 !!! Errlog message received (message is above)
CSS freezes immediately and has to be killed manually (no access to CSS console). STDERR shows many times (once per PV?):
2013-09-03 08:47:01.439 SEVERE [Thread 41] com.cosylab.epics.caj.impl.CATransport (processRead) -
java.lang.UnsupportedOperationException
at java.nio.ByteBuffer.array(ByteBuffer.java:959)
at com.cosylab.epics.caj.impl.handlers.ExceptionResponse.internalHandleResponse(ExceptionResponse.java:130)
at com.cosylab.epics.caj.impl.handlers.AbstractCAResponseHandler.handleResponse(AbstractCAResponseHandler.java:110)
at com.cosylab.epics.caj.impl.CAResponseHandler.handleResponse(CAResponseHandler.java:139)
at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:530)
at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:412)
at com.cosylab.epics.caj.impl.CATransport.handleEvent(CATransport.java:350)
at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.handleEvent(LeaderFollowersHandler.java:77)
at com.cosylab.epics.caj.impl.reactor.Reactor.processInternal(Reactor.java:400)
at com.cosylab.epics.caj.impl.reactor.Reactor.process(Reactor.java:284)
at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.run(LeaderFollowersHandler.java:91)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
[...]
followed by the really bad guy (when CSS freezes):
2013-09-03 08:49:15.736 WARNING [Thread 41] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from
com.cosylab.epics.caj.CAJContext@15546ea6: Virtual circuit disconnect
2013-09-03 08:49:15.757 SEVERE [Thread 1] org.csstudio.utility.pv.epics.EPICS_V3_PV (handleConnected) - UTIL-S15-AG91:MUT3-JT1 connection handling error
java.lang.IllegalStateException: transport closed
at com.cosylab.epics.caj.impl.CATransport.submit(CATransport.java:827)
at com.cosylab.epics.caj.impl.requests.AbstractCARequest.submit(AbstractCARequest.java:88)
at com.cosylab.epics.caj.impl.requests.ReadNotifyRequest.submit(ReadNotifyRequest.java:171)
at com.cosylab.epics.caj.CAJChannel.get(CAJChannel.java:952)
at org.csstudio.utility.pv.epics.EPICS_V3_PV.handleConnected(Unknown Source)
at org.csstudio.utility.pv.epics.EPICS_V3_PV.connect(Unknown Source)
at org.csstudio.utility.pv.epics.EPICS_V3_PV.start(Unknown Source)
at org.csstudio.opibuilder.editparts.PVWidgetEditpartDelegate.startPVs(Unknown Source)
at org.csstudio.opibuilder.editparts.AbstractPVWidgetEditPart.activate(Unknown Source)
at org.csstudio.opibuilder.widgets.editparts.TextUpdateEditPart.activate(Unknown Source)
at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)
at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)
at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
at org.csstudio.opibuilder.widgets.editparts.TabEditPart.activate(Unknown Source)
at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
at org.csstudio.opibuilder.editparts.DisplayEditpart.activate(Unknown Source)
at org.eclipse.gef.editparts.AbstractEditPart.addChild(AbstractEditPart.java:215)
at org.eclipse.gef.editparts.SimpleRootEditPart.setContents(SimpleRootEditPart.java:105)
at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:617)
at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:626)
at org.csstudio.opibuilder.runmode.OPIRuntimeDelegate.init(Unknown Source)
at org.csstudio.opibuilder.runmode.OPIRunner.init(Unknown Source)
at org.csstudio.opibuilder.runmode.OPIRunner.setOPIInput(Unknown Source)
at org.csstudio.opibuilder.runmode.RunModeService.replaceOPIRuntimeContent(Unknown Source)
at org.csstudio.opibuilder.widgetActions.OpenDisplayAction.openOPI(Unknown Source)
at org.csstudio.opibuilder.widgetActions.AbstractOpenOPIAction.run(Unknown Source)
at org.csstudio.opibuilder.widgets.editparts.Draw2DButtonEditPartDelegate$1.actionPerformed(Unknown Source)
at org.csstudio.swt.widgets.figures.ActionButtonFigure.fireActionPerformed(Unknown Source)
at org.csstudio.swt.widgets.figures.ActionButtonFigure$ButtonEventHandler.mouseReleased(Unknown Source)
at org.eclipse.draw2d.Figure.handleMouseReleased(Figure.java:944)
at org.eclipse.draw2d.SWTEventDispatcher.dispatchMouseReleased(SWTEventDispatcher.java:267)
at org.eclipse.gef.ui.parts.DomainEventDispatcher.dispatchMouseReleased(DomainEventDispatcher.java:374)
at org.eclipse.draw2d.LightweightSystem$EventHandler.mouseUp(LightweightSystem.java:548)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:219)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3588)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3209)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2701)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2665)
at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2499)
at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:679)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:668)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
at org.csstudio.utility.product.Workbench.runWorkbench(Unknown Source)
at org.csstudio.startup.application.Application.startApplication(Unknown Source)
at org.csstudio.startup.application.Application.start(Unknown Source)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
[...]
Now for the fun part:
Setup 3:
Server: Gateway 2.0.4.0 using Base 3.14.12.3
Client: CSS using JCA 2.3.6 with CAJ 1.1.10
No errors whatsoever.
Bottom line:
Switching the CAS CA server from base 3.14.12.2 to 3.14.12.3 determines if the pure Java CA client will die a horrible death or just work fine.
Considering that Channel Access is the main separation layer that enables clients and servers of control systems to be updated independently, and that C/C++ Channel Access works reliably across virtually any combination of Base between 3.13 and 3.15, I would
say this is very bad behaviour and should be considered a serious bug.
I know that the client is not using the latest version of CAJ, though. Has this issue been addressed?
Has anyone else seen this? I assume this is connected to LP issue 730720 [1]?
The only relevant change in CAS was adding support for the DBE_PROPERTY flag.
Thanks a lot,
~Ralph
[1] https://bugs.launchpad.net/epics-base/+bug/730720