EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Serious issue with JCA / CAJ
From: Mark Rivers <[email protected]>
To: "'Ralph Lange'" <[email protected]>, EPICS Tech Talk <[email protected]>
Cc: Peter Eng <[email protected]>
Date: Tue, 3 Sep 2013 14:32:53 +0000

I am having another problem with JCA/CAJ that I have been meaning to ask about.

 

It shows up when viewing images with the areaDetector ImageJ plugin, and only on Windows 7 64-bit platforms: the image display freezes for several seconds and then resumes updating normally.

 

I think the problem is with JCA/CAJ rather than with my ImageJ client code, for two reasons:

 

1) The problem only seems to occur on Windows 7 64-bit.  It does not occur on Windows XP, Windows 7 32-bit, or Linux.

 

2) The problem used to be significantly worse.  Previously, when the plugin stopped updating, the only way to recover was to disconnect and reconnect the PVs by hitting return in the widget where the base PV name is entered.  It improved considerably in areaDetector 1-9, when I upgraded CAJ from 1.1.5 to 1.1.10 and JCA from 2.3.2 to 2.3.6.  That a CAJ/JCA update made the problem better, without fixing it completely, leads me to think that is still where the problem lies.

 

I have not yet debugged this far enough to tell whether the plugin stops receiving monitor updates on the UniqueID PV, whether the get() operation on the waveform record is timing out, or whether something else is going on.
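
One way to separate these possibilities would be to timestamp both paths inside the plugin. The following is only a sketch: StallLog and its method names are hypothetical and are not part of the ImageJ plugin or of JCA/CAJ. It assumes monitorArrived() would be called from the UniqueID monitor callback and getFinished() around the waveform get(). Timestamps are passed in as arguments so the class can be exercised without a live IOC.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: notes long gaps between monitor updates and slow get() calls. */
final class StallLog {
    private final long thresholdMillis;
    private long lastMonitorMillis = -1;   // -1 means no monitor update seen yet
    private final List<String> findings = new ArrayList<>();

    StallLog(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Call from the monitor callback; nowMillis would normally be System.currentTimeMillis(). */
    synchronized void monitorArrived(long nowMillis) {
        if (lastMonitorMillis >= 0) {
            long gap = nowMillis - lastMonitorMillis;
            if (gap > thresholdMillis) {
                findings.add("monitor gap " + gap + " ms");
            }
        }
        lastMonitorMillis = nowMillis;
    }

    /** Call after each get() with the times taken just before and just after it. */
    synchronized void getFinished(long startMillis, long endMillis) {
        long took = endMillis - startMillis;
        if (took > thresholdMillis) {
            findings.add("slow get " + took + " ms");
        }
    }

    /** Returns a copy of everything noted so far. */
    synchronized List<String> findings() {
        return new ArrayList<>(findings);
    }

    public static void main(String[] args) {
        StallLog log = new StallLog(1000);
        log.monitorArrived(0);
        log.getFinished(5, 40);
        log.monitorArrived(100);
        log.monitorArrived(4100);      // 4 s without a monitor update
        log.getFinished(4105, 4150);
        System.out.println(log.findings());
    }
}
```

If only "monitor gap" entries appear during a freeze, the updates on the UniqueID PV are what stalls; "slow get" entries would instead point at the waveform read.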

 

The problem is easy to reproduce by running the simDetector on any platform and running the ImageJ plugin on a Windows 7 64-bit machine.

 

This is a serious problem because many detectors only run on Windows, and Windows 7 64-bit is becoming the norm.

 

I am hoping someone else who is much more knowledgeable about Java can help to debug this.

 

Thanks,

Mark

 

 

 

From: [email protected] [mailto:[email protected]] On Behalf Of Ralph Lange
Sent: Tuesday, September 03, 2013 5:23 AM
To: EPICS Tech Talk
Subject: Serious issue with JCA / CAJ

 

Hi,

ITER is experiencing issues with JCA and CAJ, leading to a CSS application freezing/crashing or running happily, depending on a bugfix level change in EPICS base on the server side.

Nadine Utzel and I were running a few tests, showing the questionable behaviour.


Setup 1:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with JNI to Base 3.14.12.3

When opening panels, or switching tabs in BOY tabbed containers, there are situations where the gateway prints:

CAS:
Sep 03 10:17:35 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1b7
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:17:35 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:17:35 !!! Errlog message received (message is above)

while the CSS console shows:

2013-09-03 08:17:35.185 WARNING [Thread 40] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier
Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=439 detected at ../../../../src/cas/generic/casStrmClient.cc.2203
file: null at line 0
2013-09-03 08:17:35.186 WARNING [Thread 58] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect
Info: ca-gateway-util.codac.iter.org:5064
file: ../cac.cpp at line 1214
2013-09-03 08:17:35.442 WARNING [Thread 39] org.csstudio.utility.pv.epics.JCACommandThread (run) - JCACommandThread exception
java.lang.IllegalStateException: Invalid channel
                at gov.aps.jca.jni.JNIChannel.assertState(JNIChannel.java:71)
                at gov.aps.jca.jni.JNIChannel.getConnectionState(JNIChannel.java:221)
                at org.csstudio.utility.pv.epics.EPICS_V3_PV$3.run(Unknown Source)
                at org.csstudio.utility.pv.epics.JCACommandThread.run(Unknown Source)
[...]


The last exception gets repeated many times, probably once per PV.
Many channels reconnect; some stay disconnected and never reconnect.

Sometimes that last repeated exception on the client side does not occur.

The bad id followed by server disconnect also causes exceptions to be printed to STDERR of CSS:

2013-09-03 08:24:22.335 WARNING [Thread 102] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Bad event subscription (monitor) identifier
Info: host=ca-gateway-util.codac.iter.org:5064 ctx=Bad Resource ID=6334 detected at ../../../../src/cas/generic/casStrmClient.cc.2203
file: null at line 0
2013-09-03 08:24:22.336 WARNING [Thread 111] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from gov.aps.jca.jni.ThreadSafeContext@179bf1b3: Status: Virtual circuit disconnect
Info: ca-gateway-util.codac.iter.org:5064
file: ../cac.cpp at line 1214
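
The gateway and CSS outputs in Setup 1 can be tied together through the request line. In the first gateway message, avail=1b7 read as hexadecimal is 439, which matches "Bad Resource ID=439" in the CSS console for the same second. If I read the CA protocol correctly, cmd=2 is an event (monitor) cancel request, which would fit the client-side "Bad event subscription (monitor) identifier" status. A small sketch that pulls the fields out of such a line follows. CasRequestLine is a hypothetical helper, and treating avail as hexadecimal and the other fields as decimal is an assumption based only on the values shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the key=value fields of a "CAS Request:" log line. */
final class CasRequestLine {
    private static final Pattern FIELD =
            Pattern.compile("\\b(cmd|cid|typ|cnt|psz|avail)=([0-9a-fA-F]+)");

    static Map<String, Long> parse(String line) {
        Map<String, Long> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Assumption: "avail" is printed in hexadecimal, everything else in decimal.
            int radix = m.group(1).equals("avail") ? 16 : 10;
            fields.put(m.group(1), Long.parseLong(m.group(2), radix));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: "
                + "cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1b7";
        // prints {cmd=2, cid=135, typ=34, cnt=1, psz=0, avail=439}
        System.out.println(parse(line));
    }
}
```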


Setup 2:
Server: Gateway 2.0.4.0 using Base 3.14.12.2
Client: CSS using JCA 2.3.6 with CAJ 1.1.10

The gateway shows similar error messages, but always in pairs:

CAS:
Sep 03 10:44:50 !!! Errlog message received (message is above)
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=135 typ=34 cnt=1 psz=0 avail=1ca
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:44:50 !!! Errlog message received (message is above)
CAS:
Sep 03 10:44:51 !!! Errlog message received (message is above)
CAS Request: utzeln on 4501WS-CC-0006.codac.iter.org: cmd=2 cid=204 typ=17 cnt=1 psz=0 avail=31d
bad resource id in "../../../../src/cas/generic/casStrmClient.cc" at line 2203

Sep 03 10:44:51 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/st/casStreamOS.cc" line number=479
Bad resource identifier - unexpected problem with client's input - forcing disconnect

Sep 03 10:44:51 !!! Errlog message received (message is above)

CSS freezes immediately and has to be killed manually (no access to CSS console). STDERR shows the following many times (once per PV?):

2013-09-03 08:47:01.439 SEVERE [Thread 41] com.cosylab.epics.caj.impl.CATransport (processRead) -
java.lang.UnsupportedOperationException
                at java.nio.ByteBuffer.array(ByteBuffer.java:959)
                at com.cosylab.epics.caj.impl.handlers.ExceptionResponse.internalHandleResponse(ExceptionResponse.java:130)
                at com.cosylab.epics.caj.impl.handlers.AbstractCAResponseHandler.handleResponse(AbstractCAResponseHandler.java:110)
                at com.cosylab.epics.caj.impl.CAResponseHandler.handleResponse(CAResponseHandler.java:139)
                at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:530)
                at com.cosylab.epics.caj.impl.CATransport.processRead(CATransport.java:412)
                at com.cosylab.epics.caj.impl.CATransport.handleEvent(CATransport.java:350)
                at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.handleEvent(LeaderFollowersHandler.java:77)
                at com.cosylab.epics.caj.impl.reactor.Reactor.processInternal(Reactor.java:400)
                at com.cosylab.epics.caj.impl.reactor.Reactor.process(Reactor.java:284)
                at com.cosylab.epics.caj.impl.reactor.lf.LeaderFollowersHandler.run(LeaderFollowersHandler.java:91)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
                at java.lang.Thread.run(Thread.java:722)
[...]

followed by the really bad guy (when CSS freezes):

2013-09-03 08:49:15.736 WARNING [Thread 41] org.csstudio.utility.pv.epics.ContextErrorHandler (contextException) - Channel Access Exception from com.cosylab.epics.caj.CAJContext@15546ea6: Virtual circuit disconnect
2013-09-03 08:49:15.757 SEVERE [Thread 1] org.csstudio.utility.pv.epics.EPICS_V3_PV (handleConnected) - UTIL-S15-AG91:MUT3-JT1 connection handling error
java.lang.IllegalStateException: transport closed
                at com.cosylab.epics.caj.impl.CATransport.submit(CATransport.java:827)
                at com.cosylab.epics.caj.impl.requests.AbstractCARequest.submit(AbstractCARequest.java:88)
                at com.cosylab.epics.caj.impl.requests.ReadNotifyRequest.submit(ReadNotifyRequest.java:171)
                at com.cosylab.epics.caj.CAJChannel.get(CAJChannel.java:952)
                at org.csstudio.utility.pv.epics.EPICS_V3_PV.handleConnected(Unknown Source)
                at org.csstudio.utility.pv.epics.EPICS_V3_PV.connect(Unknown Source)
                at org.csstudio.utility.pv.epics.EPICS_V3_PV.start(Unknown Source)
                at org.csstudio.opibuilder.editparts.PVWidgetEditpartDelegate.startPVs(Unknown Source)
                at org.csstudio.opibuilder.editparts.AbstractPVWidgetEditPart.activate(Unknown Source)
                at org.csstudio.opibuilder.widgets.editparts.TextUpdateEditPart.activate(Unknown Source)
                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
                at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)
                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
                at org.csstudio.opibuilder.widgets.editparts.GroupingContainerEditPart.activate(Unknown Source)
                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
                at org.csstudio.opibuilder.widgets.editparts.TabEditPart.activate(Unknown Source)
                at org.eclipse.gef.editparts.AbstractEditPart.activate(AbstractEditPart.java:160)
                at org.eclipse.gef.editparts.AbstractGraphicalEditPart.activate(AbstractGraphicalEditPart.java:195)
                at org.csstudio.opibuilder.editparts.AbstractBaseEditPart.activate(Unknown Source)
                at org.csstudio.opibuilder.editparts.DisplayEditpart.activate(Unknown Source)
                at org.eclipse.gef.editparts.AbstractEditPart.addChild(AbstractEditPart.java:215)
                at org.eclipse.gef.editparts.SimpleRootEditPart.setContents(SimpleRootEditPart.java:105)
                at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:617)
                at org.eclipse.gef.ui.parts.AbstractEditPartViewer.setContents(AbstractEditPartViewer.java:626)
                at org.csstudio.opibuilder.runmode.OPIRuntimeDelegate.init(Unknown Source)
                at org.csstudio.opibuilder.runmode.OPIRunner.init(Unknown Source)
                at org.csstudio.opibuilder.runmode.OPIRunner.setOPIInput(Unknown Source)
                at org.csstudio.opibuilder.runmode.RunModeService.replaceOPIRuntimeContent(Unknown Source)
                at org.csstudio.opibuilder.widgetActions.OpenDisplayAction.openOPI(Unknown Source)
                at org.csstudio.opibuilder.widgetActions.AbstractOpenOPIAction.run(Unknown Source)
                at org.csstudio.opibuilder.widgets.editparts.Draw2DButtonEditPartDelegate$1.actionPerformed(Unknown Source)
                at org.csstudio.swt.widgets.figures.ActionButtonFigure.fireActionPerformed(Unknown Source)
                at org.csstudio.swt.widgets.figures.ActionButtonFigure$ButtonEventHandler.mouseReleased(Unknown Source)
                at org.eclipse.draw2d.Figure.handleMouseReleased(Figure.java:944)
                at org.eclipse.draw2d.SWTEventDispatcher.dispatchMouseReleased(SWTEventDispatcher.java:267)
                at org.eclipse.gef.ui.parts.DomainEventDispatcher.dispatchMouseReleased(DomainEventDispatcher.java:374)
                at org.eclipse.draw2d.LightweightSystem$EventHandler.mouseUp(LightweightSystem.java:548)
                at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:219)
                at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
                at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
                at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3588)
                at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3209)
                at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2701)
                at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2665)
                at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2499)
                at org.eclipse.ui.internal.Workbench$7.run(Workbench.java:679)
                at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
                at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:668)
                at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
                at org.csstudio.utility.product.Workbench.runWorkbench(Unknown Source)
                at org.csstudio.startup.application.Application.startApplication(Unknown Source)
                at org.csstudio.startup.application.Application.start(Unknown Source)
                at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
                [...]
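
A note on the first of the two Setup 2 traces: an UnsupportedOperationException thrown from ByteBuffer.array() is standard JDK behaviour when a buffer has no accessible backing array, which is typically the case for direct buffers. What ExceptionResponse.java line 130 actually does is not visible in the trace, so the following only illustrates the general pitfall and one array-free way to read the bytes. BufferBytes and remainingBytes are hypothetical names, not CAJ code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads buffer contents without relying on a backing array. */
final class BufferBytes {
    /** Copies the remaining bytes; works for heap, direct and read-only buffers. */
    static byte[] remainingBytes(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out);   // duplicate() leaves the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("bad id".getBytes(StandardCharsets.US_ASCII));
        direct.flip();

        // hasArray() tells whether array() may be called at all.
        System.out.println("hasArray: " + direct.hasArray());
        try {
            direct.array();
        } catch (UnsupportedOperationException e) {
            System.out.println("array() is not supported on this buffer");
        }

        // The bulk get() works regardless of how the buffer is backed.
        System.out.println(new String(remainingBytes(direct), StandardCharsets.US_ASCII));
    }
}
```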


Now for the fun part:

Setup 3:
Server: Gateway 2.0.4.0 using Base 3.14.12.3
Client: CSS using JCA 2.3.6 with CAJ 1.1.10

No errors whatsoever.


Bottom line:
Switching the CAS CA server from Base 3.14.12.2 to 3.14.12.3 determines whether the pure Java CA client dies a horrible death or just works fine.

Considering that Channel Access is the main separation layer that enables clients and servers of control systems to be updated independently, and that C/C++ Channel Access works reliably across virtually any combination of Base between 3.13 and 3.15, I would say this is very bad behaviour and should be considered a serious bug.

I know that the client is not using the latest version of CAJ, though. Has this issue been addressed?

Has anyone else seen this? I assume this is connected to LP issue 730720 [1]?
The only relevant change in CAS was adding support for the DBE_PROPERTY flag.

Thanks a lot,
~Ralph

[1] https://bugs.launchpad.net/epics-base/+bug/730720



ANJ, 20 Apr 2015