I just rebuilt everything with 7.0.3 rather than 7.0.3.1, and that fixed the problem. I even tested two scalers in the same IOC, a Joerger and an SIS3820, both running at 60 Hz at the same time, and it worked fine.
Something is broken in 7.0.3.1.
You never find these problems until you test on a bunch of real-world IOCs.
From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Mark Rivers via Tech-talk
Sent: Saturday, February 1, 2020 8:40 AM
To: 'Mooney, Tim M.' <mooney at anl.gov>; 'Johnson, Andrew N.' <anj at anl.gov>
Cc: Dongzhou Zhang <dzzhang at cars.uchicago.edu>; Joanne Stubbs <stubbs at cars.uchicago.edu>; Peter Eng <eng at cars.uchicago.edu>; 'tech-talk at aps.anl.gov' <tech-talk at aps.anl.gov>
Subject: Problems with scaler record and base 7.0.3.1
Tim and Andrew,
We have discovered a serious problem with the scaler record running under the following configuration.
The problem is the following:
If the scaler display update rate (.RATE field) is 59 Hz or less, it works fine. The cbHigh task uses less than 2% of the CPU, as shown by spy.
If .RATE=60, the following happens:
- The cbHigh task uses >50% of the CPU.
- The timerTask uses >20% of the CPU.
- There is 0% IDLE time in the CPU.
- The crate becomes unresponsive and loses CA connections.
Typing 'dbpf "13LAB:scaler1.RATE","59"' fixes the problem immediately: the crate becomes responsive again and CA connections are restored.
We observe this problem on both the Joerger scaler and the SIS3820 scaler.
The problem also happens if Autocount is enabled and .RAT1=60.
We are certain that this is a new problem, because we have autosave files from two vxWorks IOCs going all the way back to 2014, and .RATE was always 60 for those scalers. This includes the last run in December 2019.
We first observed the failures yesterday (the first day of the run, naturally!).
Nothing has changed in the scaler record, or in the device support for the Joerger and SIS3820 scalers, since 2018. We have been running the master branch of these modules all along, so the fact that everything worked throughout 2019 means the problem is unlikely to be in those modules.
The Joerger scaler does not use asyn, so the problem cannot be in asyn.
The main thing that has changed in this run is that we have updated from base 7.0.3 to 7.0.3.1.
Is there anything that changed in base 7.0.3.1 that might cause this behavior?
We can work around the problem for now by setting .RATE to less than 60, but others are likely to be hit by the same problem.