EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Problems with scaler record and base 7.0.3.1
From: Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To: "'Mooney, Tim M.'" <mooney at anl.gov>, "'Johnson, Andrew N.'" <anj at anl.gov>
Cc: "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>, Joanne Stubbs <stubbs at cars.uchicago.edu>, "'tech-talk at aps.anl.gov'" <tech-talk at aps.anl.gov>, Dongzhou Zhang <dzzhang at cars.uchicago.edu>, Peter Eng <eng at cars.uchicago.edu>
Date: Sat, 1 Feb 2020 21:51:05 +0000
This is where the scaler record is using the .RATE or .RAT1 fields:


    rate = ((pscal->us == USER_STATE_COUNTING) ? pscal->rate : pscal->rat1);
    if (rate > .1) {
        callbackRequestDelayed(pupdateCallback, 1.0/rate);
    }

I suspect that in base 7.0.3.1 callbackRequestDelayed is not delaying at all on vxWorks if the delay is equal to the system clock period.  The default system clock period, and the one in use on my vxWorks IOCs, is 1/60 sec.  This would explain the observed behavior of the callbacks happening so fast when RATE=60 that they overload the system.

The release notes for 7.0.3.1 say this:

*******************************
Timers and delays use monotonic clock

Many internal timers and delay calculations use a monotonic clock epicsTimeGetMonotonic() instead of the realtime epicsTimeGetCurrent(). This is intended to make IOCs less susceptible to jumps in system time.

*******************************

So the code for delay calculations was indeed changed, and I think it broke on vxWorks.

Mark




________________________________
From: Mark Rivers
Sent: Saturday, February 1, 2020 9:04 AM
To: 'Mooney, Tim M.'; 'Johnson, Andrew N.'
Cc: Dongzhou Zhang; Joanne Stubbs; Peter Eng; 'tech-talk at aps.anl.gov'; core-talk at aps.anl.gov
Subject: RE: Problems with scaler record and base 7.0.3.1


I just rebuilt everything with 7.0.3, rather than 7.0.3.1.  That fixed the problem.  I even tested 2 scalers in the same IOC,  a Joerger and an SIS3820, both running at 60 Hz at the same time.  It worked fine.



Something is broken in 7.0.3.1.



You never find these problems until you test on a bunch of real-world IOCs :)



Mark





From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Mark Rivers via Tech-talk
Sent: Saturday, February 1, 2020 8:40 AM
To: 'Mooney, Tim M.' <mooney at anl.gov>; 'Johnson, Andrew N.' <anj at anl.gov>
Cc: Dongzhou Zhang <dzzhang at cars.uchicago.edu>; Joanne Stubbs <stubbs at cars.uchicago.edu>; Peter Eng <eng at cars.uchicago.edu>; 'tech-talk at aps.anl.gov' <tech-talk at aps.anl.gov>
Subject: Problems with scaler record and base 7.0.3.1



Tim and Andrew,



We have discovered a serious problem with the scaler record running under the following configuration.



Base 7.0.3.1

vxWorks 6.9.4.1

std master

mca master

vme master

asyn master



The problem is the following:

- If the scaler display update rate (.RATE field) is 59 Hz or less, it works fine. The cbHigh task uses less than 2% of the CPU, as shown by spy.

- If .RATE=60, the following happens:

    o The cbHigh task uses >50% of the CPU

    o The timerTask uses >20% of the CPU

    o There is 0% IDLE time in the CPU

    o The crate becomes unresponsive and loses CA connections

    o Typing 'dbpf "13LAB:scaler1.RATE","59"' fixes the problem immediately: the crate becomes responsive and CA connections are restored.

- We observe this problem on both the Joerger scaler and the SIS3820 scaler.

- The problem also happens if Autocount is enabled and .RAT1=60.



We are certain that this is a new problem, because we have autosave files from 2 vxWorks IOCs going all the way back to 2014 and .RATE was always 60 for those scalers.  This includes the last run in December 2019.



We first observed the failures yesterday (the first day of the run, naturally!)



Nothing has changed in the scaler record, or in the device support for the Joerger scaler and the SIS3820 scaler since 2018.  We have been running the master branch of these all the time, so the fact that it was working all of 2019 means the problem is unlikely to be in those modules.



The Joerger scaler does not use asyn, so the problem cannot be in asyn.



The main thing that has changed in this run is that we have updated from base 7.0.3 to 7.0.3.1.



Is there anything that could have changed in base 7.0.3.1 that might cause this behavior?



We can work around the problem for now by setting .RATE less than 60, but others are likely to be hit by the same problem.



Thanks,

Mark


