Experimental Physics and
Industrial Control System


Subject:	Re: Problems with scaler record and base 7.0.3.1
From:	Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To:	"'Mooney, Tim M.'" <mooney at anl.gov>, "'Johnson, Andrew N.'" <anj at anl.gov>
Cc:	"core-talk at aps.anl.gov" <core-talk at aps.anl.gov>, Joanne Stubbs <stubbs at cars.uchicago.edu>, "'tech-talk at aps.anl.gov'" <tech-talk at aps.anl.gov>, Dongzhou Zhang <dzzhang at cars.uchicago.edu>, Peter Eng <eng at cars.uchicago.edu>
Date:	Sat, 1 Feb 2020 21:51:05 +0000

This is where the scaler record is using the .RATE or .RAT1 fields:


rate = ((pscal->us == USER_STATE_COUNTING) ? pscal->rate : pscal->rat1);
if (rate > .1) {
    callbackRequestDelayed(pupdateCallback, 1.0/rate);
}

I suspect that in base 7.0.3.1 callbackRequestDelayed is not delaying at all on vxWorks if the delay is equal to the system clock period.  The default system clock period, and the one in use on my vxWorks IOCs, is 1/60 sec.  This would explain the observed behavior of the callbacks happening so fast when RATE=60 that they overload the system.
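To illustrate how a delay close to one clock period could turn into no delay at all, here is a minimal sketch of one possible failure mode: converting seconds to whole clock ticks by truncation. This is not the EPICS base code, and it is an assumption that anything like this happens in 7.0.3.1; the function names delay_to_ticks_truncate and delay_to_ticks_round_up are hypothetical. With truncation, any computed tick count that falls even slightly below 1 becomes 0, so a timer that re-arms itself would fire again immediately.

```c
#include <stdio.h>

/* Hypothetical helper, NOT from EPICS base: convert a delay in seconds
 * to whole clock ticks by truncating toward zero.  A result just below
 * one tick becomes 0, i.e. "do not wait at all". */
static unsigned long delay_to_ticks_truncate(double delay_sec,
                                             double ticks_per_sec)
{
    return (unsigned long)(delay_sec * ticks_per_sec);
}

/* Hypothetical safer variant: round any fractional tick up and never
 * return fewer than one tick, so a positive delay always waits. */
static unsigned long delay_to_ticks_round_up(double delay_sec,
                                             double ticks_per_sec)
{
    double ticks = delay_sec * ticks_per_sec;
    unsigned long whole = (unsigned long)ticks;

    if ((double)whole < ticks)
        whole++;            /* round the fractional part up */
    if (whole == 0)
        whole = 1;          /* always wait at least one tick */
    return whole;
}
```

For example, with a 60 Hz system clock a requested delay of 0.01 s is 0.6 ticks: the truncating version returns 0 and the rounding version returns 1.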

The release notes for 7.0.3.1 say this:

*******************************
Timers and delays use monotonic clock

Many internal timers and delay calculations use a monotonic clock epicsTimeGetMonotonic() instead of the realtime epicsTimeGetCurrent(). This is intended to make IOCs less susceptible to jumps in system time.

*******************************

So the code for delay calculations was indeed changed, and I think it broke on vxWorks.
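For readers unfamiliar with the distinction drawn in the release note, the following is a minimal sketch of computing a remaining delay from a monotonic clock. It uses the POSIX clock_gettime() call with CLOCK_MONOTONIC rather than the EPICS epicsTimeGetMonotonic() routine, and the helper names monotonic_now and seconds_until are hypothetical. It shows the general technique only, not how base 7.0.3.1 implements its timers.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the monotonic clock, in seconds.  Unlike the
 * realtime (wall) clock, this value does not jump when the system
 * time is adjusted. */
static double monotonic_now(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Seconds remaining until an absolute monotonic deadline, clamped
 * at zero once the deadline has already passed. */
static double seconds_until(double deadline)
{
    double remaining = deadline - monotonic_now();

    return remaining > 0.0 ? remaining : 0.0;
}
```

A timer built this way stores the deadline as monotonic_now() plus the requested delay and sleeps for seconds_until(deadline); how the remaining time is then converted to operating-system ticks is a separate step.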

Mark




________________________________
From: Mark Rivers
Sent: Saturday, February 1, 2020 9:04 AM
To: 'Mooney, Tim M.'; 'Johnson, Andrew N.'
Cc: Dongzhou Zhang; Joanne Stubbs; Peter Eng; 'tech-talk at aps.anl.gov'; core-talk at aps.anl.gov
Subject: RE: Problems with scaler record and base 7.0.3.1


I just rebuilt everything with 7.0.3, rather than 7.0.3.1.  That fixed the problem.  I even tested 2 scalers in the same IOC, a Joerger and an SIS3820, both running at 60 Hz at the same time.  It worked fine.



Something is broken in 7.0.3.1.



You never find these problems until you test on a bunch of real-world IOCs :)



Mark





From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Mark Rivers via Tech-talk
Sent: Saturday, February 1, 2020 8:40 AM
To: 'Mooney, Tim M.' <mooney at anl.gov>; 'Johnson, Andrew N.' <anj at anl.gov>
Cc: Dongzhou Zhang <dzzhang at cars.uchicago.edu>; Joanne Stubbs <stubbs at cars.uchicago.edu>; Peter Eng <eng at cars.uchicago.edu>; 'tech-talk at aps.anl.gov' <tech-talk at aps.anl.gov>
Subject: Problems with scaler record and base 7.0.3.1



Tim and Andrew,



We have discovered a serious problem with the scaler record running under the following configuration.



Base 7.0.3.1

vxWorks 6.9.4.1

std master

mca master

vme master

asyn master



The problem is the following:

- If the scaler display update rate (.RATE field) is 59 (Hz) or less it works fine. The cbHigh task uses less than 2% of the CPU, as shown by spy.

- If .RATE=60 then the following happens:

    o The cbHigh task uses >50% of the CPU

    o The timerTask uses >20% of the CPU

    o There is 0% IDLE time in the CPU

    o The crate becomes unresponsive and loses CA connections

    o Typing 'dbpf "13LAB:scaler1.RATE","59"' fixes the problem immediately: the crate becomes responsive and CA connections are restored.

- We observe this problem on both the Joerger scaler and the SIS3820 scaler.

- The problem also happens if Autocount is enabled and .RAT1=60.



We are certain that this is a new problem, because we have autosave files from 2 vxWorks IOCs going all the way back to 2014 and .RATE was always 60 for those scalers.  This includes the last run in December 2019.



We first observed the failures yesterday (the first day of the run, naturally!)



Nothing has changed in the scaler record, or in the device support for the Joerger scaler and the SIS3820 scaler since 2018.  We have been running the master branch of these all the time, so the fact that it was working all of 2019 means the problem is unlikely to be in those modules.



The Joerger scaler does not use asyn, so the problem cannot be in asyn.



The main thing that has changed in this run is that we have updated from base 7.0.3 to 7.0.3.1.



Is there anything that could have changed in base 7.0.3.1 that might cause this behavior?



We can work around the problem for now by setting .RATE less than 60, but others are likely to be hit by the same problem.



Thanks,

Mark


ANJ, 01 Feb 2020
