EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Problem with huge waveforms in EPICS 7
From: "Zimoch Dirk (PSI) via Core-talk" <core-talk at aps.anl.gov>
To: "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>
Date: Mon, 27 Jun 2022 08:09:54 +0000
Hi everyone,

Thanks Mark for testing it with AD. The linear ramp is probably similar to my "sequence" test device support.
The client I used is caqtdm.

As Mark already noticed, the problem does not appear with a client running on the same host. I guess this is the case
because the host shortcuts the TCP traffic to itself and thus the bandwidth is much higher than what we can achieve
between two different hosts.
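
A rough back-of-the-envelope check of that bandwidth argument, using the figures from the original report quoted below
(40000032-byte messages, about 3 messages per second observed, 10 Hz production, more than 80 messages sent after
STOP). The function names are only illustrative, and comparing the result to a gigabit link is an assumption; the link
speed is not stated anywhere in this thread.

```python
MESSAGE_BYTES = 40_000_032  # size of one CAS message in the log quoted below


def mbit_per_second(messages_per_second):
    """Network rate in Mbit/s needed for a given message rate."""
    return messages_per_second * MESSAGE_BYTES * 8 / 1e6


def backlog_seconds(queued_messages, messages_per_second):
    """Time needed to drain a backlog of queued messages at a given send rate."""
    return queued_messages / messages_per_second


print(round(mbit_per_second(3)))         # observed: about 3 messages/s -> 960
print(round(mbit_per_second(10)))        # needed for SCAN ".1 second"  -> 3200
print(round(mbit_per_second(2)))         # needed for SCAN ".5 second"  -> 640
print(round(backlog_seconds(80, 3), 1))  # 80 queued messages at 3/s    -> 26.7
```

If the link really is gigabit Ethernet, the observed rate is close to saturation, 10 Hz cannot fit, and a backlog of
80 messages would take on the order of half a minute to drain.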

Running the IOC with RT priorities does not change anything.

I had not counted the monitors, so I did not notice that the IOC sends exactly n+1 frames. The behavior is consistent
with putting a pointer to the array data (at an unchanging location) into a queue. If the data changes faster than it
can be sent, the next pointer in the queue no longer refers to the next frame, so some frames are skipped.
After data production ends, all remaining pointers in the queue point to the unchanging last frame, until some
other client makes the array produce new data. At that point the original client receives those updates while
working through the backlog. Of course this behavior makes no sense at all. Either put the whole data into the queue or
do not queue the pointer at all.
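
A toy model of this hypothesis, only to illustrate the symptom. It is not the CA server code: the `simulate` function,
the tick-based timing and the three policy names are all made up for this sketch, and it models only the posted
updates, not the extra initial frame. A single integer stands in for the 40 MB array.

```python
from collections import deque


def simulate(n_frames, ticks_per_send, policy="reference"):
    """Produce one frame per tick; the sender manages one message every ticks_per_send ticks."""
    buffer = [0]     # the record's single array buffer, overwritten in place
    queue = deque()  # pending monitor updates for one client
    sent = []        # frame numbers the client actually receives
    tick = 0
    while tick < n_frames or queue:
        if tick < n_frames:
            buffer[0] = tick + 1  # record processes: frames 1..n_frames
            if policy == "reference":
                queue.append(buffer)        # queue a pointer to the shared buffer
            elif policy == "copy":
                queue.append(list(buffer))  # queue a snapshot of the data
            elif policy == "coalesce":
                queue.clear()               # keep at most one pending update
                queue.append(list(buffer))
            else:
                raise ValueError(policy)
        if queue and tick % ticks_per_send == 0:
            sent.append(queue.popleft()[0])  # content is read at send time
        tick += 1
    return sent


print(simulate(6, 3, "reference"))  # [1, 4, 6, 6, 6, 6]
print(simulate(6, 3, "copy"))       # [1, 2, 3, 4, 5, 6]
print(simulate(6, 3, "coalesce"))   # [1, 4, 6]
```

With "reference" the client gets one message per posted update, but frames are skipped and the last frame is repeated
until the backlog is drained, which matches what I see. "copy" and "coalesce" correspond to the two alternatives:
queue the whole data, or do not queue at all.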

Using PVA, I do not see this effect. The network still cannot keep up with the rate at which the data is produced, so
frames are lost. But pressing the STOP button takes effect almost immediately. (Maybe after one or two more updates, but
that can be latency in the client or network.) Is there a debug variable like CASDEBUG for PVA that would let me see
what PVA is sending?

I can try to find out which commit changed the behavior in CA ...

Dirk


On Sat, 2022-06-25 at 09:03 +0000, Zimoch Dirk (PSI) via Core-talk wrote:
> Hi Andrew,
> 
> The cameras run on Windows. I did my test on Linux but not as root, thus I had no RT scheduling. I will repeat the
> test on Monday running as root.
> 
> The STOP message gets processed in time! A client that does not monitor the array sees the change immediately. The
> counter stops. But CA keeps sending!
> 
> I had expected that the IOC would drop frames if CA cannot send fast enough. Not trying for minutes to work through a
> pile of unsent frames. And then not even sending updates but simply repeating the last frame.
> 
> Dirk
> 
> > On 24.06.2022 at 18:10, Andrew Johnson via Core-talk <core-talk at aps.anl.gov> wrote:
> > 
> >  Hi Dirk,
> > 
> > What OS is the IOC running on — I'm guessing Linux but you didn't say. If so is it built for and using priority
> > thread scheduling? If the OSSPRI field from epicsThreadShowAll is all zeros it isn't, and enabling that might help.
> > The normal Linux scheduler tends to maximize throughput, not fairness, so it could be delaying the threads which
> > process your STOP message while the threads handling image data can continue to make progress. However this is just
> > a guess.
> > 
> > - Andrew
> > 
> > 
> > General musings: The setpriority(2) manpage on RHEL-7 says:
> > > BUGS
> > >        According  to  POSIX, the nice value is a per-process setting.  However, under
> > >        the current Linux/NPTL implementation of POSIX threads, the nice  value  is  a
> > >        per-thread attribute: different threads in the same process can have different
> > >        nice values.  Portable applications should avoid relying on the  Linux  behav‐
> > >        ior, which may be made standards conformant in the future.
> >  
> > I wonder whether we should look at setting nice values for Linux threads when the process doesn't have the ability
> > to use SCHED_FIFO?
> > 
> > 
> > 
> > On 6/24/22 10:39 AM, Zimoch Dirk (PSI) via Core-talk wrote:
> > > Hi folks,
> > > 
> > > Some of our users complained that a camera server became less responsive since it had been upgraded from EPICS
> > > 3.14.12.6 to 7.0.6.1.
> > > 
> > > The camera sends image data as arrays of 20000000 SHORTs (5000x4000 pixels). When the user presses the "STOP"
> > > button on the client which displays the image, it takes a long time to stop. The more active clients, the longer
> > > it takes. But even sending stop from a different client (e.g. command line caput) takes a long time before the GUI
> > > clients update.
> > > 
> > > I have set up a simple simulation and run it with 'var CASDEBUG 3'.
> > > Here is what I see on EPICS 7.0.6.1:
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: TCP Request from 129.129.130.117:47142 => cmmd=4 (CA_PROTO_WRITE) cid=0x4 type=0 count=1 postsize=8 version=13
> > > CAS: Request from 129.129.130.117:47142 =>   available=0x2 	N=1 paddr=0x7efcb800db80
> > > CAS: Request from 129.129.130.117:47142 =>   Wrote string "STOP"
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > [>80 times the same!]
> > > CAS: Sending a message of 40000056 bytes <---- I think this one contains the update of the STOP button
> > > CAS: Sending a message of 40000032 bytes
> > > [eventually stops many seconds later]
> > > 
> > > The IOC obviously gets the STOP message immediately when I press the button on the client. But the client (and any
> > > other client showing the image) does not see the button change. The GUI appears "frozen". But a command line
> > > camonitor monitoring the stop button (and a counter that counts the number of created images, but not the image
> > > itself) shows that the records stop immediately.
> > > Nevertheless the IOC keeps sending images. But the images do not change any more on the clients. So it seems that
> > > the IOC keeps sending the same array data over and over again.
> > > 
> > > On 3.14.12, the output looks similar, but the "send after stop" consists of only a few messages:
> > > CAS: Request from 129.129.130.117:47184 => cmmd=4 cid=0x1 type=0 count=1 postsize=8
> > > CAS: Request from 129.129.130.117:47184 =>   available=0x2 	N=1 paddr=0x7f0768010b28
> > > CAS: Request from 129.129.130.117:47184 =>   Wrote string "STOP"
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000032 bytes
> > > CAS: Sending a message of 40000056 bytes <---- update of the STOP button
> > > 
> > > What can be wrong here?
> > > The IOC consists of a counting calc, a bo for the stop switch and a waveform record with a driver that simply
> > > fills the waveform with a sequence starting at the counter value. Nothing fancy.
> > > 
> > > Here is my db:
> > > 
> > > record (waveform, "DZ:BIGARRAY")
> > > {
> > >     field(FTVL, "SHORT")
> > >     field(NELM, "20000000")
> > >     field(DTYP, "sequence")
> > >     field(SCAN, ".1 second")
> > >     field(SDIS, "DZ:STOP")
> > >     field(INP,  "DZ:COUNT")
> > >     field(FLNK, "DZ:COUNT")
> > > }
> > > 
> > > record (calc, "DZ:COUNT")
> > > {
> > >     field(CALC, "VAL+1")
> > > }
> > > 
> > > record(bo, "DZ:STOP")
> > > {
> > >     field(ZNAM,"GO")
> > >     field(ONAM,"STOP")
> > > }
> > > 
> > > I suspect this happens when the record produces new waveforms faster than they can be sent.
> > > The IOC has no problem processing the waveform at 10 Hz, but I see only about 3 CAS messages per second.
> > > I had to slow down the waveform processing to ".5 second" to improve responsiveness. That is when the monitor
> > > updates can be sent as quickly as they are produced. But opening a second client again spoils everything.
> > > 
> > > Dirk
> > > 
> > > 
> >  
