EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System

2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  <20222023  2024  Index 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  <20222023  2024 
<== Date ==> <== Thread ==>

Subject: Re: Problem with huge waveforms in EPICS 7
From: Mark Rivers via Core-talk <core-talk at aps.anl.gov>
To: "Zimoch Dirk (PSI)" <dirk.zimoch at psi.ch>
Cc: "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>
Date: Sat, 25 Jun 2022 12:53:38 +0000
Dirk,

Sorry that I sent my original replies to tech-talk, not core-talk.  With the changes to the ANL mailers "Reply All" no longer puts the list address in the CC, so I have to type it manually, and did not notice that it was from core-talk, not tech-talk.

In your tests what client was monitoring the waveform record?  Was it just camonitor?

I tried to reproduce your test exactly, but DTYP=sequence is not part of base.  I guess it does not matter what is in the array, since the waveform record always posts arrays, it does not check for new values, right?

Mark



From: Mark Rivers
Sent: Friday, June 24, 2022 1:01 PM
To: Zimoch Dirk (PSI) <dirk.zimoch at psi.ch>
Cc: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: RE: Problem with huge waveforms in EPICS 7
 

I realized something I said was not quite accurate:

 

- ImageJ EPICS_AD_Viewer running on a Windows machine with a 1Gbit Ethernet connection.  It subscribes to the entire array, so the image size is 32MB.  It receives about 2.5 frames/s, which is approaching saturation of the network.

 

Actually EPICS_AD_Viewer does not subscribe to monitors on the waveform record.  Rather it subscribes to a monitor on the ArrayCounter_RBV PV.  When that changes it does a caget of the image dimensions to determine the number of array elements, and then does a caget to read the appropriate number of elements from the waveform record. 

 

Mark

 

 

 

From: Mark Rivers
Sent: Friday, June 24, 2022 12:19 PM
To: Zimoch Dirk (PSI) <dirk.zimoch at psi.ch>
Cc: tech-talk at aps.anl.gov
Subject: RE: Problem with huge waveforms in EPICS 7

 

Hi Dirk,

 

I just tried to reproduce your problem on our Centos 7 system running base 7.0.6.1 using the ADSimDetector.

 

I configured the simDetector for 4096x4096 images, UInt16 data type, AcquireTime=0.01.  It thus generates about 100 frames/s.

 

The NDStdArrays plugin is serving those arrays over Channel Access via waveform records of type SHORT and 20000000 elements.  EPICS_CA_MAX_ARRAY_BYTES is set to at least 40000000 on both the IOC and the clients.  The NDStdArrays was set to QueueSize of 1 so it was only buffering one NDArray.

 

I have 3 channel access clients receiving those arrays:

- ImageJ EPICS_AD_Viewer running on a Windows machine with a 1Gbit Ethernet connection.  It subscribes to the entire array, so the image size is 32MB.  It receives about 2.5 frames/s, which is approaching saturation of the network.

- camonitor running on the same Windows machine as ImageJ, subscribing to just the first 10 elements of the array.

- camonitor running on the same Linux machine as the IOC, subscribing to just the first 10 elements of the array.

 

This is the output of camonitor on the local Linux machine.

 

corvette:~/devel/dxp>camonitor -tci -#10 13SIM1:image1:ArrayData 13SIM1:image1:UniqueId_RBV 13SIM1:cam1:Acquire

13SIM1:image1:ArrayData (2022-06-24 11:44:51.450659) 10 -20149 -19894 -19639 -19384 -19129 -18874 -18619 -18364 -18109 -17854

13SIM1:image1:UniqueId_RBV     (2022-06-24 11:44:51.450779) 1 950

13SIM1:cam1:Acquire            (2022-06-24 11:44:51.450795) 1 Done

13SIM1:cam1:Acquire                          (   +6.359077) 1 Acquire

13SIM1:image1:UniqueId_RBV                   (   +0.023125) 1 1

13SIM1:image1:ArrayData               (   +0.005937) 10 -19639 -19384 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344

13SIM1:image1:UniqueId_RBV                   (   +0.005499) 1 2

13SIM1:image1:ArrayData               (   +0.014410) 10 -19384 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344 -17089

13SIM1:image1:UniqueId_RBV                   (   +0.009681) 1 3

13SIM1:image1:ArrayData               (   +0.008291) 10 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344 -17089 -16834

13SIM1:image1:UniqueId_RBV                   (   +0.002308) 1 5

13SIM1:image1:ArrayData               (   +0.013805) 10 -18619 -18364 -18109 -17854 -17599 -17344 -17089 -16834 -16579 -16324

13SIM1:image1:UniqueId_RBV                   (   +0.007945) 1 6

13SIM1:image1:ArrayData               (   +0.008092) 10 -18364 -18109 -17854 -17599 -17344 -17089 -16834 -16579 -16324 -16069

13SIM1:image1:UniqueId_RBV                   (   +0.002052) 1 8

13SIM1:image1:ArrayData               (   +0.013508) 10 -17854 -17599 -17344 -17089 -16834 -16579 -16324 -16069 -15814 -15559

13SIM1:image1:UniqueId_RBV                   (   +0.007415) 1 9

13SIM1:image1:ArrayData               (   +0.008839) 10 -17599 -17344 -17089 -16834 -16579 -16324 -16069 -15814 -15559 -15304

13SIM1:image1:UniqueId_RBV                   (   +0.001408) 1 11

13SIM1:image1:ArrayData               (   +0.014345) 10 -17089 -16834 -16579 -16324 -16069 -15814 -15559 -15304 -15049 -14794

13SIM1:image1:UniqueId_RBV                   (   +0.006622) 1 12

13SIM1:image1:ArrayData               (   +0.009314) 10 -16834 -16579 -16324 -16069 -15814 -15559 -15304 -15049 -14794 -14539

13SIM1:image1:UniqueId_RBV                   (   +0.001957) 1 14

...

13SIM1:image1:ArrayData               (   +0.021815) 10 -4 251 506 761 1016 1271 1526 1781 2036 2291

13SIM1:image1:UniqueId_RBV                   (   +0.002626) 1 79

13SIM1:image1:ArrayData               (   +0.006381) 10 251 506 761 1016 1271 1526 1781 2036 2291 2546

13SIM1:cam1:Acquire                          (   +0.005208) 1 Done

13SIM1:image1:UniqueId_RBV                   (   +0.000070) 1 81

13SIM1:image1:ArrayData               (   +0.010690) 10 761 1016 1271 1526 1781 2036 2291 2546 2801 3056

13SIM1:image1:ArrayData               (   +0.011467) 10 1016 1271 1526 1781 2036 2291 2546 2801 3056 3311

13SIM1:image1:UniqueId_RBV                   (   +0.000044) 1 82

 

camonitor is not receiving about 50% of the arrays, as seen by the UniqueID.  This is because the NDPluginStdArrays cannot keep up with the 100 Hz image rate.

 

The camonitor on Acquire=Done happened “instantly” when I pressed the Stop button in ADSimDetector.  I received 2 array monitors after that, the last about 20 ms after the Done monitor.  One of those is due to the QueueSize in NDPluginStdArrays, because the plugin will continue to send arrays from its queue even after Stop is pressed.

 

This is the output of camonitor on the remote Windows machine.

 

J:\epics\devel\mca\iocBoot\iocWindows>camonitor -tci -#10 13SIM1:image1:ArrayData 13SIM1:image1:UniqueId_RBV 13SIM1:cam1:Acquire

13SIM1:image1:ArrayData (2022-06-24 11:44:46.171910) 10 -20149 -19894 -19639 -19384 -19129 -18874 -18619 -18364 -18109 -17854

13SIM1:image1:UniqueId_RBV     (2022-06-24 11:44:46.181375) 1 950

13SIM1:cam1:Acquire            (2022-06-24 11:44:46.186656) 1 Done

13SIM1:cam1:Acquire                          (  +11.624939) 1 Acquire

13SIM1:image1:UniqueId_RBV                   (   +0.023040) 1 1

13SIM1:image1:ArrayData               (   +0.005702) 10 -19639 -19384 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344

13SIM1:image1:UniqueId_RBV                   (   +0.005528) 1 2

13SIM1:image1:ArrayData               (   +0.014574) 10 -19384 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344 -17089

13SIM1:image1:UniqueId_RBV                   (   +0.009594) 1 3

13SIM1:image1:ArrayData               (   +0.008185) 10 -19129 -18874 -18619 -18364 -18109 -17854 -17599 -17344 -17089 -16834

13SIM1:image1:UniqueId_RBV                   (   +0.005337) 1 5

13SIM1:image1:ArrayData               (   +0.011025) 10 -18619 -18364 -18109 -17854 -17599 -17344 -17089 -16834 -16579 -16324

13SIM1:image1:UniqueId_RBV                   (   +0.007906) 1 6

13SIM1:image1:ArrayData               (   +0.008129) 10 -18364 -18109 -17854 -17599 -17344 -17089 -16834 -16579 -16324 -16069

13SIM1:image1:UniqueId_RBV                   (   +0.004262) 1 8

13SIM1:image1:ArrayData               (   +0.011267) 10 -17854 -17599 -17344 -17089 -16834 -16579 -16324 -16069 -15814 -15559

13SIM1:image1:UniqueId_RBV                   (   +0.007399) 1 9

13SIM1:image1:ArrayData               (   +0.008771) 10 -17599 -17344 -17089 -16834 -16579 -16324 -16069 -15814 -15559 -15304

13SIM1:image1:UniqueId_RBV                   (   +0.005464) 1 11

13SIM1:image1:ArrayData               (   +0.010426) 10 -17089 -16834 -16579 -16324 -16069 -15814 -15559 -15304 -15049 -14794

13SIM1:image1:UniqueId_RBV                   (   +0.006467) 1 12

13SIM1:image1:ArrayData               (   +0.009346) 10 -16834 -16579 -16324 -16069 -15814 -15559 -15304 -15049 -14794 -14539

13SIM1:image1:UniqueId_RBV                   (   +0.004680) 1 14

13SIM1:image1:ArrayData               (   +0.011311) 10 -16324 -16069 -15814 -15559 -15304 -15049 -14794 -14539 -14284 -14029

...

13SIM1:image1:ArrayData               (   +0.010831) 10 -769 -514 -259 -4 251 506 761 1016 1271 1526

13SIM1:image1:UniqueId_RBV                   (   +0.009175) 1 76

13SIM1:image1:ArrayData               (   +0.006923) 10 -514 -259 -4 251 506 761 1016 1271 1526 1781

13SIM1:image1:UniqueId_RBV                   (   +0.004014) 1 78

13SIM1:image1:ArrayData               (   +0.021792) 10 -4 251 506 761 1016 1271 1526 1781 2036 2291

13SIM1:image1:UniqueId_RBV                   (   +0.003317) 1 79

13SIM1:image1:ArrayData               (   +0.005633) 10 251 506 761 1016 1271 1526 1781 2036 2291 2546

13SIM1:cam1:Acquire                          (   +0.005191) 1 Done

13SIM1:image1:UniqueId_RBV                   (   +0.002417) 1 81

13SIM1:image1:ArrayData               (   +0.022923) 10 761 1016 1271 1526 1781 2036 2291 2546 2801 3056

13SIM1:image1:ArrayData               (   +0.003682) 10 1016 1271 1526 1781 2036 2291 2546 2801 3056 3311

13SIM1:image1:UniqueId_RBV                   (   +0.004607) 1 82

 

The Windows machine camonitor output is very similar to Linux.  There are only 2 arrays received after Done, also about 20 ms later.

 

So I am not reproducing your issue of a large queue of arrays being received after Done.

 

I am not running priority thread scheduling, OSSPRI is 0 for all of the IOC threads.

 

Mark

 

 

-----Original Message-----
From: Core-talk <core-talk-bounces at aps.anl.gov> On Behalf Of Zimoch Dirk (PSI) via Core-talk
Sent: Friday, June 24, 2022 10:39 AM
To: core-talk at aps.anl.gov
Subject: Problem with huge waveforms in EPICS 7

 

Hi folks,

 

Some of or users complained that a camera server became less responsive since it had been upgraded from EPICS 3.14.12.6 to 7.0.6.1.

 

The camera sends image data as arrays of 20000000 SHORTs (5000x4000 pixels). When the user presses the "STOP" button on the client which displays the image, it takes a long time to stop. The more active clients, the longer it takes.

But even sending stop from a different client (e.g. command line caput) takes a long time before the GUI clients update.

 

I have set up a simple simulation and run it with 'var CADEBUG 3'

Here is what I see: on EPICS 7.0.6.1

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: TCP Request from 129.129.130.117:47142 => cmmd=4 (CA_PROTO_WRITE) cid=0x4 type=0 count=1 postsize=8 version=13

CAS: Request from 129.129.130.117:47142 =>   available=0x2      N=1 paddr=0x7efcb800db80

CAS: Request from 129.129.130.117:47142 =>   Wrote string "STOP"

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

[>80 times the same!]

CAS: Sending a message of 40000056 bytes <---- I think this one contains the update of the STOP button

CAS: Sending a message of 40000032 bytes [eventually stops many seconds later]

 

The IOC obviously gets the STOP message immediately when I press the button on the client. But the client (and any other client showing the image) does not see the button change. The GUI appears "frozen". But a command line camonitor monitoring the stop button (and a counter that counts the number of created images but not the image itself) show that the records stop immediately.

Nevertheless the IOC keeps sending images. But the images do not change any more on the clients. So it seems that the IOC keeps sending the same array data over and over again.

 

On 3.14.12, the output looks similar, but the "send after stop" consists of only a few messages:

CAS: Request from 129.129.130.117:47184 => cmmd=4 cid=0x1 type=0 count=1 postsize=8

CAS: Request from 129.129.130.117:47184 =>   available=0x2      N=1 paddr=0x7f0768010b28

CAS: Request from 129.129.130.117:47184 =>   Wrote string "STOP"

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000032 bytes

CAS: Sending a message of 40000056 bytes <---- update of the STOP button

 

What can be wrong here?

The IOC consists of a counting calc, a bo for the stop switch and a waveform record with a driver that simply fills the waveform with a sequence starting at the counter value. Nothing fancy.

 

Here is my db:

 

record (waveform, "DZ:BIGARRAY")

{

    field(FTVL, "SHORT")

    field(NELM, "20000000")

    field(DTYP, "sequence")

    field(SCAN, ".1 second")

    field(SDIS, "DZ:STOP")

    field(INP,  "DZ:COUNT")

    field(FLNK, "DZ:COUNT")

}

 

record (calc, "DZ:COUNT")

{

    field(CALC, "VAL+1")

}

 

record(bo, "DZ:STOP")

{

    field(ZNAM,"GO")

    field(ONAM,"STOP")

}

 

I suspect this happens when record produces new waveforms faster than they can be sent.

The IOC has no problem processing the waveform at 10 Hz, but I see only about 3 CAS messages per second.

I had to slow down the waveform processing to ".5 second" to improves responsiveness. That is when the monitor updates can be sent as quickly as being produced. But opening a second client again spoils everything.

 

Dirk

 

 


References:
Problem with huge waveforms in EPICS 7 Zimoch Dirk (PSI) via Core-talk

Navigate by Date:
Prev: Re: Problem with huge waveforms in EPICS 7 Zimoch Dirk (PSI) via Core-talk
Next: RE: Problem with huge waveforms in EPICS 7 Mark Rivers via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  <20222023  2024 
Navigate by Thread:
Prev: Re: Problem with huge waveforms in EPICS 7 Michael Davidsaver via Core-talk
Next: [Bug 1424092] Re: epicsAssert() message lost mdavidsaver via Core-talk
Index: 2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  <20222023  2024 
ANJ, 14 Sep 2022 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·