EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: [EXTERNAL] Re: PVA monitor request parameters
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: "Pearson, Matthew" <pearsonmr at ornl.gov>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Thu, 30 Jun 2022 10:53:13 -0700
On 6/29/22 10:32, Pearson, Matthew wrote:
Hi,

Reporting back on my PVA queueSize testing with areaDetector PVA plugin and pvaDriver.

As I mentioned, on my underpowered VM (2 cores, 16GB RAM), I was seeing problems at low frame rates and small image sizes.

For example, with the default queueSize, generating UInt8 128x128 images at 100Hz, I start to lose frames on the other side (the pvaDriver side). Generating 6000 frames, I received only 5994. It gets worse at higher frame rates: at 650Hz (the maximum rate I could generate those images at), generating 120000 images, I received only 115804 on the pvaDriver side, a loss of 3.5%.

For images sized at UInt32, 4096x4096, at 20Hz, I generated 1200 and received only 917, a loss of 23%. That increases to 50% lost at higher frame rates.
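As a side note, the loss percentages quoted above are just (generated - received) / generated. A trivial helper (hypothetical, not part of any EPICS tool) reproduces the figures:

```python
def drop_stats(generated, received):
    """Return (frames lost, percent lost) for a fixed-length test run."""
    dropped = generated - received
    return dropped, 100.0 * dropped / generated

# Figures from the runs above:
#   120000 generated, 115804 received -> 4196 lost, ~3.5%
#   1200 generated, 917 received      -> 283 lost, ~23.6%
```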

Once I set the queueSize to 100, I see much better performance. Whereas before I was losing frames at 100Hz for UInt8 128x128 data, I could now run at 650Hz with no problem, apart from occasionally dropping frames on the simulation side due to full NDArray queues.

For bigger images (UInt32, 4096x4096), I could run at slightly higher frame rates without losing data, but at 100Hz my VM ran out of memory and the IOC was killed by the OS.

I also ran the tests on a much more capable server (8 cores, 32GB RAM):

Using the default queueSize I only saw dropped frames when generating 4096x4096 images at more than 50Hz (for UInt8, UInt16 and UInt32 data). I typically lost about 15-17% of frames. Once I increased the queueSize to 100 I still lost frames, but fewer of them (only 12-15%), and for the UInt8 data I didn't lose any frames at all, whereas I did with the default queueSize.

So, I'm not sure these results are telling us much except that increasing the queueSize helps a little in cases where the machine is heavily loaded or is underpowered for the application.

Without more specific information, I think this is the case.

Did you look at CPU and memory usage while running these tests?

(both on the guest VM and host)

Looking at per-thread CPU usage ('top -H') should give starting
points.  On Linux, one handy non-invasive tool is "perf",
which can find hot spots in code, e.g.

sudo perf top -g --call-graph dwarf -p <pid>

I've found this to be a useful guide towards targeted optimization.

(perf benefits from, but does not require debug info)
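For scripted monitoring during a test run, the per-thread numbers that 'top -H' shows can also be read straight from /proc. A minimal sketch (Linux-only; the helper name is mine, not an EPICS or perf API):

```python
import glob
import os

def thread_cpu_ticks(pid):
    """Return {tid: (utime, stime)} in clock ticks for each thread of pid.

    Reads /proc/<pid>/task/<tid>/stat; sampling this twice and diffing
    gives per-thread CPU usage over the interval (Linux-only).
    """
    ticks = {}
    for path in glob.glob("/proc/%d/task/*/stat" % pid):
        tid = int(path.split("/")[-2])
        with open(path) as f:
            data = f.read()
        # The comm field may contain spaces but is parenthesised,
        # so split once after the closing ')'.
        fields = data.rsplit(")", 1)[1].split()
        # Relative to the split, utime is index 11 and stime index 12
        # (fields 14 and 15 of the full stat line).
        ticks[tid] = (int(fields[11]), int(fields[12]))
    return ticks

# e.g. thread_cpu_ticks(os.getpid()) for the current process
```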


I also tried using pipeline=true (with the default queueSize) and got strange results. On the receiving side (the pvaDriver side) I only got an image for every 2nd image generated on the simulation side, and the two arrived in quick succession. I see the same effect using 'pvget' in verbose mode:

pvget -vvv -m -r "record[pipeline=true]field()" ST99:Det:Det2:PV1:Array

The printed data is the same for each update. In the first pvget printout it does highlight the changed fields, but then in the subsequent printouts they are not highlighted.

When I also added in the larger queueSize:
pvget -vvv -m -r "record[pipeline=true,queueSize=100]field()" ST99:Det:Det2:PV1:Array

I do seem to get around 99 images sent out for every image generated in the simulation and exported by the PVA plugin.

So I didn't do much further testing with pipeline=true, although I did test it with an NTScalar PV and only ever got 1 monitor update, regardless of the queueSize setting or how many times the NTScalar changed on the server side.

Ok...  I spent some time yesterday convincing myself that
"pipeline=true" works when the Monitor implementation in
pvAccessCPP is used (eg. with QSRV).

So I suspect a bug in pvDatabaseCPP and have opened a ticket.

https://github.com/epics-base/pvDatabaseCPP/issues/78

I'll try to find time to look what would be required to port AD
to use PVXS (which works correctly with "pipeline=true").


Cheers,
Matt


-----Original Message-----
From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Pearson, Matthew via Tech-talk
Sent: Wednesday, June 22, 2022 5:04 PM
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: tech-talk at aps.anl.gov
Subject: RE: [EXTERNAL] Re: PVA monitor request parameters

Hi Michael,
> > I'm seeing poor performance using the default PV request type on the receive side:
>
> Could you quantify what "poor performance" means? Are you only looking at dropped updates? Do you have any observations wrt. CPU and/or network load?

Yes, apologies, I think I rushed the e-mail. I'm currently running a script and doing more detailed tests so I'll be able to give a lot more details in a few days.

This is using base 7.0.6.1.

The VM is fairly underpowered and only has 2 cores allocated to it, and several other applications are running on it, although the CPU is relatively idle and less than 20% used. I was generating 0.5MB sized images at 100Hz, so about 50MB/s. Since both IOCs are running on the same VM this is going over the loopback interface.

I see dropped images (a few %) by generating a fixed number of images on the 'source' side and then comparing how many were received on the other side.

I suspect that since the VM has other applications running, the IOC is sometimes suspended to allow the other programs to run, and it can't resume quickly enough within the necessary 0.02 seconds (the time the buffer covers at 100Hz if queueSize=2). However, I did check 'nonvoluntary_ctxt_switches' and that wasn't increasing.
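The 0.02 s figure is just the queue depth divided by the update rate; a hypothetical one-liner makes the relationship explicit:

```python
def queue_headroom_s(queue_size, rate_hz):
    """Roughly how long a stalled consumer can fall behind before the
    queue fills and updates start being dropped."""
    return queue_size / rate_hz

# queueSize=2 at 100 Hz -> 0.02 s of slack; queueSize=100 -> 1.0 s
```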

> fyi. without "pipeline=true" there is no guarantee that clients won't drop updates. So unrelated changes to system and/or network load may change things.

That makes sense. As well as queueSize=100 I will run the detailed testing for pipeline=true, and I'll report back.

> Yes, absolutely. With "pipeline=true", queueSize sets the flow control window size, which is analogous to the TCP window size, though applied to individual subscriptions.

If pipeline=true, and queueSize sets the flow control window size on the server, does the client side buffer size remain at 2 or is it also set to queueSize?

> > I saw strange results with the "record[queueSize=100, pipeline=true]field()" where I was sending data at 1Hz but receiving at 100Hz or so.
>
> Do you subscribe to a PV post()ing updates at 1Hz and somehow get 99 extra updates?

That's what it seemed like, but I'd like to do more detailed testing next.

Thanks for the detailed explanation. I'll continue testing and report back, and I'd also like to test on a more powerful machine.

Cheers,
Matt



