Hi Andrew,
The cameras run on Windows. I did my test on Linux but not as root, thus I had no RT scheduling. I will repeat the test on Monday running as root.
The STOP message gets processed in time! A client that does not monitor the array sees the change immediately. The counter stops. But CA keeps sending!
I had expected that the IOC would drop frames if CA cannot send fast enough, rather than spending minutes working through a pile of unsent frames, and then not even sending updates but simply repeating the last frame.
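To make the expectation concrete, here is a minimal sketch of "drop frames when the sender cannot keep up". It is not EPICS code and does not describe how the CA server queues events; the class LatestFrameSlot and the function simulate are hypothetical names used only for illustration. Each client holds at most one pending frame, and a newer frame replaces an unsent older one.

```python
import threading


class LatestFrameSlot:
    """Hold at most one pending frame; a newer frame replaces an unsent one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0  # number of frames overwritten before being sent

    def post(self, frame):
        """Called by the producer for every new frame."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # the old frame was never sent
            self._frame = frame

    def take(self):
        """Called by the sender when it is ready; returns None if nothing is pending."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


def simulate(produced=10, sent_every=3):
    """Produce `produced` frames while the sender only manages every `sent_every`-th slot."""
    slot = LatestFrameSlot()
    sent = []
    for n in range(1, produced + 1):
        slot.post(n)
        if n % sent_every == 0:
            sent.append(slot.take())
    return sent, slot.dropped


if __name__ == "__main__":
    print(simulate())
```

With such a scheme the backlog never exceeds one frame per client, so a STOP would take effect after at most one more frame is sent.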
On 24.06.2022 at 18:10, Andrew Johnson via Core-talk <core-talk at aps.anl.gov> wrote:
Hi Dirk,
What OS is the IOC running on? I'm guessing Linux, but you didn't say. If so, is it built for and using priority thread scheduling? If the OSSPRI field from epicsThreadShowAll is all zeros it isn't, and enabling that might help. The normal Linux scheduler tends to maximize throughput, not fairness, so it could be delaying the threads which process your STOP message while the threads handling image data can continue to make progress. However, this is just a guess.
- Andrew
General musings: The setpriority(2) manpage on RHEL-7 says:
BUGS
According to POSIX, the nice value is a per-process setting. However, under
the current Linux/NPTL implementation of POSIX threads, the nice value is a
per-thread attribute: different threads in the same process can have different
nice values. Portable applications should avoid relying on the Linux behavior,
which may be made standards conformant in the future.
I wonder whether we should look at setting nice values for Linux threads when the process doesn't have the ability to use SCHED_FIFO?
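As a rough illustration of the per-thread nice behaviour quoted above, the following Linux-only sketch raises the nice value (that is, lowers the priority) of the calling thread only. It uses Python's os.setpriority, which wraps setpriority(2), and passes the kernel thread ID instead of a process ID. The helper name lower_current_thread_priority is hypothetical, and this is not how EPICS sets thread priorities; it only shows that the call works per thread without root.

```python
import os
import threading


def lower_current_thread_priority(increment=1):
    """Raise the nice value of the calling thread only (Linux/NPTL behaviour).

    Returns (old_nice, new_nice). Raising the nice value needs no privileges;
    lowering it again usually does.
    """
    tid = threading.get_native_id()            # kernel thread ID, not the PID
    current = os.getpriority(os.PRIO_PROCESS, tid)
    target = min(current + increment, 19)      # 19 is the weakest nice level
    os.setpriority(os.PRIO_PROCESS, tid, target)
    return current, target


if __name__ == "__main__":
    # Run in a worker thread so the main thread keeps its original nice value.
    worker = threading.Thread(
        target=lambda: print(lower_current_thread_priority(5)))
    worker.start()
    worker.join()
    print("main thread nice:",
          os.getpriority(os.PRIO_PROCESS, threading.get_native_id()))
```

On Linux the main thread's nice value should be unchanged after the worker has run, which is exactly the non-POSIX behaviour the manpage warns about.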
On 6/24/22 10:39 AM, Zimoch Dirk (PSI) via Core-talk wrote:
Hi folks,
Some of our users complained that a camera server became less responsive after it had been upgraded from EPICS 3.14.12.6
to 7.0.6.1.
The camera sends image data as arrays of 20000000 SHORTs (5000x4000 pixels). When the user presses the "STOP" button on
the client which displays the image, it takes a long time to stop. The more active clients, the longer it takes.
But even sending stop from a different client (e.g. command line caput) takes a long time before the GUI clients update.
I have set up a simple simulation and run it with 'var CADEBUG 3'.
Here is what I see on EPICS 7.0.6.1:
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: TCP Request from 129.129.130.117:47142 => cmmd=4 (CA_PROTO_WRITE) cid=0x4 type=0 count=1 postsize=8 version=13
CAS: Request from 129.129.130.117:47142 => available=0x2 N=1 paddr=0x7efcb800db80
CAS: Request from 129.129.130.117:47142 => Wrote string "STOP"
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
[>80 times the same!]
CAS: Sending a message of 40000056 bytes <---- I think this one contains the update of the STOP button
CAS: Sending a message of 40000032 bytes
[eventually stops many seconds later]
The IOC obviously gets the STOP message immediately when I press the button on the client. But the client (and any other
client showing the image) does not see the button change. The GUI appears "frozen". But a command line camonitor
monitoring the stop button (and a counter that counts the number of created images, but not the image itself) shows that
the records stop immediately.
Nevertheless the IOC keeps sending images. But the images do not change any more on the clients. So it seems that the
IOC keeps sending the same array data over and over again.
On 3.14.12, the output looks similar, but the "send after stop" consists of only a few messages:
CAS: Request from 129.129.130.117:47184 => cmmd=4 cid=0x1 type=0 count=1 postsize=8
CAS: Request from 129.129.130.117:47184 => available=0x2 N=1 paddr=0x7f0768010b28
CAS: Request from 129.129.130.117:47184 => Wrote string "STOP"
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000032 bytes
CAS: Sending a message of 40000056 bytes <---- update of the STOP button
What can be wrong here?
The IOC consists of a counting calc, a bo for the stop switch and a waveform record with a driver that simply fills the
waveform with a sequence starting at the counter value. Nothing fancy.
Here is my db:
record (waveform, "DZ:BIGARRAY")
{
field(FTVL, "SHORT")
field(NELM, "20000000")
field(DTYP, "sequence")
field(SCAN, ".1 second")
field(SDIS, "DZ:STOP")
field(INP, "DZ:COUNT")
field(FLNK, "DZ:COUNT")
}
record (calc, "DZ:COUNT")
{
field(CALC, "VAL+1")
}
record(bo, "DZ:STOP")
{
field(ZNAM,"GO")
field(ONAM,"STOP")
}
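For readers without the driver source, here is a small sketch of what a "sequence" device support as described above might compute: an array of NELM signed 16-bit values counting up from the counter value. The function name fill_sequence is hypothetical, and the wrap-around at the int16 limits is an assumption; the real driver is C code that is not shown in this mail.

```python
from array import array


def fill_sequence(start, nelm):
    """Return `nelm` signed 16-bit values counting up from `start`.

    Assumption: values wrap around at the int16 range, as a C short
    typically would on overflow.
    """
    def wrap(value):
        return (value + 32768) % 65536 - 32768

    return array("h", (wrap(start + i) for i in range(nelm)))


if __name__ == "__main__":
    # Small example; the record in the db above uses NELM = 20000000.
    print(fill_sequence(5, 4).tolist())
```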
I suspect this happens when the record produces new waveforms faster than they can be sent.
The IOC has no problem processing the waveform at 10 Hz, but I see only about 3 CAS messages per second.
I had to slow down the waveform processing to ".5 second" to improve responsiveness. At that rate the monitor updates
can be sent as quickly as they are produced. But opening a second client again spoils everything.
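The numbers above can be put together in a short back-of-the-envelope calculation. It only uses values from this mail (array size, message size from the log, 10 Hz processing, about 3 messages per second) and assumes that the send rate stays constant and that every unsent update is queued; whether CA really queues that way is exactly the open question.

```python
ELEMENTS = 20_000_000          # NELM of DZ:BIGARRAY
BYTES_PER_SHORT = 2            # FTVL is SHORT
MESSAGE_BYTES = 40_000_032     # size reported by "CAS: Sending a message of ..."
PRODUCED_PER_S = 10            # SCAN ".1 second"
SENT_PER_S = 3                 # observed CAS messages per second

payload = ELEMENTS * BYTES_PER_SHORT                  # array data per frame
overhead = MESSAGE_BYTES - payload                    # remaining bytes per message
throughput_mb_s = SENT_PER_S * MESSAGE_BYTES / 1e6    # network load per client
backlog_growth = PRODUCED_PER_S - SENT_PER_S          # unsent frames added per second


def drain_seconds(queued_frames, sent_per_s=SENT_PER_S):
    """Time needed to send a backlog of frames at the observed rate."""
    return queued_frames / sent_per_s


if __name__ == "__main__":
    print(f"payload per frame:   {payload} bytes (+{overhead} bytes overhead)")
    print(f"throughput:          {throughput_mb_s:.0f} MB/s per client")
    print(f"backlog growth:      {backlog_growth} frames/s if nothing is dropped")
    print(f"draining 80 frames:  {drain_seconds(80):.1f} s")
```

Under these assumptions a single client already needs about 120 MB/s, and the roughly 80 extra messages seen after STOP would take on the order of 27 seconds to send, which fits the "many seconds" observed.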
Dirk
--
Complexity comes for free, Simplicity you have to work for.