At the EPICS Core Working Group meeting at JLAB the problem of the
Gateway on 3.14 was discussed. My understanding is that when we
gateway against 3.14 it uses so much CPU time that it doesn't
Someone mentioned that on 3.13 it already uses 75% of the
[KE] It works correctly, but it uses more CPU than the
old one, possibly up to a factor of 2 more.
Is this true? Ken should know the answers. Do you have some
[KE] The only real-life test is for our Hydra
Gateway. It typically uses about 75% CPU with the old version, and the CPU is
saturated at about 95% with the new one. All other programs tend
to be sluggish when it is running, and the machine is too close to the
limit. It ran for a week or two during shutdown and crashed only once,
compared to much more frequently for the old one. It also has some
useful new features and bug fixes. We (including Ned) decided not to use
it further. The performance is, of course, time-dependent. I
have quite a lot of StripTool plots if you want more information. The
dramatic ones are the ones taken when we changed versions. Only the CPU
[KE] This could
all be fixed by a faster machine, but replacing all our Gateway machines would
be expensive. There is no reason 3.14 should be using more CPU. It
is a problem that should be fixed now. This is probably the last of a
series of problems with CAS in 3.14. Jeff has fixed the others as the
Gateway has uncovered them. The long time spent getting the 3.14 Gateway
running went not into the conversion, which was done in a few days, but into
fixing the succession of problems in CAS for 3.14. I trust Jeff can fix
this one, too, if he works on it.
If this is true then it
sounds like only a matter of time until even the 3.13
[KE] It's been doing OK for some time. The load
doesn't appear to be increasing much with time. You should verify
this with Marty Smith.
I assume this only applies to
the gateway for ASD, not the gateways for the CATs.
[KE] I would
guess Hydra is the most used. Some of the CATs probably have a light
load. Marty Smith or Mohan would be the people to contact about
this. All the stats are available from http://www.aps4.anl.gov/user_operations/index.html.
the ASD gateway can't we have another solution?
[KE] Yes, we can
buy a faster computer, perhaps even a Linux one. Saturn does much better
than Hydra in my tests. But the problem probably affects all portable
servers. It may actually be in CA, since the routine using all the
time is in tcp_recv_thread in CA. Hence it may affect CA
clients. It may be due to thread scheduling, which, as I understand it, is
used in CA but not in CAS. Why would we not investigate
and try to fix it?
long-standing problem with the Gateway is that it first calls fdManager, then
ca_poll. Neither quits while it has work to do. Hence, while one is
running, the other cannot get time to empty its queues, etc. The filled
queues in turn affect the one that is running, and the
problem compounds and accelerates. It would be nice to multiplex them in
some way as a long-term solution.
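
To make the starvation argument concrete, here is a small toy model of it. It is only a sketch and does not call any EPICS code: queue 0 stands in for the fdManager side and queue 1 for the ca_poll side, and the function name simulate, the burst size, the arrival rate and the per-turn budget are all made-up assumptions for illustration. It compares a loop in which each handler keeps going until its queue is empty with one in which each handler gets a small budget per turn, which is one possible reading of "multiplexing" them.

```python
def simulate(budget, ticks=300, burst=30, arrival_period=3):
    """Alternate between two work queues and return the worst backlog of each.

    budget=None models a handler that does not quit while it has work;
    an integer budget caps the items handled per turn (hypothetical fix).
    All numbers are illustrative assumptions, not measurements.
    """
    pending = [burst, 0]   # queue 0 starts with a burst, queue 1 starts empty
    worst = [burst, 0]     # largest backlog seen on each queue
    clock = 0
    current = 0            # index of the handler whose turn it is

    def advance():
        # One unit of time passes; both queues receive new work periodically.
        nonlocal clock
        clock += 1
        if clock % arrival_period == 0:
            pending[0] += 1
            pending[1] += 1
        worst[0] = max(worst[0], pending[0])
        worst[1] = max(worst[1], pending[1])

    while clock < ticks:
        if pending[0] == 0 and pending[1] == 0:
            advance()      # nothing to do anywhere: idle for one tick
            continue
        handled = 0
        # Stay in this handler while it has work and (if capped) budget left.
        while (pending[current] > 0 and clock < ticks
               and (budget is None or handled < budget)):
            pending[current] -= 1
            handled += 1
            advance()
        current = 1 - current  # hand control to the other handler
    return worst


if __name__ == "__main__":
    print("run until empty, worst backlog per queue:", simulate(budget=None))
    print("budget of 2 per turn, worst backlog per queue:", simulate(budget=2))
```

In this model the second queue builds up a much larger backlog when the first handler is allowed to run until it is empty than when the two take short turns. Whether a per-turn budget is practical in the real Gateway depends on what fdManager and ca_poll allow, which this sketch does not address.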
Run separate gateways for phoebus and oxygen.
more powerful gateway machine.