Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: 3.14 Gateway Performance
From: "Kenneth Evans, Jr." <evans@aps.anl.gov>
To: "Marty Kraimer" <mrk@aps.anl.gov>, "Ned Arnold" <nda@aps.anl.gov>
Cc: "Johnson, Andrew N." <anj@aps.anl.gov>, "Ralph Lange" <Ralph.Lange@mail.bessy.de>
Date: Mon, 2 Dec 2002 12:04:38 -0600
Marty,
 
     Replies below.
 
    -Ken
-----Original Message-----
From: Marty Kraimer [mailto:mrk@aps.anl.gov]
Sent: Wednesday, November 27, 2002 7:48 AM
To: Ned Arnold; evans@aps.anl.gov
Cc: Johnson, Andrew N.; Marty Kraimer; Ralph Lange
Subject: Gateway

At the EPICS Core Working Group meeting at JLAB the problem of running the
gateway on 3.14 was discussed. My understanding is that when we build the
gateway against 3.14 it uses so much cpu time that it doesn't work correctly.
Someone mentioned that on 3.13 it already uses 75% of the cpu.
[KE] It works correctly, but it uses more CPU than the old one, possibly up to a factor of 2 more.
Some questions.

Is this true? Ken should know the answers. Do you have some actual performance
numbers?
[KE] The only real-life test is for our Hydra Gateway. It typically uses about 75% CPU with the old version, and the CPU is saturated at about 95% with the new one. All other programs tend to be sluggish when it is running, and the machine is too close to the limit. It ran for a week or two during shutdown and crashed only once, whereas the old one crashes much more frequently. It also has some useful new features and bug fixes. We (including Ned) decided not to use it further. The performance is, of course, time dependent. I have quite a lot of StripTool plots if you want more information. The most dramatic are those taken when we changed versions: only the CPU usage changes.

[KE] This could all be fixed by a faster machine, but replacing all our Gateway machines would be expensive. There is no reason 3.14 should be using more CPU. It is a problem that should be fixed now. This is probably the last of a series of problems with CAS in 3.14. Jeff has fixed the others as the Gateway has uncovered them. The long time spent getting the 3.14 Gateway running did not come from the conversion, which was done in a few days, but from fixing the succession of problems in CAS for 3.14. I trust Jeff can fix this one, too, if he works on it.

If this is true then it sounds like only a matter of time until even the 3.13
version will fail.
[KE] It's been doing OK for some time.  The load doesn't appear to be increasing much with time.  You should verify this with Marty Smith. 

I assume this only applies to the gateway for ASD, not the gateways for the CATs.
[KE] I would guess Hydra is the most used.  Some of the CATs probably have a light load.  Marty Smith or Mohan would be the person to contact about this.  All the stats are available from http://www.aps4.anl.gov/user_operations/index.html.

For the ASD gateway can't we have another solution?
[KE] Yes, we can buy a faster computer, perhaps even a Linux one. Saturn does much better than Hydra in my tests. But the problem probably affects all portable servers. It may actually be in CA, since the routine consuming the time is in tcp_recv_thread in CA. Hence it may affect CA clients as well. It may be owing to thread scheduling, which is used in CA but not in CAS, according to my understanding. Why would we not investigate and try to fix it?

[KE] A long-standing problem with the Gateway is that it first calls fdManager, then ca_poll. Neither quits if it has work to do. Hence, while one is running, the other cannot get time to empty its queues, etc. Filled queues in turn affect the one that is running, so the problem compounds and accelerates. It would be nice to multiplex them in some way as a long-term solution.
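
To illustrate the starvation described above, here is a toy model in Python. It is not Gateway code and does not call fdManager or ca_poll; the names simulate, pending, budget, and the "server"/"client" labels are hypothetical, and the arrival rates are made up for the example. One thread handles one item per tick. With budget=None a side keeps control until its queue is empty, which is the behaviour described above. With an integer budget a side yields after that many items if the other side has work waiting, which is one possible reading of "multiplexing" them. The model shows only the starved queue growing, not the compounding effect on the side that is running.

```python
def simulate(ticks, burst, budget=None):
    """Toy model: one thread alternating between two work queues.

    'server' receives one item per tick for the first `burst` ticks.
    'client' receives one item every fifth tick throughout.
    Returns the largest backlog seen on each queue.
    """
    pending = {"server": 0, "client": 0}
    worst = {"server": 0, "client": 0}
    sides = ["server", "client"]
    current = 0   # index of the side that currently has control
    served = 0    # items handled since that side took control

    for t in range(ticks):
        # New work arrives before the thread decides what to do.
        if t < burst:
            pending["server"] += 1
        if t % 5 == 0:
            pending["client"] += 1
        for name in sides:
            worst[name] = max(worst[name], pending[name])

        name, other = sides[current], sides[1 - current]
        over_budget = budget is not None and served >= budget
        # Yield when out of work, or when over budget and the other side waits.
        if pending[name] == 0 or (over_budget and pending[other] > 0):
            current = 1 - current
            served = 0
            name = sides[current]

        # Handle at most one item per tick.
        if pending[name] > 0:
            pending[name] -= 1
            served += 1
    return worst


if __name__ == "__main__":
    print("run until empty:", simulate(100, 50))
    print("budget of 1:    ", simulate(100, 50, budget=1))
```

In the first run the "client" queue is not serviced until the burst on the "server" side ends, so its backlog builds up the whole time. In the second run the backlog moves to the busy side instead, and the lightly loaded side is serviced promptly.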

Some possibilities.

Run separate gateways for phoebus and oxygen.
Get a more powerful gateway machine. 

Marty

