EPICS Re: Gateway

Experimental Physics and Industrial Control System


Subject:	Re: Gateway
From:	"Ned D. Arnold" <[email protected]>
To:	Marty Kraimer <[email protected]>
Cc:	"Kenneth Evans, Jr." <[email protected]>, Andrew Johnson <[email protected]>, [email protected], [email protected], [email protected]
Date:	Wed, 27 Nov 2002 09:14:09 -0600

Marty,

Ken has an excellent test bed for the PV Gateway (and Portable Channel Access Server). The PV Gateway on Hydra is used heavily and is a great test for HEAVY LOAD conditions.

His observation of the increased CPU usage (his e-mail to Jeff on 10-8-02 is included below) seems to be much more significant than typical "resource creep". The loop rate for the R3.13 version was over 100 Hz. When this was stopped and the R3.14 version was started, the loop rate dropped to under 10 Hz. Same hardware, same number of PV requests, same version of Solaris, different version of PCAS.

Since the response time was noticeably slower for the users, we backed out of the 3.14 version just before the user run began.

Ken and I thought this was a significant discovery that may affect many applications attempting to move to R3.14. Now (before the R3.14 release) is the time to be thorough and investigate whether this is a typical case or not. Such a performance degradation will affect numerous systems, and buying faster hardware is not always a solution.

According to recent talks at JLAB, the PCAS is used extensively and in many situations performance is critical (imaging systems, LabView server, etc).



	Ned


Marty Kraimer wrote:
> At the EPICS Core Working Group meeting at JLAB the problem of running
> the gateway on 3.14 was discussed. My understanding is that when we
> build the gateway against 3.14 it uses so much cpu time that it doesn't
> work correctly. Someone mentioned that on 3.13 it already uses 75% of
> the cpu.
>
> Some questions.
>
> Is this true? Ken should know the answers. Do you have some actual
> performance numbers?
>
> If this is true then it sounds like only a matter of time until even the
> 3.13 version will fail.
>
> I assume this only applies to the gateway for ASD, not the gateways for
> the CATS.
>
> For the ASD gateway can't we have another solution?
>
> Some possibilities.
>
> Run separate gateways for phoebus and oxygen.
> Get a more powerful gateway machine.
>
> Marty
>




Re: Gateway Status 10-8-02
Kenneth Evans, Jr. wrote:

Jeff,

     We have been running the latest Gateway 2.0 built with Base 3.14 on
Hydra as our main Gateway since Oct. 1.  This is the version that doesn't
print the many errlog messages, though there are quite a few left.  It
crashed (only) once on Oct. 8 with "Pure virtual function called".  Otherwise,
it seems to be working properly.  It is doing what a Gateway is supposed to
do as far as I can tell.

     The problem is that it is inefficient and using too much CPU.  The
Gateway CPU has consistently been at around 95%, and the loop rate has been
just above 10 Hz, the limit if ca_poll() is to be called once every 100 ms.
This is on a 440 MHz UltraSparc-IIi with 1 processor.  It is "on the edge".
There are complaints of slow response, and if you try to do anything on
Hydra, the response is slow (as would be expected for a machine using 100%
CPU).  We did not feel we could continue to run it, as user operations
recommence tomorrow.

     The attached StripTool plot shows what happened when we changed back to
1.3.3.4, the latest Gateway 1.3 version.  The CPU goes down and the loop
rate goes up.  It is no longer "on the edge" and has quite a bit of
headroom.  The graph to the left of where it was changed is typical of the
load over the last week.  I have been watching it, and it has pretty much
looked like that during the whole period.  It is now handling the same load
but using fewer resources.  Note that the loop rate is now over 100 Hz.
fdManager is called with a 10 ms timeout, so this means fdManager is
returning early.  (The loop consists of calls to fdManager, then ca_poll,
then Gateway stuff).  Note that both versions are "keeping up" in that the
ServerEventRate is equal to the ServerPostRate.  The threshold where this no
longer happens is much higher.
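
The loop-rate figures above follow from simple arithmetic, sketched below in Python. This is only an illustration, not Gateway code: the function and constant names are hypothetical, and the model assumes each pass is one blocking wait in fdManager followed by some amount of work (ca_poll plus the Gateway processing).

```python
# Illustrative model of the Gateway main loop timing described above.
# All names here are hypothetical; this is not taken from the Gateway source.

FD_TIMEOUT_S = 0.010        # fdManager timeout quoted in the message (10 ms)
CA_POLL_INTERVAL_S = 0.100  # ca_poll() should run at least once every 100 ms


def loop_rate_hz(wait_s, work_s):
    """Passes per second for a loop that blocks for wait_s, then works for work_s."""
    period = wait_s + work_s
    if period <= 0:
        raise ValueError("loop period must be positive")
    return 1.0 / period


def returns_early(observed_rate_hz, timeout_s=FD_TIMEOUT_S):
    """True if the observed rate is only possible when the wait ends before its timeout."""
    return observed_rate_hz > 1.0 / timeout_s


# Ceiling on the rate if every wait ran to its full 10 ms timeout and the
# remaining work took no time at all.
full_timeout_rate = loop_rate_hz(FD_TIMEOUT_S, 0.0)

# Lowest loop rate that still calls ca_poll() once every 100 ms.
min_rate_for_ca_poll = 1.0 / CA_POLL_INTERVAL_S
```

Under this model a loop that always waited out the full 10 ms could not exceed 100 Hz, so a measured rate above 100 Hz implies the wait is returning early, and a rate near 10 Hz sits right at the floor needed to call ca_poll() every 100 ms.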

     It now runs on Linux and the behavior, while better, seems commensurate
given that the Linux box is 2 Pentium III's at 930 MHz each.  It also runs
on Windows, but the performance seems much worse there (even though it is 1
Pentium at 800 MHz).  It appears to use very little CPU on WIN32, even when
loaded.  It just stops "keeping up".  In addition, the threshold for
"keeping up" is lower than for the other two.  That is, it doesn't seem to
be utilizing the available CPU.

     It needs to be fixed before we can use it for production.

	-Ken

