Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: FW: Gige performance increasing.
From: Mark Rivers <rivers@cars.uchicago.edu>
To: "tech-talk@aps.anl.gov Talk" <tech-talk@aps.anl.gov>
Date: Sat, 20 Apr 2013 14:37:30 +0000
Folks,

I apologize for the previous post; the tech-talk mailer converts to plain text, so the table was lost.  I have now attached it as a PDF, and hopefully this will work.

I am forwarding a message reporting some interesting problems running multiple Prosilica cameras on a single machine.  

Suggestions are most appreciated!

Mark


From: Mark Rivers
Sent: Saturday, April 20, 2013 9:07 AM
To: Slava Isaev
Cc: Matjaz Kobal; Spencer J. Gessner; Matthew Boyes; Luciano Piccoli; Williams Jr., Ernest L.; Andrew Johnson
Subject: RE: Gige performance increasing.

Folks,

Yesterday I set up a system in my lab to try to reproduce the results that Matjaz presented in the report he sent me on Nov. 7, 2012.  I have attached the complete report.  Table 1 from the report is in the attached PDF.

He was testing on a Linux system with 8 network cards (dedicated card per camera) and dual 6-core CPUs (12 cores total).  Each camera ran in its own IOC, so in its own process on Linux.

Note that with 3 cameras running at 10 Hz and all 3 plugins (JPEG, Process, and Analyze) running he observed 80% usage of a single CPU, and 340% usage of the total CPUs.  There was 15% frame loss between the cameras and the computer under these conditions.

The system I have available for testing has a single GigE network card and only 4 cores.  It is a dual-boot system with Fedora Core 14 and Windows 7 64-bit.

I first tested on Fedora.  I ran tests for each camera to collect 1000 frames, which took 100 seconds at 10 frames/sec.  Running 3 cameras at 10 Hz I observed results similar to what Matjaz observed.  It was dropping less than 0.1% of the frames when I had no plugins enabled, and more than 10% of the frames when I enabled the JPEG and Statistics plugins on each camera, which was putting more than 80% load on each CPU.  I will get some more precise numbers next week before I come, but I believe I am effectively seeing results similar to Matjaz's.

However, I then booted the system with Windows 7 and conducted the identical tests.  With all 3 cameras running at 10 Hz, and the JPEG and Statistics plugins running, Windows was reporting over 90% CPU utilization.  Windows Task Manager reports the %CPU utilization such that 100% means that all cores are saturated, unlike Linux, which reports NCores*100% when all cores are saturated.  Thus the system was very close to fully CPU saturated.  Under these conditions the cameras did not drop a SINGLE frame!  This was 0 dropped frames out of 3000 total, so less than 0.03%, compared to the 15% dropped frames that Matjaz measured under almost identical conditions on a much more powerful Linux server with a dedicated Ethernet port per camera and 3 times as many cores.  The plugins also did not drop any frames, although I was monitoring the free queue size in each plugin, and they occasionally came close to depleting the queue and dropping frames.  That is exactly what I expect as the CPUs approach saturation.
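
To put the two CPU reporting conventions on the same scale, here is a minimal Python sketch.  The function names are made up for illustration; the inputs are the figures quoted above (340% total on 12 cores for the Linux server, over 90% on 4 cores for the Windows test, 3000 frames total).

```python
def linux_to_whole_machine_percent(linux_percent: float, n_cores: int) -> float:
    """Convert a Linux-style figure (100% per core) to a percentage of the whole machine."""
    return linux_percent / n_cores


def whole_machine_to_linux_percent(machine_percent: float, n_cores: int) -> float:
    """Convert a Task-Manager-style figure (100% = all cores busy) to the Linux convention."""
    return machine_percent * n_cores


def dropped_percent(dropped: int, total: int) -> float:
    """Percentage of frames lost out of the total expected."""
    return 100.0 * dropped / total


if __name__ == "__main__":
    # Linux server in the report: 340% total on 12 cores.
    print(round(linux_to_whole_machine_percent(340, 12), 1))  # 28.3
    # Windows test: 90% (a lower bound, since it read "over 90%") on 4 cores.
    print(whole_machine_to_linux_percent(90, 4))  # 360
    # A single dropped frame out of 3000 would have been this percentage.
    print(round(dropped_percent(1, 3000), 3))  # 0.033
```

On this scale the Linux server was using roughly 28% of its total CPU capacity while losing 15% of the frames, whereas the Windows machine was above 90% of its capacity and lost none.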

So my conclusion is that whatever is causing the dropped frames is not really a problem with the areaDetector driver or architecture, but is something specific to the Linux Ethernet driver or perhaps the Linux AVT driver library.


IMPORTANT NOTE:

I tried to automate my testing by writing an IDL script to turn the cameras on, wait for them to get done, and then read the statistics on the dropped frames.  This was using the normal IDL channel access library.  I observed VERY WEIRD behavior which I do not understand at all.  Here is what I observe:

- If I start each camera acquiring by using medm to set the Acquire PV to 1 then it almost always works fine.  I press Acquire on camera 1, then quickly press Acquire on camera 2, and then camera 3.  If I do that with all plugins disabled, then each camera starts acquiring at 10 frames/sec with essentially no dropped frames.
- However, if I do the "identical" operation using IDL to set the Acquire PV to 1 on each camera in succession here is what I see:
  - If I only start 2 cameras, rather than 3 it works fine.  Both cameras acquire at 10 Hz with no dropped frames.
  - If I start 3 cameras with, say, a 2-second delay between starting each one (to simulate my delay when using medm to do it) then the first 2 cameras begin acquiring at 10 Hz.  But as soon as the third camera is started all 3 cameras begin dropping MORE THAN 90% of their frames!
  - This behavior of dropping 90% of frames when camera 3 starts happens no matter what delay (0.1 to 5 seconds) I put between starting successive cameras.
  - I see the identical behavior on Linux and Windows.
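
For anyone who wants to repeat the scripted start sequence without IDL, the following Python sketch may help.  It is not the IDL script used above.  The put function is passed in as an argument, so any channel access client can be used; with pyepics installed (an assumption), `epics.caput` could be passed as `put`.  The PV names for cameras 2 and 3 are hypothetical; only the 13PS1:cam1: prefix appears in this message.

```python
import time
from typing import Callable, Sequence


def start_cameras(
    put: Callable[[str, int], object],
    acquire_pvs: Sequence[str],
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write 1 to each Acquire PV in order, pausing delay_s between cameras."""
    for i, pv in enumerate(acquire_pvs):
        if i > 0:
            sleep(delay_s)  # delay between successive cameras, e.g. 0.1 to 5 s
        put(pv, 1)


if __name__ == "__main__":
    # HYPOTHETICAL PV names, for illustration only.
    pvs = ["13PS1:cam1:Acquire", "13PS2:cam1:Acquire", "13PS3:cam1:Acquire"]
    log = []
    # A stand-in put that only records the calls, so the sketch runs without an IOC.
    start_cameras(lambda pv, value: log.append((pv, value)), pvs, delay_s=0.0)
    print(log)
```

Running the same sequence from a third client would show whether the dropped frames are specific to the IDL channel access library or appear with any scripted put.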

IDL and medm are both running on another Linux machine, not the machine running the camera IOCs, so these are channel access put operations from the same remote machine.

I am totally baffled by this.  Why does it make a difference if I start the cameras with medm or IDL?  They should both result in similar channel access put operations.  Furthermore, what can the IDL put operation be doing that causes the cameras to suddenly begin to drop 90% of their frames?

I see one other behavior that I don't understand.  When I use the "caput" program from EPICS base (3.14.12.3) to write to any PV in the camera IOC I see about a 2 second delay before the caput completes:

corvette:~>date ; /usr/local/epics/base-3.14.12.3/bin/linux-x86/caput 13PS1:cam1:Gain.DESC "Test" ; date
Sat Apr 20 08:58:54 CDT 2013
Old : 13PS1:cam1:Gain.DESC           Test
New : 13PS1:cam1:Gain.DESC           Test
Sat Apr 20 08:58:56 CDT 2013

Note that "date" is reporting that this operation took about 2 seconds.  There is a very noticeable delay between when the "New" value of the PV is printed, and when the Linux shell prompt returns.  Why?  This happens when none of the 3 cameras is acquiring, so it cannot be a problem with Ethernet loading.  It happens whether the camera IOCs are running on Windows or Linux.

If I write to a PV in a vxWorks IOC I do not see this delay; the Linux prompt returns "immediately" with no perceptible delay.

corvette:~>date ; /usr/local/epics/base-3.14.12.3/bin/linux-x86/caput 13LAB:m1.DESC "Test" ; date
Sat Apr 20 08:59:09 CDT 2013
Old : 13LAB:m1.DESC                  test
New : 13LAB:m1.DESC                  Test
Sat Apr 20 08:59:09 CDT 2013
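
Since "date" only resolves to 1 second, a finer measurement of the delay could be made with a small Python sketch like the one below.  The helper name `time_command` is made up for illustration, and the example uses a stand-in command so that it runs anywhere; on the test machine the argument list would be the caput command shown above.

```python
import subprocess
import sys
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        list(argv),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command; replace with e.g.
    # ["/usr/local/epics/base-3.14.12.3/bin/linux-x86/caput", "13PS1:cam1:Gain.DESC", "Test"]
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.3f} s")
```

Timing the same put against the camera IOC and the vxWorks IOC would give the size of the extra delay rather than just "about 2 seconds" versus "immediately".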

I wonder if this could be related to the problem I am seeing with IDL starting the cameras?

Cheers,
Mark

Attachment: Table1.pdf


