Hi Martin and Michael,
Thanks for your quick replies. Here are some more details and some test results.
- EPICS is building a shareable library that is called from front-end code that is written in IDL.
- Much of the CPU intensive code in the library is actually computing 1-D and 2-D FFTs using libfftw3f.
- The 2013 version of the library was built with a static version of libfftw3f.a, so the shareable library does not depend on an fftw shareable library.
- I believe the 2013 version of the library was built with gcc 4.6.3-2:
corvette:local/idl_user/tomography>readelf -p .comment tomoRecon_linux_x86_64.so.orig
String dump of section '.comment':
[ 0] GCC: (GNU) 4.6.1 20110908 (Red Hat 4.6.1-9)
[ 2c] GCC: (GNU) 4.6.3 20120306 (Red Hat 4.6.3-2)
libfftw3f.a was built with 4.6.1-9, based on the output of "strings libfftw3f.a"
GCC: (GNU) 4.6.1 20110908 (Red Hat 4.6.1-9)
> Is this an observed increase in total runtime? CPU load?
Total runtime.
> Are the slower versions compiled with -O3?
Yes.
> Do you have any metric on whether individual job runtime has increased?
Not really. I have added metrics to measure and print the time in each phase of the calculation, but I can only get that information for the new builds, not the 2013 build.
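For reference, the elapsed times reported below are plain wall-clock measurements around a complete run. A minimal sketch of that kind of measurement (run_reconstruction is a hypothetical stand-in for the actual IDL invocation, not a real command):

```shell
#!/bin/sh
# Wall-clock one complete reconstruction run.
run_reconstruction() { sleep 0.2; }   # placeholder for the real job

start=$(date +%s.%N)
run_reconstruction
end=$(date +%s.%N)
awk -v a="$start" -v b="$end" 'BEGIN { printf "Elapsed time (s): %.1f\n", b - a }'
```
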
I have now tested with 3 different versions of the shareable library. These tests were done on a 20-core CentOS 7 system running 16 threads. "top" always showed 1600% CPU usage for all versions.
1) 2013 version, built with base 3.14.12.3 and the static version of fftw from 2013. Based on the "strings" output this appears to be fftw 3.3.
2) New version built with base 7.0.4 and the same static version of libfftw3f.a (3.3) from 2013.
3) New version built with base 7.0.4 and the current fftw package for CentOS 7, which provides a shareable library (fftw 3.3.2).
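As a sanity check on which of these builds actually depends on a shareable fftw, the dynamic section can be inspected with readelf, as above. This is a sketch: point SO at e.g. tomoRecon_linux_x86_64.so.new_share; it defaults to /bin/sh here only so the command is runnable as-is.

```shell
#!/bin/sh
# The DT_NEEDED entries are the shareable libraries a binary loads
# at run time; only build 3 should list libfftw3f here. The
# statically linked builds carry the fftw code internally instead.
SO=${1:-/bin/sh}
readelf -d "$SO" | grep NEEDED
```
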
> * Compare executable text size. (use 'size' not 'ls')
This is the output of "ls -l" on those libraries, which I have copied to names reflecting the builds above.
-r-xr-xr-x 1 epics domain users 268264 Jun 29 14:24 tomoRecon_linux_x86_64.so.new_share
-r-xr-xr-x 1 epics domain users 1287984 Jun 29 14:05 tomoRecon_linux_x86_64.so.new_static
-r-xr-xr-x 1 epics domain users 1425867 Jun 25 14:43 tomoRecon_linux_x86_64.so.orig
This is the output of "size" on the two versions that are built with the same static version of libfftw3f.a.
corvette:local/idl_user/tomography>size tomoRecon_linux_x86_64.so.new_static
text data bss dec hex filename
1145812 33792 5432 1185036 12150c tomoRecon_linux_x86_64.so.new_static
corvette:local/idl_user/tomography>size tomoRecon_linux_x86_64.so.orig
text data bss dec hex filename
1113151 34120 5112 1152383 11957f tomoRecon_linux_x86_64.so.orig
Here are the performance results:
Build | Brief description                 | Elapsed time (s)
------+-----------------------------------+-----------------
  1   | 2013 build                        | 17.2
  2   | Current build, static FFTW 3.3    | 18.0
  3   | Current build, dynamic FFTW 3.3.2 | 14.2
My original message was comparing builds 1 and 2. In today's tests I am actually seeing that the new build with static fftw is only about 5% slower than the old build, rather than 10%. I can't explain why it's different today.
However, the really good news is that when I build with the 3.3.2 dynamic version of libfftw3f.so, the new build is actually ~20% faster than the old static build. So now I am happy: I have gained rather than lost performance by rebuilding.
Thanks,
Mark
-----Original Message-----
From: Konrad, Martin <konrad at frib.msu.edu>
Sent: Monday, June 29, 2020 9:51 AM
To: Michael Davidsaver <mdavidsaver at gmail.com>; Mark Rivers <rivers at cars.uchicago.edu>; EPICS core-talk <core-talk at aps.anl.gov>
Subject: Re: EPICS shared library performance
Hi,
> * Spectre mitigations? (I would think not, unless redhat has
> backported to gcc 4.8)
[1] mentions GCC upgrades along with retpoline-based mitigation (these upgrades apparently came with RHEL 6.10).
> * Compare executable text size. (use 'size' not 'ls')
>
>> ... OS-independent thread pool ...
Good thoughts. Here's another idea:
* Excessive cache misses due to a slight increase in size of some heavily used data structure? You could try firing up the profiler to get some statistics:
$ perf stat -d ./application
Martin
[1] https://www.redhat.com/archives/rhelv6-announce/2018-June/msg00000.html
--
Martin Konrad
Facility for Rare Isotope Beams
Michigan State University
640 South Shaw Lane
East Lansing, MI 48824-1321, USA
Tel. 517-908-7253
Email: konrad at frib.msu.edu