Using the TOPAS user code, I simulated a radially binned sphere (e.g. nbins=100) with one primary particle, using the Livermore low-energy physics. The material for both the world and the sphere was G4_WATER. With this setup I identified two issues on Linux:

1. After initialization, but before the first history was generated, the simulation used only the master thread at 100% for about 20-30 s.
2. The execution time for the same number of histories was about two orders of magnitude higher on Linux than on Mac (something along the lines of 200 s vs 2 s for the same number of histories and the same number of threads).

I replicated the issue on Linux using the following configurations, initially with the public packaged user code and then with a build I made myself from scratch, with the same results. Each configuration was tested with both GCC and Clang:

- AMD CPU workstation (Clear Linux, Ubuntu 20.10)
- Intel i5 Surface Pro with Windows 10, WSL2 (Ubuntu 20.04)
- Intel i5 MacBook Pro with VirtualBox (Debian 10, Fedora 33)

The macOS-based tests were carried out on:

- Intel i7 Mac Mini (OSX 10.15)
- Intel i5 MacBook Pro (OSX 10.15)

I compiled Geant4.10.04.p03 with the same options on both Linux and Mac and verified that both Geant4 and the user code were actually built with RelWithDebInfo on both platforms. Based on the differential testing above, I'm attributing this to something at the OS level.
Debugging with valgrind/callgrind on Linux showed about 60E+9 calls for the simulation with one history, with the majority coming from a call stack starting somewhere around

    > G4RunManagerKernel::RunInitialization(bool)

and continuing through the Geometry Manager into G4Sphere

    > G4Sphere::InitializeThetaTrigonometry()

which then calls

    ./math/../sysdeps/ieee754/dbl-64/s_tan.c:__tan_fma

In the valgrind/callgrind output it presents as follows:

    50,136,865,520 (88.33%) < /opt/geant4.10.06-RelWithDebInfo/include/Geant4/G4Sphere.icc:G4Sphere::InitializeThetaTrigonometry() (191,100x)
    50,137,915,844 (88.34%) * ???:0x00000000004ff4c0 [???]
    50,137,533,636 (88.33%) > ./math/../sysdeps/ieee754/dbl-64/s_tan.c:__tan_fma (191,104x) [/usr/lib/x86_64-linux-gnu/libm-2.32.so]

The program (with 1 history to be simulated) spends about 88% of its time in `std::tan`, which resolves to `__tan_fma` in the GNU C math library (libm).

The source code in `G4Sphere.icc:G4Sphere::InitializeThetaTrigonometry()` calculates the sine, cosine, and tangent of the sphere's angles, each by calling the corresponding standard library function. Replacing the call to `std::tan` with the ratio of the previously calculated `std::sin` and `std::cos` values and recompiling G4geometry fixed the issue.

By comparison, profiling the user code with Xcode on macOS also showed calls into the math standard library for __sincos, but none for tangent, and nowhere near the numbers recorded above. My suspicion is that the Mac standard library expands `std::tan` inline into `std::sin`/`std::cos`, which the compiler then replaces with the ratio of the already calculated sine and cosine values held in memory.

To make sure that this doesn't lead to some funny code when dividing by zero (e.g. at theta=pi/2), I checked `std::sin(M_PI/2)/std::cos(M_PI/2)`: in double precision the denominator `std::cos(M_PI/2)` is not exactly zero but 0 +/- DBL_EPSILON, so the division never hits an exact zero.

This may have fixed not just the first issue but also the second one, where the histories came out very slowly on Linux, since `ComputeDimensions` is called repeatedly for each history.
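For illustration, the change described above amounts to the following pattern. This is a standalone sketch, not the actual Geant4 source: the struct `ThetaTrig` and the functions `withStdTan`, `withRatio`, `closeRel`, and `selfCheck` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the cached theta trigonometry of a sphere
// section. The names are illustrative and are not the Geant4 members.
struct ThetaTrig {
  double sinTheta;
  double cosTheta;
  double tanTheta;
};

// Original pattern: three independent math library calls per angle.
inline ThetaTrig withStdTan(double theta) {
  return {std::sin(theta), std::cos(theta), std::tan(theta)};
}

// Workaround pattern: reuse the sine and cosine that are needed anyway,
// so that std::tan is never called.
inline ThetaTrig withRatio(double theta) {
  const double s = std::sin(theta);
  const double c = std::cos(theta);
  return {s, c, s / c};
}

// Relative comparison of two doubles.
inline bool closeRel(double a, double b, double relTol) {
  return std::fabs(a - b) <= relTol * std::fmax(std::fabs(a), std::fabs(b));
}

// Checks that both variants agree on the tangent for a few sample angles.
inline void selfCheck() {
  const double angles[] = {0.1, 0.7, 1.2, 2.5};
  for (double theta : angles) {
    assert(closeRel(withStdTan(theta).tanTheta,
                    withRatio(theta).tanTheta, 1e-12));
  }
}
```

The two variants can differ in the last few bits of the result, so any comparison between them should use a tolerance rather than exact equality.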
Thanks for the detailed analysis of the problem and the suggested fix. The fix will be included in the upcoming Geant4 10.07 release.