Problem 2424

Summary: G4Exception : Cache001, when running only a few primaries with multiple sources in GPS
Product: Examples/Advanced
Reporter: Steffen <Steffen.Mueller>
Component: gammaknife
Assignee: lns.infn.it <romanof>
Status: ASSIGNED
Severity: minor
CC: Steffen.Mueller, susanna
Priority: P4
Version: 10.7
Hardware: PC
OS: Linux

Description Steffen 2021-09-16 16:04:04 CEST
Disclaimer: I am using the gammaknife example only because it uses the General Particle Source (GPS). I first encountered the problem in my own application, which also uses GPS. So the problem is not related to the example, but rather to the Geant4 toolkit.

Problem: When more than 2 sources are defined in GPS and only one primary is simulated in a run, the application sometimes throws an exception at the very end of execution.

Steps to reproduce (tested with 4-10-07-patch-02 AND 4-11.00-beta-01):
1: Compile /examples/advanced/gammaknife
2: In defaultMacro.mac, change "/run/beamOn 100" to "/run/beamOn 1" (the problem only occurs when simulating few primaries per run).
3: Edit GPS.in so that a total of 3 sources are added: simply copy the last source definition (lines 18-30) and append it to the end of the file. (The problem does not occur when only 2 sources are added!)
4: Run ./gammaknife defaultMacro.mac several times.
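Because the abort is intermittent, step 4 can be quantified with a small driver that repeats the run and counts failed runs. This is a minimal sketch using only std::system; the function name CountFailedRuns and the command string in the usage note are illustrative assumptions, not part of the gammaknife example.

```cpp
#include <cstdlib>
#include <string>

// Runs `command` through the shell `runs` times and returns how many runs
// ended with a non-zero status. On POSIX systems std::system returns the
// wait status, which is non-zero both for a non-zero exit code and for a
// process killed by a signal (such as the SIGABRT in the output below).
int CountFailedRuns(const std::string& command, int runs) {
  int failures = 0;
  for (int i = 0; i < runs; ++i) {
    if (std::system(command.c_str()) != 0) {
      ++failures;
    }
  }
  return failures;
}
```

For example, CountFailedRuns("./gammaknife defaultMacro.mac > /dev/null 2>&1", 20) returns how many of 20 runs aborted, which makes it easier to compare failure rates between machines or build options.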

Output when the problem occurs:
...
...
### Run 0 starts.
Run 0 starts ...
 Run terminated.
Run Summary
  Number of events processed : 1
  User=0.000000s Real=0.000287s Sys=0.000000s
          User TOT = 0    Real TOT = 0.0003795
 Summary of Run 0 :
0 events have been kept for refreshing and/or reviewing.
  "/vis/reviewKeptEvents" to review them one by one.
  "/vis/enable", then "/vis/viewer/flush" or "/vis/viewer/rebuild" to see them accumulated.
/score/dumpQuantityToFile boxMesh_1 eDep eDep_scorer.out
Graphics systems deleted.
Visualization Manager deleting...
G4 kernel has come to Quit state.
================== Deleting memory pools ===================
Number of memory pools allocated: 11; of which, static: 0
Dynamic pools deleted: 11 / Total memory freed: 0.018 MB
============================================================
RunManagerKernel is deleted. Good bye :)

-------- EEEE ------- G4Exception-START -------- EEEE -------

*** ExceptionHandler is not defined ***
*** G4Exception : Cache001
      issued by : G4CacheReference<V>::Destroy
Internal fatal error. Invalid G4Cache size (requested id: 2 but cache has size: 1 Possibly client created G4Cache object in a thread and tried to delete it from another thread!
*** Fatal Exception ***
-------- EEEE ------- G4Exception-END -------- EEEE -------


*** G4Exception: Aborting execution ***

### CAUGHT SIGNAL: 6 ### address: 0x3e800007c1b,  signal =  SIGABRT, value =    6, description = abort program (formerly SIGIOT). 

Backtrace:
[PID=31771, TID=-2][ 0/14]> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141) [0x7fad501adce1]
[PID=31771, TID=-2][ 1/14]> /lib/x86_64-linux-gnu/libc.so.6(abort+0x123) [0x7fad50197537]
[PID=31771, TID=-2][ 2/14]> /opt/geant4/lib/libG4global.so(_Z11G4ExceptionPKcS0_19G4ExceptionSeverityS0_+0x10cc) [0x7fad5063772c]
[PID=31771, TID=-2][ 3/14]> /opt/geant4/lib/libG4global.so(_Z11G4ExceptionPKcS0_19G4ExceptionSeverityRNSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEEE+0xe0) [0x7fad506378c0]
[PID=31771, TID=-2][ 4/14]> /opt/geant4/lib/libG4event.so(+0x55e69) [0x7fad52881e69]
[PID=31771, TID=-2][ 5/14]> /opt/geant4/lib/libG4event.so(_ZN7G4CacheIN20G4SPSPosDistribution13thread_data_tEED1Ev+0x5a) [0x7fad5288216a]
[PID=31771, TID=-2][ 6/14]> /opt/geant4/lib/libG4event.so(_ZN20G4SPSPosDistributionD1Ev+0x10) [0x7fad5287c3c0]
[PID=31771, TID=-2][ 7/14]> /opt/geant4/lib/libG4event.so(_ZN22G4SingleParticleSourceD1Ev+0x47) [0x7fad528896a7]
[PID=31771, TID=-2][ 8/14]> /opt/geant4/lib/libG4event.so(_ZN22G4SingleParticleSourceD0Ev+0x9) [0x7fad52889719]
[PID=31771, TID=-2][ 9/14]> /opt/geant4/lib/libG4event.so(_ZN27G4GeneralParticleSourceDataD1Ev+0x2a) [0x7fad5285119a]
[PID=31771, TID=-2][10/14]> /lib/x86_64-linux-gnu/libc.so.6(+0x3e4d7) [0x7fad501b04d7]
[PID=31771, TID=-2][11/14]> /lib/x86_64-linux-gnu/libc.so.6(+0x3e67a) [0x7fad501b067a]
[PID=31771, TID=-2][12/14]> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7fad50198d11]
[PID=31771, TID=-2][13/14]> ./gammaknife(+0xbb2a) [0x561fd0e33b2a]

: Aborted (Signal sent by tkill() 31771 1000)
Comment 1 Gabriele Cosmo 2021-10-19 12:28:32 CEST
This is likely due to the order of deletion in the main() program of "Gammaknife". The RunManager must be the final object to be deleted.
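In the backtrace from the description, ~G4GeneralParticleSourceData is reached from libc frames (frames 10 and 11) rather than from main(), and the exception appears only after "RunManagerKernel is deleted" was printed, so the destructor apparently runs during program exit, after the kernel was torn down. The toy model below illustrates that ordering problem in isolation. It is purely hypothetical: CacheRegistry, ToyCache and SimulateLateDestruction are illustrative names and do not reproduce the real G4Cache/G4CacheReference implementation.

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical stand-in for per-thread cache storage (NOT Geant4's G4Cache).
struct CacheRegistry {
  std::vector<int> slots;  // one storage slot per registered cache object
  int destroyErrors = 0;   // number of "invalid id" failures on destruction
};

// Toy cache object: takes the next slot index as its id on construction
// and checks on destruction that its slot still exists.
class ToyCache {
 public:
  explicit ToyCache(CacheRegistry& reg) : reg_(reg), id_(reg.slots.size()) {
    reg_.slots.push_back(0);
  }
  ToyCache(const ToyCache&) = delete;
  ToyCache& operator=(const ToyCache&) = delete;
  ~ToyCache() {
    if (id_ >= reg_.slots.size()) {
      // Analogous to the Cache001 message: the id exceeds the storage size.
      std::cerr << "Invalid cache size (requested id: " << id_
                << " but cache has size: " << reg_.slots.size() << ")\n";
      ++reg_.destroyErrors;
    }
  }

 private:
  CacheRegistry& reg_;
  std::size_t id_;
};

// Creates `nSources` cache objects (standing in for one per GPS source),
// shrinks the storage to `keptSlots` entries as if it had already been torn
// down, then destroys the objects. Returns the number of invalid-id errors.
int SimulateLateDestruction(std::size_t nSources, std::size_t keptSlots) {
  CacheRegistry reg;
  {
    std::vector<std::unique_ptr<ToyCache>> sources;
    for (std::size_t i = 0; i < nSources; ++i) {
      sources.push_back(std::make_unique<ToyCache>(reg));
    }
    reg.slots.resize(keptSlots);  // storage torn down first...
  }  // ...and the cache objects are destroyed only afterwards
  return reg.destroyErrors;
}
```

With three objects and storage shrunk to one slot, the objects with ids 1 and 2 both report errors; the second has the same shape as the reported message (requested id: 2, size: 1). The model does not explain why the real failure is intermittent or why it needs at least three sources.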
Comment 2 Steffen 2021-10-19 14:55:05 CEST
Thank you for looking into this. I might have some additional information regarding your comment: 

I am not sure about the case in the gammaknife example, however, in my application (where the same error is raised) the RunManager is the last object to be deleted in my main(). So maybe this is not yet the solution.
Comment 3 Susanna Guatelli 2021-10-26 05:17:19 CEST
I tried to reproduce the problem (added a 3rd GPS source) and ran 1 history, and the simulation ran without any problem. I then increased the number of events and still found no problem. I ran the example with Geant4 10.7 on x86_64-redhat-linux, gcc version 8.3.1.

Susanna
Comment 4 Steffen 2021-10-27 07:55:39 CEST
Dear Susanna,

did you run the simulation multiple times with one primary? Unfortunately the bug is not 100% reproducible: with my setup it is triggered in approx. 50% of all runs with one primary.
Another idea: I ran the simulation on a 6-core/12-thread machine with Debian Linux. I will also try to reproduce the bug on another machine with a different CPU setup; maybe this also has something to do with the number of cores...

Thanks, 
 Steffen
Comment 5 Susanna Guatelli 2021-10-28 00:55:29 CEST
Dear Steffen

I ran 3 GPS sources with /run/beamOn 1, repeated one, two, three and four times. I used the default macro with no visualisation.

As recommended by Gabriele, I am now using valgrind in DRD mode to see if I get any errors.

Cheers
Susanna
Comment 6 Steffen 2021-10-28 07:37:01 CEST
Just as additional info, here are the build options of my Geant4 installation:

-DGEANT4_BUILD_MULTITHREADED=OFF
-DGEANT4_INSTALL_DATA=ON
-DGEANT4_INSTALL_DATA_TIMEOUT=6000
-DGEANT4_USE_QT=ON
-DGEANT4_USE_OPENGL_X11=ON
-DGEANT4_USE_RAYTRACER_X11=ON

Since the error message points to a threading issue, the MT option might be important. I can also confirm that, for me, the error is raised without visualisation as well.

Thanks,
 Steffen
Comment 7 Susanna Guatelli 2021-10-29 02:21:55 CEST
Dear Steffen

I compiled Geant4 with MT ON. I will try the test again with MT OFF and let you know how it goes.
Cheers
Susanna