Problem 1780

Summary: G4UniformRand() in G4SPSRandomGenerator are not random enough!
Product: Geant4 Reporter: tammen <tammen>
Component: runAssignee: Andrea Dotti <andrea.dotti>
Status: RESOLVED WONTFIX    
Severity: critical CC: cterasa
Priority: P5    
Version: 10.1   
Hardware: All   
OS: All   
Attachments: project to demonstrate a severe bug in G4's PRNG
project which shows problems with the G4 MT PRNG

Description tammen 2015-09-10 10:56:40 CEST
I created a sample world box in which I put a simple GPS source with the following parameters:

/gps/particle		geantino
/gps/pos/type		Surface
/gps/pos/shape		Para
/gps/pos/halfx		10 cm
/gps/pos/halfy		10 cm
/gps/pos/halfz		10 cm
/gps/pos/centre		0 0 0 cm
/gps/ang/type		cos
/gps/ene/type		Mono
/gps/energy		100 MeV

Inside this world volume there is a sphere with 1cm radius as Sensitive detector.

With the geant4 versions (10.01.p01 and 10.01.p02) there is a non-negliable chance that at least some of the sampled combinations of starting positions and directions are equal.

Example output (Event ID, Primary Vertex X, PV Y PV Z, particle type):

21024 9.883151351 1.27283688 -0.8385735787 geantino
637587 9.883151351 1.27283688 -0.8385735787 geantino

In my test case I get 5162 hits inside the detector and only 5011 unique starting positions so 150 positions exist more than once!

When one now changes random number generation inside the G4SPSRandomGenerator.cc by simply adding

#define G4UniformRand() (double)rand()/RAND_MAX

at the beginning of the source file, no double positions are sampled anymore, there are now 5194 hits of which all are unique!

This might indicate to a problem in random number generation by G4UniformRand(), which is used in may other files as well.
Comment 1 tammen 2015-09-11 12:15:43 CEST
When compiling geant4 without multithreading support, the problem does not exist. So it seems as if there is a problem with random number generation and multithreading.
Comment 2 Christoph Terasa 2015-09-14 17:28:27 CEST
I ran a simple simulations with geantinos in Geant4 10.1p2, with the gps settings outlined above. I logged generated Phi and Theta start locations, and the generated Phi and Theta primary vertex directions. In the multithreading case (even only using 1 thread), I got the following result:

sort -n geantino.txt | uniq -c | sort -n | tail
      7 -3.02322 2.09462 0.29373 1.88727
      7 -3.0361 1.81133 1.35956 1.26641
      7 -3.13003 1.99172 0.139298 0.257301
      8 -0.659066 1.89134 2.56548 1.16313
      8 -0.768951 0.964422 3.12355 2.07705
      8 0.896041 2.02675 -3.11808 1.55114
      8 -1.4715 1.3215 2.28708 0.869237
      8 -1.84904 1.54519 0.635515 0.857693
      8 2.09585 1.47702 -0.904532 1.32556
      9 0.729879 2.87103 -0.997289 1.42946

which means that the *exact* tuple ( 0.729879 2.87103 -0.997289 1.42946 ) occurred *9* times in the data. It seems there is something horribly broken with the periodicity of the PRNG.

I ran the same setup with a non-multithreaded installation of Geant4 10.1p2:

sort -n geantino_st.txt | uniq -c | sort -n | tail
      1 -9.81208e-05 1.84146 -2.17908 1.84887
      1 9.84953e-05 1.98273 -2.41123 0.390816
      1 9.88699e-05 0.753454 -1.44189 2.64596
      1 -9.88699e-05 1.9278 -2.47878 0.799797
      1 -9.92444e-05 0.378536 2.62706 2.24051
      1 9.92444e-05 2.09265 -2.77763 1.09248
      1 -9.92444e-05 2.16882 2.99306 1.51412
      1 -9.96189e-05 0.600437 -0.0983613 2.93393
      1 9.96189e-05 1.36151 -3.05519 2.15339
      1 -9.96189e-05 1.80702 1.72592 1.14509

which, expectedly, does not have a single tuple occurring more than once.

There still might be a bug in our user code, I will try to boil it down to the very basics to see that.
Comment 3 Christoph Terasa 2015-09-15 08:43:16 CEST
Created attachment 357 [details]
project to demonstrate a severe bug in G4's PRNG
Comment 4 Christoph Terasa 2015-09-15 08:43:44 CEST
Comment on attachment 357 [details]
project to demonstrate a severe bug in G4's PRNG

I created a minimal example which should be publicly accessible at https://gitlab.physik.uni-kiel.de/geant4/prng

I also attached the same project to this comment.


== BUILDING ==
Build the project as usual, by creating a build subdirectory, and running

cd build
cmake .. && make -j

In default mode the project will be built with multithreading support. To disable multithreading, run

cmake -DCMAKE_CXX_FLAGS=-DNOMULTITHREADING .. && make -j

== USAGE ==
The example just prints the primary vertex positions phi and theta, and the primary vertex directions phi and theta to stdout as a 4-tuple, per event. The idea is to pass the output of prng to regex.sh, and pipe the result into a file:

./prng ../run/run.mac | ../regex.sh > prng.txt

After the run the result can be analyzed for multiple occurrences of the same 4-tuple by invoking

../analyze.sh prng.txt


Example results are:

Multitreading:
      3 -3.13716 1.3211 0.700555 1.12624 
      4 -0.281198 2.60013 -2.52332 0.860336 
      4 0.457291 2.85897 -0.13825 1.15541 
      4 -1.28582 2.29857 1.72768 1.15125 
      4 -1.58518 1.80448 0.49676 1.33416 
      4 -1.63299 0.787328 1.00209 2.00438 
      4 -1.71651 1.53308 0.876754 2.70904 
      4 1.88612 0.0720565 -3.07217 1.88981 
      4 2.08797 0.715426 -2.07369 2.19369 
      4 -2.76575 0.653177 0.431936 1.88553 

Singletreading:
      1 -7.11563e-05 1.05414 -2.74011 1.56744 
      1 7.11563e-05 1.19471 2.31573 1.38747 
      1 -7.49014e-06 0.809768 -2.21183 1.94703 
      1 7.86465e-06 1.0074 -3.04934 2.31239 
      1 -9.02562e-05 0.716975 2.56991 1.66531 
      1 -9.02562e-05 1.73661 -2.0252 1.37 
      1 9.17542e-05 2.01268 -1.98433 1.7958 
      1 -9.36268e-06 1.97845 2.60895 1.7251 
      1 9.58738e-05 1.49225 -2.65591 0.863971 
      1 9.58738e-05 1.96131 3.00116 1.1963 


It should be pretty evident now that something is severely broken with the PRNG in multithreading mode.
Comment 5 Christoph Terasa 2015-09-15 09:58:58 CEST
Created attachment 358 [details]
project which shows problems with the G4 MT PRNG

Uploaded a new example project which shows the problem more explicitly.
Comment 6 Christoph Terasa 2015-09-16 10:02:47 CEST
I solved the problem of multiple occurrences of the same tuple by adding the following to my main program:

git diff HEAD~1
diff --git a/prng.cc b/prng.cc
index af53fef..30b41f9 100644
--- a/prng.cc
+++ b/prng.cc
@@ -22,6 +22,7 @@ int main(int argc,char** argv)
        }
 
 #ifndef NOMULTITHREADING
+       G4MTRunManager::SetSeedOncePerCommunication(1);
        G4MTRunManager* runManager = new G4MTRunManager;
        runManager->SetNumberOfThreads(1);
 #else

From my understanding of G4MTRunManager this changes the reseeding behavior from reseeding the PRNG after every event, to (re)seeding only once per initialized worker thread. Is there a specific reason why the G4MTRunManager has this reseeding capability, but the singlethreaded G4RunManager does not have that? What's the rationale behind the default setting of reseeding the PRNG after every event?

My conclusion is that reseeding the PRNG after every event limits the available number space that the PRNG can reach. This means sometimes the PRNG of an event starts at the exact same number as the PRNG of an previous event, which leads to the exact same random numbers generated as in the previous event.

I'd argue that the default reseeding behavior of the G4MTRunManager is a severe bug and should be fixed, but I expect and hope there is a rationale behind the current behavior.
Comment 7 Christoph Terasa 2015-09-16 12:28:34 CEST
I found the rationale in the source code, ./source/run/src/G4RunMessenger.cc, lines 136ff

"If seedOnce is set to 0 (default), seeds that are centrally managed by G4MTRunManager are set for every event of every worker thread. This option guarantees event reproducability regardless of number of threads.

If seedOnce is set to 1, seeds are set only once for the first event of each run of each worker thread. Event reproducability is guaranteed only if the same number of worker threads are used. On the other hand, this option offers better computing performance in particular for applications with relatively small primary particle energy and large number of events.

If seedOnce is set to 2, seeds are set only for the first event of group of N events. This option is reserved for the future use when Geant4 allows number of threads to be dynatically[sic] changed during an event loop." 

I can now understand why the defaults were chosen like that, but it's pretty unfortunate that this leads to vastly different results compared to singlethreaded mode. But still, I think this not really encompasses a "bug", but merely a decision, so I'd say this issue can be closed.


I only urge to make this issue a pretty big caveat in the multithreading migration guide at https://twiki.cern.ch/twiki/bin/view/Geant4/QuickMigrationGuideForGeant4V10 and other relevant places.
Comment 8 Andrea Dotti 2015-09-16 18:03:43 CEST
Hello,
thank you for reporting this.
I need to go through your comments, because they are potentially indicating a problem that our testing did not catch.

Andrea
Comment 9 Andrea Dotti 2016-09-16 09:41:23 CEST
Hello,
I am sorry for the very low activity on this issue.

I've double checked the logic of our seeding and tested with our tests. 
The logic is correct.

The default behavior in G4 is correct and I cannot reproduce reproducibility issues with G4 testing setups.

Andrea