Problem 1651

Summary: Possible problem in MT identified by CMS: race for G4VDecayChannel::G4MT_daughters
Product: Geant4 Reporter: Vladimir.Ivantchenko
Component: processes/decay Assignee: asai
Status: RESOLVED FIXED    
Severity: critical CC: asai
Priority: P5    
Version: 10.0   
Hardware: All   
OS: All   

Description Vladimir.Ivantchenko 2014-07-07 13:19:12 CEST
During deployment of Geant4 for CMS TBB, Helgrind analysis points to a possible thread-unsafe situation. The full report:

==17243== Possible data race during read of size 8 at 0x2FFA59E8 by thread #3
==17243== Locks held: none
==17243==    at 0x28BAE87C: G4PhaseSpaceDecayChannel::DecayIt(double) (G4PhaseSpaceDecayChannel.cc:89)
==17243==    by 0x2924C8EC: G4Decay::DecayIt(G4Track const&, G4Step const&) (G4Decay.cc:252)
==17243==    by 0x2A0A4FDA: G4SteppingManager::InvokePSDIP(unsigned long) (G4SteppingManager2.cc:530)
==17243==    by 0x2A0A548E: G4SteppingManager::InvokePostStepDoItProcs() (G4SteppingManager2.cc:502)
==17243==    by 0x2A0A2765: G4SteppingManager::Stepping() (G4SteppingManager.cc:209)
==17243==    by 0x2A0AD17C: G4TrackingManager::ProcessOneTrack(G4Track*) (G4TrackingManager.cc:126)
==17243==    by 0x2861D064: G4EventManager::DoProcessing(G4Event*) (G4EventManager.cc:185)
==17243==    by 0x28462D9D: RunManagerMTWorker::produce(edm::Event const&, edm::EventSetup const&, RunManagerMT const&) (RunManagerMTWorker.cc:377)
==17243==    by 0x2840485B: OscarMTProducer::produce(edm::Event&, edm::EventSetup const&) (OscarMTProducer.cc:171)
==17243== 
==17243== This conflicts with a previous write of size 8 by thread #1
==17243== Locks held: none
==17243==    at 0x28BB451A: G4VDecayChannel::FillDaughters() (G4VDecayChannel.cc:346)
==17243==    by 0x28BAEAB5: G4PhaseSpaceDecayChannel::DecayIt(double) (G4PhaseSpaceDecayChannel.cc:89)
==17243==    by 0x2924C8EC: G4Decay::DecayIt(G4Track const&, G4Step const&) (G4Decay.cc:252)
==17243==    by 0x2A0A4FDA: G4SteppingManager::InvokePSDIP(unsigned long) (G4SteppingManager2.cc:530)
==17243==    by 0x2A0A548E: G4SteppingManager::InvokePostStepDoItProcs() (G4SteppingManager2.cc:502)
==17243==    by 0x2A0A2765: G4SteppingManager::Stepping() (G4SteppingManager.cc:209)
==17243==    by 0x2A0AD17C: G4TrackingManager::ProcessOneTrack(G4Track*) (G4TrackingManager.cc:126)
==17243==    by 0x2861D064: G4EventManager::DoProcessing(G4Event*) (G4EventManager.cc:185)
==17243== 
==17243== Address 0x2FFA59E8 is 72 bytes inside a block of size 96 alloc'd
==17243==    at 0x4806A85: operator new(unsigned long) (in /afs/cern.ch/cms/sw/ReleaseCandidates/vol0/slc6_amd64_gcc481/external/valgrind/3.9.0-cms3/lib/valgrind/vgpreload_helgrind-amd64-linux
==17243==    by 0x28B7EB41: G4PionZero::Definition() (G4PionZero.cc:87)
==17243==    by 0x28B7DBF2: G4MesonConstructor::ConstructLightMesons() (G4MesonConstructor.cc:91)
==17243==    by 0x28B7DCB8: G4MesonConstructor::ConstructParticle() (G4MesonConstructor.cc:82)
==17243==    by 0x28E41953: G4DecayPhysics::ConstructParticle() (G4DecayPhysics.cc:88)
==17243==    by 0x2A056E1C: G4VModularPhysicsList::ConstructParticle() (G4VModularPhysicsList.cc:115)
==17243==    by 0x2A047B09: G4RunManagerKernel::SetupPhysics() (G4RunManagerKernel.cc:457)
==17243==    by 0x2A047CA9: G4RunManagerKernel::SetPhysics(G4VUserPhysicsList*) (G4RunManagerKernel.cc:431)
==17243==    by 0x2845E5CC: RunManagerMT::initG4(DDCompactView const*, MagneticField const*, HepPDT::ParticleDataTable const*) (RunManagerMT.cc:144)
==17243==    by 0x2844D8D7: OscarMTMasterThread::OscarMTMasterThread(edm::ParameterSet const

More comments from CMS expert:

Code in G4PhaseSpaceDecayChannel::DecayIt(double) (G4PhaseSpaceDecayChannel.cc:89):

89  if (G4MT_parent == 0) FillParent();
90  if (G4MT_daughters == 0) FillDaughters();

Member declaration in G4VDecayChannel.hh (G4PhaseSpaceDecayChannel derives from G4VDecayChannel):

230    G4ParticleDefinition*  G4MT_parent;
231    G4ParticleDefinition** G4MT_daughters;

Code in G4VDecayChannel::FillDaughters() (G4VDecayChannel.cc:346):

346  G4MT_daughters = new G4ParticleDefinition*[numberOfDaughters];

I think the race is that thread 3 reads from G4MT_daughters in G4PhaseSpaceDecayChannel::DecayIt() and thread 1 writes to it in G4VDecayChannel::FillDaughters(), and there is no synchronization. I verified with gdb that indeed within OscarMTProducer at least some G4VDecayChannel-derived objects are shared between worker threads. 
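
For illustration only, the unsynchronized "if null, then fill" pattern described above can be made safe by guarding the lazy fill. The sketch below uses std::call_once on a stand-in class; SharedChannel, Daughters, Fill and FillCountAfterConcurrentUse are hypothetical names, not Geant4 API, and this is not necessarily the fix that was tagged for this report.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decay channel object shared by all worker
// threads. None of these names are Geant4 API.
class SharedChannel {
public:
    explicit SharedChannel(std::vector<std::string> names)
        : names_(std::move(names)) {}

    // Callable from any thread: std::call_once runs Fill() exactly once and
    // makes its writes visible to every thread that returns from the call.
    const std::vector<const std::string*>& Daughters() {
        std::call_once(filled_, [this] { Fill(); });
        return daughters_;
    }

    int FillCount() const { return fillCount_.load(); }

private:
    // Lazy fill, analogous in spirit to the unguarded fill in the report.
    void Fill() {
        for (const auto& n : names_) daughters_.push_back(&n);
        ++fillCount_;
    }

    std::vector<std::string> names_;
    std::vector<const std::string*> daughters_;
    std::once_flag filled_;
    std::atomic<int> fillCount_{0};
};

// Lets nThreads workers request the daughter table concurrently and returns
// how many times the table was actually filled.
int FillCountAfterConcurrentUse(int nThreads) {
    std::vector<std::string> names{"gamma", "gamma"};
    SharedChannel channel(names);
    std::vector<std::thread> workers;
    for (int i = 0; i < nThreads; ++i) {
        workers.emplace_back([&channel] {
            const auto& d = channel.Daughters();
            assert(d.size() == 2);
        });
    }
    for (auto& w : workers) w.join();
    return channel.FillCount();
}
```

With this guard the shared object keeps a single table, at the cost of a synchronization check on every access.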

There are some comments in G4VDecayChannel.hh that I interpret as saying these variables should be thread-local (and there is some discussion of the implementation), but at the moment they definitely are not.
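
As a rough sketch of what "thread-local" would mean for such a lazily filled table, the example below gives every thread its own copy, so the fill needs no lock. ThreadLocalDaughters, FillsAcrossThreads and g_fills are hypothetical names chosen for illustration, and the PDG code 22 (photon) is used only as sample data; none of this is taken from the Geant4 sources.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many times any thread has filled its own table.
std::atomic<int> g_fills{0};

// Hypothetical per-thread daughter table; not Geant4 code.
// Each thread owns a separate instance, so the lazy fill cannot race.
const std::vector<int>& ThreadLocalDaughters() {
    thread_local std::vector<int> daughters;  // one instance per thread
    if (daughters.empty()) {
        daughters = {22, 22};  // sample data: two photons
        ++g_fills;
    }
    return daughters;
}

// Runs nThreads workers that each access the table twice and returns the
// total number of fills performed across all workers.
int FillsAcrossThreads(int nThreads) {
    g_fills = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < nThreads; ++i) {
        workers.emplace_back([] {
            const auto& d = ThreadLocalDaughters();  // first access fills
            assert(d.size() == 2);
            ThreadLocalDaughters();                  // second access reuses
        });
    }
    for (auto& w : workers) w.join();
    return g_fills.load();
}
```

The trade-off is one fill and one table per thread, in exchange for lock-free access afterwards.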

Comment 1 asai 2014-07-07 23:58:32 CEST
The issue is confirmed and the fix is tagged and proposed.