Problem 1179

Summary: Crash in G4QFragmentation::Breeder, double delete?
Product: Geant4
Reporter: Andrea Dotti <andrea.dotti>
Component: processes/hadronic/models/chiral_inv_phase_space
Assignee: Andrea Dotti <andrea.dotti>
Status: RESOLVED FIXED
Severity: major
CC: dennis.herbert.wright, John.Apostolakis, Koichi.Murakami
Priority: P5
Version: 9.4
Hardware: PC
OS: Linux

Description Andrea Dotti 2011-03-08 17:02:39 CET
Hello,
I obtained the following crash with geant4-09-04-ref-02 while running the SimplifiedCalorimeter application.

The simulation used anti_protons on an Fe/Sci calorimeter.

A crash, probably due to a "double" delete or memory corruption:
*** glibc detected *** /pool/lsf/adotti/124838148/bin/Linux-g++/mainStatAccepTest: free(): corrupted unsorted chunks: 0x000000000dbcccf0 ***

occurred in the G4QFragmentation::Breeder function.
The crashing interaction was an anti_neutron at 7.4 GeV on an iron nucleus.

There is some output from the application itself before the crash:

G4QFragmentation::Breeder: H#0, d4M=(-601.268,374.429,-1407.91;-1266.84),dCh=0,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#1, d4M=(550.987,-558.314,-1529.87;-1318.34),dCh=-2,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#2, d4M=(239.245,-388.813,-1171.18;-1210.66),dCh=0,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#3, d4M=(-210.162,184.094,-1201.35;-1282.06),dCh=-1,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#4, d4M=(-13.269,287.414,-1111.34;-1185.19),dCh=0,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#5, d4M=(5.06707,-192.85,-1809.52;-1895.48),dCh=-2,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#6, d4M=(184.72,-0.020304,-1879.06;-1931.59),dCh=-1,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#7, d4M=(-277.569,-35.3974,368.834;307.088),dCh=-1,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#8, d4M=(-308.305,-96.8165,-1763.71;-1798.14),dCh=0,dBN=0
*Warning*G4QFragmentation::Breeder: Nonconservation isn't cured!
G4QFragmentation::Breeder: H#9, d4M=(284.495,48.6549,-487.232;-583.249),dCh=-1,dBN=0
G4QFragmentation::Breeder:***Cured*** Redundent 2Hadrons i=9
*** glibc detected *** /pool/lsf/adotti/124838148/bin/Linux-g++/mainStatAccepTest: free(): corrupted unsorted chunks: 0x000000000dbcccf0 ***

I have the random seed needed to reproduce the event.
From a GDB session I can see that the crash occurs in the G4QHadron destructor invoked at:
#16 0x00002aaab679cbab in G4QFragmentation::Breeder (this=0x7fffffff92e0) at src/G4QFragmentation.cc:3735

Here are a few lines around line 3735:

3717            if( !(hCh+mCh+curStrChg) && !(hBN+mBN+curStrBaN) && std::fabs(dEn+hEn+mEn)<eps &&
3718                std::fabs(dPx+hPx+mPx)<eps && std::fabs(dPy+hPy+mPy)<eps &&
3719                std::fabs(dPz+hPz+mPz)<eps )
3720            { 
3721              G4cout<<"G4QFragmentation::Breeder:***Cured*** Redundent 2Hadrons i="<<i<<G4endl;
3722              G4QHadron* theLast = (*theResult)[nHadr-1];
3723              curHadr->Set4Momentum(theLast->Get4Momentum()); //4-Mom of CurHadr
3724              G4QPDGCode lQP=theLast->GetQPDG();
3725              if(lQP.GetPDGCode()!=10) curHadr->SetQPDG(lQP);
3726              else curHadr->SetQC(theLast->GetQC());
3727              theResult->pop_back(); // theLastQHadron is excluded from OUTPUT
3728              delete theLast;        //*!!When kill, delete theLastQHadr as an Instance!*
3729              theLast = (*theResult)[nHadr-2];
3730              curHadr->Set4Momentum(theLast->Get4Momentum()); //4-Mom of CurHadr
3731              lQP=theLast->GetQPDG();
3732              if(lQP.GetPDGCode()!=10) curHadr->SetQPDG(lQP);
3733              else curHadr->SetQC(theLast->GetQC());
3734              theResult->pop_back(); // theLastQHadron is excluded from OUTPUT
3735              delete theLast;        //*!!When kill, delete theLastQHadr as an Instance!*                   
3736              break;
3737            }
(gdb) 

I do not understand the logic, but I suspect the problem comes from memory corruption caused by the two "delete" statements (lines 3728 and 3735) and the two "pop_back()" calls (lines 3727 and 3734), combined with the direct access to what seem to be vector elements at lines 3722 and 3729.
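
One way the suspected corruption could arise is sketched below. It is only an illustration under assumptions: Hadron is a hypothetical stand-in for G4QHadron, and the helper names are invented for this note, not part of Geant4. The idea is that if the index i of the current hadron is one of the last two positions of the vector, curHadr is the same object as theLast, so it is still used (or deleted again later) after "delete theLast".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for G4QHadron: just a payload.
struct Hadron { int id; };

// True when index i is one of the two trailing positions that the block
// pops and deletes, so a pointer taken at i would dangle afterwards.
bool aliasesRemovedTail(std::size_t i, std::size_t nHadr) {
    assert(i < nHadr);                 // i must be a valid index
    return i + 2 >= nHadr;
}

// Reports, without freeing anything, whether the pointer stored at i is
// the same object as the last or the last-but-one element.
bool pointerOverlapsTail(const std::vector<Hadron*>& v, std::size_t i) {
    const std::size_t n = v.size();
    if (n < 2 || i >= n) return false;
    return v[i] == v[n - 1] || v[i] == v[n - 2];
}
```

If the log line "Redundent 2Hadrons i=9" came from a vector holding only ten or eleven hadrons (an assumption, the size is not printed), aliasesRemovedTail(9, nHadr) would be true and the original block would write through a deleted pointer.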
Comment 1 Mikhail.Kossov 2011-05-02 11:03:00 CEST
The case is closed by the hadr-chips-proc-V09-04-03 tag. M. Kosov.
Comment 2 Andrea Dotti 2011-05-12 16:45:43 CEST
The tag hadr-chips-proc-V09-04-03 is actually not appropriate for this bug and does not solve the problem. However, Mikhail provided the code below to replace the "if" block starting at line 3717. It has been tested, and the event passed without a crash.
I will get in contact with Gabriele to understand how to branch and create a patch for the 9.4.p01 release. The same code should go into the TRUNK (I asked Mikhail for this).

Dennis, can you please assign this to me?
Thanks
Andrea


       if( !(hCh+mCh+curStrChg) && !(hBN+mBN+curStrBaN) && std::fabs(dEn+hEn+mEn)<eps &&
           std::fabs(dPx+hPx+mPx)<eps && std::fabs(dPy+hPy+mPy)<eps &&
           std::fabs(dPz+hPz+mPz)<eps && i>0)
       {
         G4cout<<"G4QFragmentation::Breeder:***Cured*** Redundent 2Hadrons i="<<i<<G4endl;
         G4QHadron* preHadr = (*theResult)[i-1];
         G4QHadron* theLast = (*theResult)[nHadr-1];
         if(i < nHadr-1)        // Only cur can overlap with the two last hadrons
         {                      // Put the last to the previous
           preHadr->Set4Momentum(theLast->Get4Momentum()); // must be 4-Mom of preHadr
           G4QPDGCode lQP=theLast->GetQPDG();
           if(lQP.GetPDGCode()!=10) preHadr->SetQPDG(lQP);
           else preHadr->SetQC(theLast->GetQC());
         }
         theResult->pop_back(); // theLastQHadron's excluded from OUTPUT(even if Cur=Last)
         delete theLast;        //*!!When kill, delete theLastQHadr as an Instance!*
         theLast = (*theResult)[nHadr-2]; // nHadr is not changed -> so it's LastButOne
         if(i < nHadr-2)        // The two current and the two Last are not overlapped
         {                      // Put the last but one to the current
           curHadr->Set4Momentum(theLast->Get4Momentum()); // must be 4-Mom of curHadr
           G4QPDGCode lQP=theLast->GetQPDG();
           if(lQP.GetPDGCode()!=10) curHadr->SetQPDG(lQP);
           else curHadr->SetQC(theLast->GetQC());
         }
         theResult->pop_back(); // theLastQHadron's excluded from OUTPUT(even for overlap)
         delete theLast;        //*!!When kill, delete theLastQHadr as an Instance!*
         nHadr=theResult->size(); // Just a precaution... should be nHadr-2
         break;
       }
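
The guards in this block can be exercised outside Geant4 with a small model. The sketch below is an illustration only: Hadron is a hypothetical stand-in for G4QHadron, replacePairWithTail is an invented name, and a plain struct copy takes the place of the Set4Momentum/SetQPDG/SetQC calls. It mirrors the same index checks (i>0, i<nHadr-1, i<nHadr-2) so that a hadron is never written to after it has been deleted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for G4QHadron: just a payload.
struct Hadron { int id; };

// Removes the redundant pair at positions (i-1, i) by overwriting it with
// the two trailing elements and then popping and deleting those two slots.
// Returns false (and changes nothing) when no valid pair exists at i.
bool replacePairWithTail(std::vector<Hadron*>& v, std::size_t i) {
    const std::size_t n = v.size();
    if (i == 0 || i >= n) return false;   // mirrors the added "i>0" guard
    Hadron* pre  = v[i - 1];
    Hadron* cur  = v[i];
    Hadron* last = v[n - 1];
    if (i < n - 1) *pre = *last;          // cur is not the last element
    v.pop_back();
    delete last;
    last = v[n - 2];                      // n is unchanged: last-but-one
    if (i + 2 < n) *cur = *last;          // pair and tail do not overlap
    v.pop_back();
    delete last;
    return true;
}
```

With five hadrons with ids 0..4 and i=1, the vector ends up holding ids 4, 3, 2. When the pair sits at the very end, the copies are skipped and the two trailing objects are simply deleted once each.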
Comment 3 Andrea Dotti 2011-06-06 11:59:33 CEST
A fix has been provided by the author. The code for geant4-09-04-patch-01 has been fixed by tag
hadr-chips-frag-V09-03-10.

Tests on the crashing event have been performed and the code fixes the issue.