I just switched to geant4-09-02-patch-01 (and CLHEP 2.0.4.2). When my main() calls ::exit(0), I get 8 instances of this error message:

    g4beamline(54299) malloc: *** error for object 0x90faa50: Non-aligned pointer being freed (2)
    *** set a breakpoint in malloc_error_break to debug

This did not happen with geant4-09-02 (and CLHEP 2.0.2.3). If I change ::exit(0) to ::_exit(0) the messages disappear.

My program does not call atexit() directly, but by temporarily adding two different calls to it, one at the start of main and one just before calling ::exit(0) at the end of main, I can see that the above error messages come between the two callbacks. So the problem is clearly in functions registered with atexit (or in static destructors, which run in the same phase).

Note that my code does not delete the run manager when closing up (the reasons are complex); it just opens the geometry to avoid warnings about deleting stores while the geometry is closed. Omitting that causes the above error messages to come before the geometry-closed warnings. Un-commenting the delete runManager just gives a segmentation violation, followed by the above 8 error messages, the geometry-closed warnings, and my first atexit callback.

This is Mac OS X 10.5.6. I have not tried other OSs (yet).
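The bracketing experiment described above relies on exit() running atexit handlers in reverse order of registration, while _exit() skips them entirely. The following is a minimal POSIX sketch of that behaviour, not code from g4beamline or Geant4: every name in it (runBracketDemo, libraryCleanup, and so on) is hypothetical. It runs the experiment in a forked child and collects what the handlers write through a pipe, so that one program can observe both exit paths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// All names here are hypothetical; nothing is taken from g4beamline or Geant4.
static int g_fd = -1;  // write end of the pipe, set only in the child process

static void mark(const char* text) {
  assert(g_fd >= 0);
  ssize_t ignored = ::write(g_fd, text, std::strlen(text));
  (void)ignored;
}

static void registeredFirst() { mark("first-registered\n"); }
static void libraryCleanup()  { mark("library-cleanup\n"); }
static void registeredLast()  { mark("last-registered\n"); }

// Runs the bracketing experiment in a child process and returns
// everything the atexit handlers wrote, in the order they ran.
std::string runBracketDemo(bool useUnderscoreExit) {
  int fds[2];
  if (::pipe(fds) != 0) return "pipe failed";
  std::fflush(nullptr);  // keep buffered parent output out of the child
  pid_t pid = ::fork();
  if (pid < 0) {
    ::close(fds[0]);
    ::close(fds[1]);
    return "fork failed";
  }
  if (pid == 0) {
    ::close(fds[0]);
    g_fd = fds[1];
    std::atexit(registeredFirst);  // plays the "start of main" marker
    std::atexit(libraryCleanup);   // stands in for cleanup a library registers
    std::atexit(registeredLast);   // plays the "just before exit" marker
    if (useUnderscoreExit) ::_exit(0);  // terminates without running handlers
    std::exit(0);                       // runs handlers, newest first
  }
  ::close(fds[1]);
  std::string out;
  char buf[128];
  ssize_t n;
  while ((n = ::read(fds[0], buf, sizeof buf)) > 0) {
    out.append(buf, static_cast<std::size_t>(n));
  }
  ::close(fds[0]);
  int status = 0;
  ::waitpid(pid, &status, 0);
  return out;
}
```

With exit(), the marker registered last runs first and the marker registered first runs last, so anything printed between them comes from cleanup registered in between. With _exit(), none of the handlers run, which matches the observation that the messages disappear.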
The reported problem doesn't seem to be related to Geant4. There are no changes in 9.2.p01 versus 9.2 that could explain such behavior (which, by the way, we cannot reproduce). Moving to a new version of the underlying libraries may have exposed already existing memory-management problems; the improper deletion or manipulation of geometry objects in the user application, revealed by the crash on deleting the run manager, is a sign of this in my opinion. I can only suggest using the debugger to investigate the problem further, compiling the user application (and possibly the Geant4 libraries as well) in debug mode for more detailed information.
Created attachment 43: Stack traces of malloc failures inside exit(0). This came from a ddd session debugging g4beamline 1.16 linked with a debug version of geant4.9.2.p01.
The problem is most definitely inside Geant4. It is related to the way objects are registered in G4CrossSectionDataSetRegistry. The malloc errors come inside G4CrossSectionDataSetRegistry::Clean() as it deletes them. The code clearly registered misaligned pointers. None of my code is involved -- my program has no code related to any hadronic processes. All it did was instantiate the QGSP_BIC physics list.

I ran with the following malloc debugging environment:

    export MallocLogFile=$PWD/malloc.log
    export MallocPreScribble=1
    export MallocScribble=1
    export MallocCheckHeapStart=100
    export MallocCheckHeapEach=100
    export MallocCheckHeapAbort=1
    export MallocBadFreeAbort=1

The only errors in malloc.log are the same 8 non-aligned pointers being freed (there are >40,000 "test passed" messages). This would have found most wild-pointer writes in my code (or in Geant4 code).

As a test, I put debug printf()s into G4CrossSectionDataSetRegistry: Clean(), Register(), and DeRegister(). The 8 non-aligned pointers were registered. Combined with the above test, I don't think this is any sort of heap corruption.

Beyond this it would be terribly inefficient for me to attempt to debug this further -- there are too many types of Register() and I have no notion of the code structure. This sure looks like a general problem -- malloc never gives non-aligned pointers, so some code somewhere is calling G4CrossSectionDataSetRegistry::Register() with a pointer that did not come from malloc() -- that guarantees a problem in Clean(). Perhaps other OSs (or their C++ compilers) don't detect this, so debugging on Mac OS X might be most efficient. But an expert could probably find all calls to that function and find the ones that were not from malloc.

This is probably subtle, so here's a guess: Register() is called with a G4VCrossSectionDataSet*. If the actual argument inherits from multiple classes, then this problem could occur as the compiler casts the object to that type. Then, even though the object was allocated via new, the pointer passed to Register() could be non-aligned. This is likely to be compiler-specific.

When I first opened this bug report, I did not know to which part of Geant4 it applied, so I used "global". Perhaps it should be changed to processes/hadronic/cross-sections.
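The multiple-inheritance guess above can be illustrated with a small sketch. It uses hypothetical stand-in classes (OtherBase, DataSetBase, Concrete), not Geant4 classes, and it only shows the pointer adjustment itself: converting a pointer to a second base class yields an address different from the one new returned. Whether this is what actually happened inside Geant4 is not established in this thread. The nonzero offset is what common compilers produce for this layout; the C++ standard does not guarantee it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; none of these are Geant4 classes.
struct OtherBase {
  virtual ~OtherBase() {}
  int tag = 0;
};

struct DataSetBase {  // plays the role of the interface type a registry stores
  virtual ~DataSetBase() {}
  int id = 0;
};

struct Concrete : OtherBase, DataSetBase {
  int payload = 0;
};

// Byte distance between the address returned by new and the address
// seen through a pointer to the second base class.
std::ptrdiff_t secondBaseOffset() {
  Concrete* whole = new Concrete;
  DataSetBase* asBase = whole;  // implicit conversion adjusts the address
  std::ptrdiff_t offset =
      reinterpret_cast<char*>(asBase) - reinterpret_cast<char*>(whole);
  // Casting back down undoes the adjustment.
  assert(static_cast<Concrete*>(asBase) == whole);
  delete asBase;  // well defined here: DataSetBase has a virtual destructor
  return offset;
}

// dynamic_cast<void*> on a polymorphic pointer yields the address of the
// most-derived object, which is the address new returned.
bool recoversAllocationAddress() {
  Concrete* whole = new Concrete;
  DataSetBase* asBase = whole;
  bool same = (dynamic_cast<void*>(asBase) == static_cast<void*>(whole));
  delete whole;
  return same;
}
```

Note that deleting through the adjusted pointer is well defined when the base class has a virtual destructor, as in this sketch. The adjusted address becomes a problem when it reaches the allocator unchanged, for example when the base lacks a virtual destructor or when the registered object was never allocated with new in the first place.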
Thanks for the detailed information. I believe we now have all the elements needed to investigate the problem further, as it seems to be due to the recently introduced de-registration mechanism for the hadronic processes. The problem is being assigned to the responsible developer.
Thanks to this bug report, the problem has been fixed. The fix will be included in the next Geant4 release in June, and in a new patch for 9.2 if such a patch is created.