Problem 1286

Summary: Geant4 9.5 seems to have memory leak issues
Product: Geant4 Reporter: Marco Pinto <pinto>
Component: processes/hadronic/processesAssignee: dennis.herbert.wright
Status: RESOLVED FIXED    
Severity: critical CC: Alberto.Ribon, asai, Gabriele.Cosmo, hedge-hog, jasondet, Thomas.Papaevangelou, Vladimir.Ivantchenko
Priority: P1    
Version: 9.5   
Hardware: PC   
OS: Linux   
Attachments: C++ source file: G4WHadronElasticProcess.cc
Application where the memory leak appears
Application with memory leak
Updated application with README file
Valgrind report

Description Marco Pinto 2012-02-02 15:04:24 CET
I am running an application that worked perfectly with Geant4 9.4 p02, but with Geant4 9.5 it seems to show memory leaks. 

This problem is very clear when using the neutron HP classes (elastic, inelastic, capture and fission), but it seems to be present even when they are not used at all. When using the neutron HP classes with Geant4 9.5, my application crashes in less than 4 hours because it is consuming around 4 GB of memory. When I am not using them the effect is less obvious, but my application still uses steadily more memory during runtime (although it usually does not reach the 4 GB limit and crash). The application that crashes after about 4 hours is the same one that, using Geant4 9.3 p02, uses only around 5% (200 MB) of memory during runtime.

My code has its own PhysicsList; I reviewed it and ran it under Valgrind, and nothing was found in it. Since I could not guarantee that this is a Geant4 9.5 problem, I ran my code without my PhysicsList, calling the QGSP_BIC and QGSP_BIC_HP packages instead, and the conclusions were the same. 

Since I could still not guarantee that the problem was caused by Geant4 9.5, I ran the novice/N02 example (for no particular reason; I simply chose an example at random) with QGSP_BIC and QGSP_BIC_HP, and the conclusions were the same: when running this example with QGSP_BIC_HP, I started with around 5% of memory used and killed it after 118 minutes, with 35.8% of memory consumed. With QGSP_BIC I started at 2.2% and killed it after 60 minutes, when it was using 4.2% of memory. 

NOTE: I assigned the QGSP_BIC and QGSP_BIC_HP packages to my code and to the N02 example simply by commenting out "runManager->SetUserInitialization(physics);" and calling "runManager->SetUserInitialization(new QGSP_BIC);" or "new QGSP_BIC_HP".
Comment 1 Gabriele Cosmo 2012-02-07 10:04:21 CET
When using the HP physics lists you load data from the neutron data set, which in Geant4 9.5 is now much bigger, as it includes many more isotopes being loaded into memory.
This may well explain the increase you see at initialization and during the simulation.
To verify whether there are indeed leaks during the event loop, you should compare the Valgrind logs of two runs of the same application in 9.5 that simulate different numbers of events: sum the leaked bytes of memory reported at the end of each log and see if there is any difference. If the sums are the same in both logs, there is no leak in the event loop, and what is reported is only memory allocated at initialization that is not freed by the application at the end...
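The two-run comparison described here can be sketched in code. The following is a toy helper, not part of Geant4, and the "definitely lost: N bytes" line format is an assumption based on standard Valgrind leak-summary output:

```cpp
// Hypothetical helper for the two-run comparison described above: sum every
// "definitely lost: N bytes" figure in a Valgrind log. If a short run and a
// long run yield the same total, the reported memory was allocated once at
// initialization rather than leaked per event.
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

long sumLeakedBytes(const std::string& log) {
    long total = 0;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::string key = "definitely lost: ";
        std::size_t pos = line.find(key);
        if (pos == std::string::npos) continue;
        std::string num;
        for (std::size_t i = pos + key.size(); i < line.size(); ++i) {
            if (std::isdigit(static_cast<unsigned char>(line[i]))) num += line[i];
            else if (line[i] == ',') continue;   // thousands separator
            else break;
        }
        if (!num.empty()) total += std::stol(num);
    }
    return total;
}
```

If the totals for, say, a 100-event log and a 10000-event log agree, the event loop itself is leak-free in the sense Gabriele describes.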
Comment 2 jasondet 2012-02-10 00:43:13 CET
(In reply to comment #1)

My collaborators and I are also seeing this problem. For example, take the novice/N01 example, swap the physics list for QGSP_BERT_HP or Shielding, have it fire 10 MeV neutrons, and just watch the memory usage in top or your favorite profiler. The more events you fire, the more memory gets used. Somehow I couldn't get Valgrind to run on my system without hanging, but if you let the run go long enough you will run out of memory. G4NDL4.0 is only 1.4 GB, so it shouldn't be possible to use up many GB of memory just loading isotope data. Please fix this! We cannot run high-statistics runs with Geant4 9.5 in its current state.
Comment 3 Marco Pinto 2012-02-10 10:29:54 CET
Thank you jasondet for your contribution to this report.

I have run Valgrind as Gabriele Cosmo suggested, and it does not seem to find any memory leak (using the novice/N02 example and QGSP_BIC_HP with 100 and 10000 events). 

However, the fact that Valgrind does not find anything does not mean that there is no leak. Valgrind reports memory that is still allocated when the program exits. Although this procedure finds the vast majority of memory leaks, if Geant4 does free the memory when the destructors are called at the end of the job, Valgrind will not flag the growth that happened during the run. 
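Marco's point can be illustrated with a toy example (plain C++, not Geant4 code): a cache that grows with every event but frees everything in its destructor shows nothing in Valgrind's end-of-run leak summary, even though resident memory grows steadily during the run.

```cpp
// Toy illustration (not Geant4 code): memory that grows per event but is
// released by a destructor at the end of the job. Valgrind's leak check,
// which looks at what is still allocated at exit, reports nothing, yet
// tools like top would show the process growing throughout the run.
#include <cstddef>
#include <vector>

class EventCache {
public:
    ~EventCache() {
        for (double* p : store_) delete[] p;  // freed at destruction: no "leak"
    }
    void processEvent() {
        store_.push_back(new double[1024]);   // grows on every event
    }
    std::size_t bytesHeld() const {
        return store_.size() * 1024 * sizeof(double);
    }
private:
    std::vector<double*> store_;
};
```

After 1000 calls to processEvent() the object holds ~8 MB, yet a Valgrind run that lets the destructor execute reports no lost bytes.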

I am only asking someone to run one of the simpler examples with the neutron HP libraries, using a high number of events, and to watch whether memory is being consumed, with, for example, the top command on Linux. 

When one of the simpler examples has consumed 4 GB of memory after ~6 hours of runtime, well, if this is not a memory leak, what is it?

Best regards, 
Marco Pinto
Comment 4 Thomas Papaevangelou 2012-02-16 15:06:29 CET
I can verify the problem (although I don't know what happens with versions previous to 9.5). 
I have a very simple application with low-energy neutrons (<20 MeV) hitting a polyethylene target. Every ~1000 events the memory use of the application increases by ~5 MB. I have verified that there is no problem in the rest of my application, and the leak appears ONLY when using the predefined QGSP_BIC_HP and QGSP_BERT_HP physics lists. I have also tried a physics list without HP neutrons (QGSP_BIC) and the problem appears again (!), but over many more events (every ~15000). Maybe it has to do with the number of secondaries produced?
Comment 5 Gabriele Cosmo 2012-02-19 11:01:57 CET
We identified what looks like the source of the considerable memory increase reported when using HP physics lists. The problem affects elastic scattering in class G4WHadronElasticProcess (in source/processes/hadronic/processes/src/G4WHadronElasticProcess.cc), where the following incorrect line of code (line 182):
            hadi->ApplyYourself(thePro, *targetNucleus);
should be deleted.
This fix will be included in a coming patch to release 9.5, but you're invited to try it out already in your local installation.
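To make the nature of the bug concrete, here is a toy sketch of the pattern (hypothetical names and a strong simplification, not the actual Geant4 source): when the interaction is applied twice, the result of the stray first call is never freed, so memory grows linearly with the number of interactions.

```cpp
// Toy model of the bug pattern (hypothetical, simplified; not Geant4 code):
// the interaction is applied twice, but only the second result is tracked
// and eventually freed, so one allocation per interaction is orphaned.
static long g_liveResults = 0;           // allocated-but-not-freed results

struct HadronicResult { double secondaries[256]; };

HadronicResult* applyYourself() {        // each call allocates a fresh result
    ++g_liveResults;
    return new HadronicResult();
}

void consume(HadronicResult* r) {        // the path that frees the used result
    --g_liveResults;
    delete r;
}

long runInteractions(int n, bool strayCall) {
    for (int i = 0; i < n; ++i) {
        if (strayCall) applyYourself();  // the duplicated call: never freed
        consume(applyYourself());        // the legitimate call
    }
    return g_liveResults;                // orphaned results still alive
}
```

Deleting the stray call, as the proposed fix does, brings the count of live results back to zero after every interaction.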
We verified that the fix restores the original performance, although, based on the considerations already made, the overall memory footprint for HP physics may appear considerably bigger than in previous releases, as the isotopes from G4NDL-4.0 are loaded.
Memory increase in other conditions (non-HP cases) is considered normal as long as events of higher complexity are generated, and it is expected to reach a plateau after many events.
Thanks to all who helped to spot the problem.
Comment 6 Marco Pinto 2012-02-19 13:57:51 CET
Dear Gabriele Cosmo, 

thank you (and of course the Geant4 collaboration) for finding the problem. I haven't yet tested my application with the proposed solution, but I have a question about it.

Assuming that line 182 is the only issue leading to the increasing amount of memory used, can we trust all the simulations we already have from Geant4 9.5 using HP neutrons without this correction? Does this line change the simulation results? 

I ask because I needed some simulations with the new version of Geant4 using HP neutrons, and what I did was simulate with fewer events so that the application would not be killed. I have several thousand outputs produced under those conditions. 

Best regards, 
Marco Pinto
Comment 7 Gabriele Cosmo 2012-02-19 17:53:32 CET
It was verified that the bug has no effect on physics results, just a CPU/memory penalty.
Comment 8 Thomas Papaevangelou 2012-02-23 16:48:37 CET
Dear Gabriel,

I just made the correction you suggested and the problem is mostly solved. However, I still observe a slight memory increase as the number of processed events grows (~25 times smaller than before).
Comment 9 Thomas Papaevangelou 2012-02-27 13:53:41 CET
Hello again. 
Adding some boron to my simulation resulted in a fast memory increase again. Is it possible that the same error occurs in the inelastic hadronic process? (Boron has a very high inelastic cross section, and adding it to the simulation results in many more inelastic interactions.)

Regards, 
Thomas
Comment 10 Alberto.Ribon 2012-02-28 16:34:59 CET
Created attachment 150 [details]
C++ source file: G4WHadronElasticProcess.cc

The attached file corresponds to the Geant4 SVN tag hadr-proc-V09-05-01,
in the directory geant4/source/processes/hadronic/processes.
The tag was made by Vladimir Ivanchenko on 22 February 2012.
Comment 11 Thomas Papaevangelou 2012-02-29 10:47:33 CET
I've tried the file attached by Alberto Ribon, but the problem remains (no visible improvement; the application crashed at ~the same event number). Trying to localise the source of the leak more precisely, I used the hadronic process as it is defined in the physics list of the advanced/underground_physics example (DMXPhysicsList.cc), which uses only the LE and HP neutron models, and the leak disappeared completely (the application has been running since yesterday without any change in memory usage), while the physics results are correct. I hope this helps a bit...
Comment 12 Vladimir.Ivantchenko 2012-02-29 19:24:08 CET
I just ran hadr01 with the fix sent by Alberto. There is no memory growth for QGSP_BIC or QGSP_BIC_HP, for either neutron or B11 incident particles. Do not forget to put the code in the sub-directory $G4INSTALL/source/processes/hadronic/processes/src and to recompile both Geant4 and the example.

VI
Comment 13 Thomas Papaevangelou 2012-03-02 14:02:41 CET
I did replace the file and recompile both Geant4 and the application. The problem remains:

- with QGSP_BIC_HP and QGSP_BERT_HP I get a memory increase of ~7 MB / 1000 events
- with QGSP_BIC or DMXPhysicsList.cc (HP and LE neutron models only), no memory leak.

Please note that the application is simple but "heavy", since it studies neutron moderation in polyethylene blocks and absorption in boron, i.e. many interactions per event occur.
Comment 14 Dmitrij Fedorchenko 2012-03-02 15:55:52 CET
We have used the attached patch with various projects and found that memory leaks appear when modular physics is used. We have observed memory leaks using QGSP_BIC, QGSP_BIC_HP, QGSP_BERT, QGSP_BERT_HP, and also with pure EM modular physics (Penelope, etc.). At the same time, usage of user physics lists (such as the above-mentioned DMXPhysicsList or various self-written physics lists) entirely eliminates the memory leakage. In our opinion, the problems arise from changes in some physics-list builder internals compared to the previous version.

D. Fedorchenko
Comment 15 Vladimir.Ivantchenko 2012-03-02 16:03:19 CET
Is it possible to send a tar file which shows the memory growth after the patch? We do not see the effect in the standard examples, so analysis of the problem is difficult. 

VI
Comment 16 Thomas Papaevangelou 2012-03-02 17:24:16 CET
Created attachment 151 [details]
Application where the memory leak appears

Run the application, e.g.: "neutronShielding poly.mac 300 0 50"

Make sure that the files spectrum.dat and gspectrum.dat are copied to the directory where the application is executed. In the same directory a subdirectory called "output" should exist.
Comment 17 Dmitrij Fedorchenko 2012-03-03 09:56:08 CET
Created attachment 152 [details]
Application with memory leak

This is the application where the memory leak appears. The memory leak disappears when switching to DMXPhysicsList (epos.cc, line 39, commented). We used Visual Studio 2010 for compilation; the Geant4 toolkit and the application were both compiled in release configuration. The target hardware was an Intel Core i7 2600 @ 3.40 GHz with 16 GB DDR3 memory running Windows 7 64-bit.
To run the example, specify: epos run.mac
Comment 18 Vladimir.Ivantchenko 2012-03-05 12:20:28 CET
Hello,

Thanks for sending both applications. My conclusion: we have no problem anymore.
What happens in both cases: virtual memory for QGSP_BIC is about 100 MB. When QGSP_BIC_HP is used, memory grows to about 250 MB during the first (~100) events and stays stable afterwards. The extra 150 MB can be explained by HP's need to build internal tables. The amount of extra memory was smaller in 9.4, because the new extended data were only included in HP in 9.5.

VI
Comment 19 Vladimir.Ivantchenko 2012-03-05 12:22:30 CET
Hi Dennis,
You may close the problem.
VI
Comment 20 Thomas Papaevangelou 2012-04-10 17:33:32 CEST
Hello,

I have been away for a long time and could not check the fixes in time. So I installed the newly released patched version and checked it. What I saw is:

- The virtual memory for the QGSP_BIC model is somewhat more than 100 MB
- When the QGSP_BIC_HP model is used, the HP cross-sections are loaded when the run starts, leading to a total memory use of 242 MB
- The memory KEEPS INCREASING by ~5 MB / 10000 events.

Using the G4WHadronElasticProcess.cc file attached above doesn't change anything.

So, if the problem is solved, what am I doing wrong?
Comment 22 Vladimir.Ivantchenko 2012-04-10 20:50:01 CEST
Hello,

First of all, make sure that you have installed 9.5 patch 01 and rebuilt both Geant4 and your application from scratch.

With the patch we no longer see any memory increase after initialization is completed. 

If the increase still exists in your application, I would think that it is a new bug. Make sure that this is the case and open a new report. Because the bug does not show up in the Hadr01 example, you will need to attach a tar file with your application to the report.

VI
Comment 23 Dmitrij Fedorchenko 2012-04-11 09:04:06 CEST
I have installed 9.5 patch 01 and recompiled from source, but the memory leak is still present (I tried the application I submitted here earlier). Maybe this is truly some other bug? Should I open a new report?
Comment 24 Vladimir.Ivantchenko 2012-04-11 12:52:26 CEST
Hello Dmitrij,

I made a fresh installation of 9.5p01 on SLC5 with gcc 4.3. No leak for Hadr01. I tried your epos.tar.gz; it started to work for me after a minor correction in the main file ("Globals.hh" -> "globals.hh") and the addition of a GNUmakefile, which was absent. I do not see any memory growth when running QGSP_BIC_HP with primary e- or neutrons. Valgrind reports a memory leak at initialisation, in good part in G4ScoringManager.

This means the problem may be platform dependent. If a problem exists, it is not clear whether it has any connection to HP, or to hadron physics in general. As far as I understand, your simulation is initially electromagnetic, with e- as the primary particle. In that case the main suspect part of Geant4 is scoring.

VI
Comment 25 Vladimir.Ivantchenko 2012-04-11 13:11:38 CEST
Hello Thomas,

Your tar file does not work at all. The problem is in main, which gives a segmentation fault immediately, before Geant4 starts. You need to provide a new one, with a README describing how the batch job should be started in order to see the problem.

VI
Comment 26 Dmitrij Fedorchenko 2012-04-11 13:23:47 CEST
(In reply to comment #24)

Thank you for your comments, Vladimir. I had indeed overlooked scoring as a source of memory leakage. I'll review it thoroughly for possible deficiencies.

Dmitrij
Comment 27 Thomas Papaevangelou 2012-04-11 14:28:49 CEST
Created attachment 157 [details]
Updated application with README file
Comment 28 Thomas Papaevangelou 2012-04-11 14:28:58 CEST
Hello Vladimir,

The segmentation fault occurs because a subdirectory named "./output/" must exist at the location where you execute the application. I uploaded a new tar file with a README.txt included.

Thanks a lot,
Thomas
Comment 29 Thomas Papaevangelou 2012-04-11 17:39:37 CEST
Hello again,

I tried the Hadr01 example. Modifying hadr01.in as follows, I got an increase of ~20 MB after 100000 events.

#================================================
#     Macro file for hadr01
#     06.06.2006 V.Ivanchneko
#================================================
/control/verbose 2
/run/verbose 1
/tracking/verbose 0
#
/testhadr/TargetMat        G4_Al 
/testhadr/TargetRadius     50  cm
/testhadr/TargetLength     100 cm
/testhadr/NumberDivZ       10
/testhadr/PrintModulo      10
#
/testhadr/CutsAll          1 mm
/testhadr/Physics          QGSP_BIC_HP
#
/run/initialize
#
/gun/particle neutron
/gun/energy 1. MeV
/run/beamOn 100000
#
Comment 30 Vladimir.Ivantchenko 2012-04-11 18:35:17 CEST
Hello Thomas,

Your tar file does not work at all. The problem is in main, which gives a segmentation fault immediately, before Geant4 starts. You need to provide a new one, with a README describing how the batch job should be started in order to see the problem.

VI
Comment 31 Thomas Papaevangelou 2012-04-11 18:48:10 CEST
As I mentioned in a previous message and in the README.txt file included in the updated application attached this afternoon, a subdirectory named "./output/" must exist in the directory where you run the application. If it exists, along with the 2 data files mentioned in the README, the application should not crash.
Comment 32 Vladimir.Ivantchenko 2012-04-11 18:49:21 CEST
Created attachment 158 [details]
Valgrind report

Valgrind report for neutronShielding
Comment 33 Vladimir.Ivantchenko 2012-04-11 18:53:40 CEST
Hello Thomas,

I have sent a Valgrind report: your application has problems at initialisation and at the end. Also, it is not correct to instantiate 2 physics lists in main.

On SLC5 gcc 4.3 I do not see any increase of memory with either your application or Hadr01. 

Valgrind with leak check shows many problems, the majority of which can be explained as imprecise object deletion at the end of a job, rather than as a runtime memory leak.

VI
Comment 34 Thomas Papaevangelou 2012-04-11 19:16:49 CEST
The 2 physics lists are a mistake, probably left over from when I was trying different physics configurations. Still, the facts are as mentioned before:

 - with QGSP_BIC_HP and QGSP_BERT_HP I get a memory increase.
 - with QGSP_BIC or DMXPhysicsList.cc (HP and LE neutron models only), no memory leak.
 - memory leak with the Hadr01 example.

I don't know how to use Valgrind; I just watched the memory usage with top for some time. My system is Ubuntu 10.04 with g++ 4.4.3. Maybe this creates the problem?
Comment 35 Thomas Papaevangelou 2012-04-11 19:30:23 CEST
Found the bug!!! My mistake: I had a second version of g++, so I was maintaining two versions of the built libraries! After clearing this up, the leak has disappeared from both applications with patch p01. No more leak.