Problem 1735 - Proposed Modification of G4Allocators aiming to reduce average memory for MT jobs
Status: RESOLVED REMIND
Alias: None
Product: Geant4
Classification: Unclassified
Component: global/management
Version: 10.1
Hardware: All All
Importance: P5 enhancement
Assignee: Gabriele Cosmo
URL:
Depends on:
Blocks:
 
Reported: 2015-04-20 10:01 CEST by vitali choutko
Modified: 2016-01-28 09:07 CET

Attachments
source code & docs (200.00 KB, application/x-tar)
2015-04-20 10:01 CEST, vitali choutko
source + doc + perf chart (220.00 KB, application/x-tar)
2015-04-22 14:29 CEST, vitali choutko
Patch on top of 10.1.p01 to FTF and Bertini models for memory growth (80.52 KB, application/octet-stream)
2015-05-13 10:04 CEST, Gabriele Cosmo

Description vitali choutko 2015-04-20 10:01:55 CEST
Created attachment 333 [details]
source code & docs

The modification of G4Allocators allows fast garbage collection (no reallocation of existing data, and the preallocated heap is preserved below some threshold) in 3 different modes:

1.   fully automatic

2.   on demand (e.g. at the end of an event)

3.   on demand, cleaning up selected G4Allocators by name on the assumption that they are empty, e.g. at the end of an event (intrinsically unsafe)

To set up garbage collection, the following function should be called at the beginning of the user job:

G4Allocator::SetGarbageCollection(long long threshold_in_bytes,bool autogarbage);///< default values threshold_in_bytes=4000000 autogarbage=false

For mode 2, an additional call is needed (e.g. in the user end-of-event action or directly in ./source/run/src/G4WorkerRunManager.cc):
     G4AllocatorList *fa=G4AllocatorList::GetAllocatorListIfExist();
     if(fa)fa->CollectGarbage();
 
Mode 3 requires a G4Allocator::SetGarbageCollection(0) call at the beginning of the user job.

Next, at the end of an event, the "garbage collection" can be invoked as shown below. Note that this method cannot be applied to the original G4Allocators, as they don't contain an object name, and some objects (G4NavigationLevelRep, ..TouchableHistory..) persist between different events.

// Unsafe method, used only when no garbage collection is active
// (requires <cstring> for strstr)
G4AllocatorList *fa = G4AllocatorList::GetAllocatorListIfExist();
double G4AllocatorSize = some_threshold;  // reset storage only above this size
if (fa && !G4AllocatorPool::GarbageCollectionOn()) {
  for (size_t k = 0; k < fa->fList.size(); k++) {
    // tn: object name carried by the modified allocator
    const char *name = fa->fList[k]->tn.c_str();
    bool selected = strstr(name, "G4Track") || strstr(name, "G4DynamicParticle");
    if (selected && fa->fList[k]->GetAllocatedSize() > G4AllocatorSize) {
      fa->fList[k]->ResetStorage();
    }
  }
}
 
Modes 1. and 2. are fully implemented in the attached tar file.
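
The idea behind modes 1 and 2 (release only empty pages, never move live data, keep the pool at or below a threshold) can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyPool and all of its members are hypothetical names invented for illustration. They are not Geant4 classes and not the code in the attached tar file, and the page-based layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: ToyPool and its members are hypothetical names, not the
// Geant4 classes and not the code in the attached tar file.
class ToyPool {
public:
    ToyPool(std::size_t pageBytes, std::size_t slotsPerPage,
            std::size_t thresholdBytes, bool autoCollect)
        : pageBytes_(pageBytes), slotsPerPage_(slotsPerPage),
          thresholdBytes_(thresholdBytes), autoCollect_(autoCollect) {}

    // Take one slot and return the index of the page that holds it.
    std::size_t Allocate() {
        for (std::size_t i = 0; i < pages_.size(); ++i) {
            if (pages_[i].held && pages_[i].live < slotsPerPage_) {
                ++pages_[i].live;
                return i;
            }
        }
        for (std::size_t i = 0; i < pages_.size(); ++i) {
            if (!pages_[i].held) {            // re-acquire a released page
                pages_[i].held = true;
                pages_[i].live = 1;
                return i;
            }
        }
        pages_.push_back(Page{true, 1});      // grow by one page
        return pages_.size() - 1;
    }

    // Give one slot back; in automatic mode this also triggers a collection.
    void Free(std::size_t page) {
        assert(page < pages_.size() && pages_[page].held && pages_[page].live > 0);
        --pages_[page].live;
        if (autoCollect_) CollectGarbage();
    }

    // Release empty pages, newest first, until the held size is at or below
    // the threshold. Pages with live objects are never moved or released.
    // Returns the number of bytes released.
    std::size_t CollectGarbage() {
        std::size_t released = 0;
        for (std::size_t i = pages_.size(); i-- > 0 && HeldBytes() > thresholdBytes_;) {
            if (pages_[i].held && pages_[i].live == 0) {
                pages_[i].held = false;
                released += pageBytes_;
            }
        }
        return released;
    }

    // Bytes currently held by the pool (released pages do not count).
    std::size_t HeldBytes() const {
        std::size_t n = 0;
        for (const Page& p : pages_) if (p.held) ++n;
        return n * pageBytes_;
    }

private:
    struct Page { bool held; std::size_t live; };
    std::size_t pageBytes_, slotsPerPage_, thresholdBytes_;
    bool autoCollect_;
    std::vector<Page> pages_;
};
```

In this toy, constructing the pool with autoCollect=true corresponds roughly to mode 1, while autoCollect=false with an explicit CollectGarbage() call at the end of an event corresponds roughly to mode 2. The automatic variant pays for a scan on every Free(), which is one plausible reason for the higher CPU overhead of a fully automatic mode.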


Performance example based on electrons with a 1/P momentum (P) distribution in [2, 10000] GeV/c:

Dynamic allocation drawing: g4a.eps

Mode        CPU overhead per event (%)    Average G4AllocatorSize per thread for last half of events (MB)
Original    0                             500
Mode 3      0                             320
Mode 2      4.5                           287
Mode 1      7.5                           285
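
As a rough reading of the figures above, the relative memory saving of each mode with respect to the original allocators can be computed as (original - mode) / original. A minimal sketch follows; the function name is made up for illustration and is not part of the attachment.

```cpp
#include <cassert>

// Hypothetical helper: relative saving in percent of a mode's average
// allocator size with respect to the baseline (original) size.
double RelativeSavingPercent(double baselineMB, double modeMB) {
    assert(baselineMB > 0.0);
    return 100.0 * (baselineMB - modeMB) / baselineMB;
}
```

With the reported sizes, this gives about 36% for Mode 3, 42.6% for Mode 2 and 43% for Mode 1, so most of the saving is already obtained by Mode 3 at no measured CPU cost.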
Comment 1 Gabriele Cosmo 2015-04-21 10:00:13 CEST
Thanks for submitting the proposed enhancement!
At first look, the reported performance figures more or less match expectations, i.e. a performance penalty of at least 5% in the best case, which can hardly be accepted by experiments in production.
We'll take a look anyway at whether this can be added as an option, but at the same time we will resolve current issues in the use of allocators by hadronic models, which should in any case considerably reduce the growth of the pools and, consequently, the related allocator size.
Comment 2 vitali choutko 2015-04-22 14:29:09 CEST
Created attachment 337 [details]
source + doc + perf chart
Comment 3 vitali choutko 2015-04-22 14:30:27 CEST
Hi

Please find attached a simple modification which has less CPU overhead (2% instead of 4.5%) in default mode.
Comment 4 Gabriele Cosmo 2015-05-13 10:04:29 CEST
Created attachment 345 [details]
Patch on top of 10.1.p01 to FTF and Bertini models for memory growth

I'm attaching a tar-ball including a couple of fixes, to be applied on top of release 10.1.p01, in two hadronic models (FTF and Bertini Cascade). They should help to considerably reduce the memory growth in normal circumstances for physics configurations involving such models.
Can you please try them out and let me know if you see any visible benefits?  Thanks!
Comment 5 vitali choutko 2015-05-18 11:17:55 CEST
Hi

I confirm some net memory gain with the latest patch when running C12 ions in AMS using the FTF model; namely, I see a decrease in memory usage of about 25 MB/thread.

Best Regards, Vitali