Problem 1879

Summary: Segfault at magnetic field edges
Product: Examples/Advanced Reporter: Sonja Schellhammer <s.schellhammer>
Component: hadrontherapyAssignee: cirrone
Status: RESOLVED FIXED    
Severity: normal CC: a.matthes, jan.pipek
Priority: P5    
Version: 10.2   
Hardware: All   
OS: All   
Attachments: Crash reproduction code

Description Sonja Schellhammer 2016-08-11 11:03:45 CEST
Hello,

I had a problem with the magnetic field classes implemented in the hadrontherapy and the purging magnet examples of Geant4. 

In some runs it segfaulted, and with gdb we figured out that it crashed in line 191 of HadrontherapyMagneticField3D.cc (line 219 of PurgMagTabulatedField3D.cc):
> xField[xindex+1][yindex  ][zindex  ] *    xlocal  * (1-ylocal) * (1-zlocal) +

Furthermore we saw, that xindex+1 was as big as the whole x-width of the xField array (xindex+1 == nx), which is obviously wrong.

When we fixed the range check in line 145+ (171+, respectively) from
>  if ( x>=minx && x<maxx &&
>       y>=miny && y<maxy &&
>       z>=minz && z<maxz ) {
to
>  if ( x>=minx && x+dx/nx<maxx &&
>       y>=miny && y+dy/ny<maxy &&
>       z>=minz && z+dz/nz<maxz ) {

so that it also checks whether the interpolation neighbour is valid too, it seems to work.

Now my question: Did we do something wrong (e.g. while loading the data) or did we find a bug in the example?
Comment 1 John Apostolakis 2016-09-15 07:54:33 CEST
Thank you for your feedback.

Your proposed change would 

Instead the code should be changed to use the 'floor' method when converting from the double variables xdindex, ydindex and zdindex to their integer variable xindex, yindex and zindex.  Using the method
  double floor (double x);
ensure that the value is the smaller of the integers near the value of the double - so for example floor(3.7) = 3, and not the rounded value 4.

So the relevant code in the magnetic field class should be changed to:

    // The indices of the nearest tabulated point whose coordinates
    // are all less than those of the given point
    int xindex = static_cast<int>(floor(xdindex));
    int yindex = static_cast<int>(floor(ydindex));
    int zindex = static_cast<int>(floor(zdindex));

Please let us know whether you have further issues once you have tried this change. Best regards, John
Comment 2 Jan Pipek 2016-11-14 13:49:15 CET
Hello Sonja and John,

we fixed this in the hadrontherapy-V10-02-03 tag. So in the 10.3 release this December, the example should work for you :-)

The problem arises if (and only if, I believe) invertX/Y/Z is set to true and a lower boundary for the respective axis is accessed. Then (z >= minz && z < maxz) is satisfied and zfraction becomes 1.0. Consequently, the field array is asked for (nz-1)th and (nz)th elements => segmentation fault. No flooring can actually help with that (in fact, static_cast<int> truncates the number towards 0 which for positive values is exactly the same behaviour as floor-ing the number first). It is necessary to prevent zfraction == 1.0 from happening - this we achieved by changing the inequality from => to > .

Best regards,

Jan Pipek
INFN-LNS
Comment 3 Alexander Matthes 2016-11-14 16:27:19 CET
Created attachment 426 [details]
Crash reproduction code
Comment 4 Jan Pipek 2016-11-15 13:17:52 CET
Alexander Matthes (colleague of Sonja Schellhammer) yesterday reported that my bug fix actually does not fix the bug (see the attachment). He is right (the bug manifests much less frequently for non-inverted fields but it does). And the bug fix still fails in some (though very specific) cases. So the only really safe way how to fix it is to put the if-condition AFTER calculating xindex, yindex, zindex.

Thanks Sonja & Alexander! And I am sorry for failing in interpreting my debugging code (which pointed the problem only to inverted fields) :-(

How do we proceed now? Can we squeeze the update in the 10.3 release or do we have to wait for patch1? The issue is not so serious and manual patch can solve it for any potential user.

Sorry again,
Jan