Problem 2564 - EXC_BAD_ACCESS (code=EXC_I386_GPFLT) errors from PTL::ThreadPool on Intel Macs with Clang 15
Status: CLOSED FIXED
Alias: None
Product: Geant4
Classification: Unclassified
Component: cmake
Version: 11.1
Hardware: Apple Mac OS X
Importance: P4 normal
Assignee: Ben Morgan
URL:
Depends on:
Blocks:
 
Reported: 2023-10-11 15:51 CEST by Ben Morgan
Modified: 2024-04-17 15:43 CEST

See Also:


Attachments
Patch to workaround AppleClang 15/Intel ThreadPool GPFLT error (1.62 KB, patch)
2023-10-12 16:25 CEST, Ben Morgan

Description Ben Morgan 2023-10-11 15:51:31 CEST
Geant4 11.1 compiles fine with Clang 15 on macOS Ventura/Sonoma on machines with an Intel CPU, and applications also build successfully. However, applications using the task-based run manager fail at runtime with errors of the following form (taken here from exampleB1):

```
Process 51137 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010106071a libG4ptl.2.dylib`PTL::ThreadPool::execute_thread(PTL::VUserTaskQueue*) [inlined] std::__1::shared_ptr<std::__1::mutex>::shared_ptr[abi:v160006](this=<unavailable>, __r=std::__1::shared_ptr<std::__1::mutex>::element_type @ 0x0000600003e0d458 strong=1 weak=1) at shared_ptr.h:634:11 [opt]
   631 	
   632 	    _LIBCPP_HIDE_FROM_ABI
   633 	    shared_ptr(const shared_ptr& __r) _NOEXCEPT
-> 634 	        : __ptr_(__r.__ptr_),
   635 	          __cntrl_(__r.__cntrl_)
   636 	    {
   637 	        if (__cntrl_)
```

Running in lldb gives a backtrace:

```
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x000000010106071a libG4ptl.2.dylib`PTL::ThreadPool::execute_thread(PTL::VUserTaskQueue*) [inlined] std::__1::shared_ptr<std::__1::mutex>::shared_ptr[abi:v160006](this=<unavailable>, __r=std::__1::shared_ptr<std::__1::mutex>::element_type @ 0x0000600003e0d458 strong=1 weak=1) at shared_ptr.h:634:11 [opt]
    frame #1: 0x00000001010606fa libG4ptl.2.dylib`PTL::ThreadPool::execute_thread(PTL::VUserTaskQueue*) [inlined] std::__1::shared_ptr<std::__1::mutex>::shared_ptr[abi:v160006](this=<unavailable>, __r=std::__1::shared_ptr<std::__1::mutex>::element_type @ 0x0000600003e0d458 strong=1 weak=1) at shared_ptr.h:636:5 [opt]
    frame #2: 0x00000001010606fa libG4ptl.2.dylib`PTL::ThreadPool::execute_thread(this=0x00007fb171994c60, _task_queue=0x00006000031055e0) at ThreadPool.cc:819:34 [opt]
    frame #3: 0x000000010105feb3 libG4ptl.2.dylib`PTL::ThreadPool::start_thread(tp=0x00007fb171994c60, _data=0x00007fb171994d90 size=4, _idx=1) at ThreadPool.cc:137:9 [opt]
    frame #4: 0x00000001010675b8 libG4ptl.2.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long), PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, unsigned long>>(void*) [inlined] decltype(std::declval<void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long)>()(std::declval<PTL::ThreadPool*>(), std::declval<std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*>(), std::declval<unsigned long>())) std::__1::__invoke[abi:v160006]<void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long), PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, unsigned long>(__f=0x00006000014c1f88, __args=0x00006000014c1f90, __args=size=1, __args=0x00006000014c1fa0) at invoke.h:394:23 [opt]
    frame #5: 0x00000001010675a9 libG4ptl.2.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long), PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, unsigned long>>(void*) [inlined] void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long), PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, unsigned long, 2ul, 3ul, 4ul>(__t=size=5, (null)=<unavailable>) at thread:288:5 [opt]
    frame #6: 0x00000001010675a9 libG4ptl.2.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, long), PTL::ThreadPool*, std::__1::vector<std::__1::shared_ptr<PTL::ThreadData>, std::__1::allocator<std::__1::shared_ptr<PTL::ThreadData>>>*, unsigned long>>(__vp=0x00006000014c1f80) at thread:299:5 [opt]
    frame #7: 0x00007ff815e66202 libsystem_pthread.dylib`_pthread_start + 99
    frame #8: 0x00007ff815e61bab libsystem_pthread.dylib`thread_start + 15
```

This only occurs with tasking; running the same application in Sequential or POSIX multithreading mode does not expose the issue.

Further triage shows that the problem only occurs when Geant4 is built with `-O2` or `-O3` optimization. The error occurs at a seemingly innocuous line that copies a `shared_ptr` to a thread-local variable, so more work is needed to determine whether this is a compiler/system issue or a problem in the PTL code. Apple M1/M2 systems run fine, as do x86_64 Linux systems with Clang 15/16.
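The pattern described above can be illustrated with a minimal, self-contained sketch. This is not the PTL code: `WorkerData`, `worker`, `tl_mutex` and `run_workers` are hypothetical names, chosen only to mirror the reported shape, in which a worker thread copies a `std::shared_ptr<std::mutex>` into a `thread_local` variable and then locks it. As written, this is ordinary, well-defined C++, which is why a crash at such a line is surprising.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-thread data handed to each worker.
struct WorkerData {
    std::shared_ptr<std::mutex> mtx;
};

// Each worker thread holds its own copy of the shared mutex handle.
thread_local std::shared_ptr<std::mutex> tl_mutex;

// Hypothetical worker entry point: copies the shared_ptr into thread-local
// storage (the kind of statement at which the crash was reported), then
// uses it to guard a shared counter.
void worker(const WorkerData* data, int* counter) {
    tl_mutex = data->mtx;  // shared_ptr copy into a thread_local
    std::lock_guard<std::mutex> lock(*tl_mutex);
    ++*counter;
}

// Starts n workers that share one mutex and returns how many of them ran.
int run_workers(int n) {
    WorkerData data{std::make_shared<std::mutex>()};
    int counter = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(worker, &data, &counter);
    for (auto& t : threads)
        t.join();
    return counter;
}
```

For example, `run_workers(4)` should return 4. Building a sketch like this with `-O2` on an affected Intel Mac is one possible way to probe whether the pattern alone reproduces the crash; whether it does is not established here.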

The current workarounds are:

1. Run the application in MT or Sequential mode by setting the `G4FORCE_RUN_MANAGER_TYPE` environment variable to `MT` or `Sequential`, e.g.

   `$ G4FORCE_RUN_MANAGER_TYPE=MT ./exampleB1 ...`

2. Compile Geant4 in `Debug` mode or with `-O1` optimization.

A workaround patch to disable optimization of the affected PTL function is being prepared and will be posted here when tested.
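The idea of disabling optimization for a single function can be sketched with Clang's `optnone` attribute. This is an assumption about the general technique, not the contents of the attached patch: `HYPOTHETICAL_NO_OPT` and `bind_thread_lock` are illustrative names, and restricting the attribute to Clang on Intel (x86_64) Apple targets is one possible choice of guard. Other compilers and platforms see an empty macro, so the function is optimized normally there.

```cpp
#include <memory>
#include <mutex>

// HYPOTHETICAL macro: apply optnone only where Clang targets an Intel Mac.
// Everywhere else it expands to nothing and normal optimization applies.
#if defined(__clang__) && defined(__APPLE__) && defined(__x86_64__)
#  define HYPOTHETICAL_NO_OPT __attribute__((optnone))
#else
#  define HYPOTHETICAL_NO_OPT
#endif

thread_local std::shared_ptr<std::mutex> tl_lock;

// Hypothetical function standing in for the affected PTL routine: it copies
// a shared_ptr into thread-local storage and reports whether the copy refers
// to the same mutex. optnone (when enabled) also implies noinline in Clang.
HYPOTHETICAL_NO_OPT
bool bind_thread_lock(const std::shared_ptr<std::mutex>& shared) {
    tl_lock = shared;
    return tl_lock.get() == shared.get();
}
```

Keeping the attribute on a single function confines the performance cost to that function instead of lowering optimization for the whole library.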
Comment 1 Gabriele Cosmo 2023-10-11 16:22:31 CEST
This looks likely to be a compiler bug...
Comment 2 Ben Morgan 2023-10-12 16:25:00 CEST
Created attachment 828
Patch to workaround AppleClang 15/Intel ThreadPool GPFLT error

This is the current workaround for the runtime error, so far tested only through the Nightly builds. Although it is low impact, it should be considered a workaround until the problem is fully understood as either a compiler/system bug or a real issue in Geant4.
Comment 3 Ben Morgan 2023-10-12 16:25:39 CEST
Will leave as ASSIGNED until fully understood.
Comment 4 Ben Morgan 2024-04-17 15:42:06 CEST
This appears to be a compiler bug; the workaround has been applied upstream and in the patch releases.