On modern heterogeneous HPC systems, the most popular way to realize distributed computation is the hybrid MPI+X programming model (where X is OpenMP, CUDA, or a similar model), which has been shown to perform well for a variety of scientific applications. However, application developers prefer a single coherent programming model over a hybrid one, because maintainability and portability decrease with each additional model. Recent work [14] has shown that the OpenMP device offloading model can be used to program distributed accelerator-based HPC systems with minimal changes to the application.
LLVM/OpenMP
MPI-Based Remote OpenMP Offloading: A More Efficient and Easy-to-Use Implementation
Shan, Baodi, Araya-Polo, Mauricio, Malik, Abid M., and Chapman, Barbara
In Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores 2023