TY - GEN
T1 - From OpenACC to OpenMP5 GPU Offloading
T2 - 4th International Workshop on Extreme Heterogeneity Solutions, ExHET 2025
AU - Fridman, Yehonatan
AU - Goren, Yosef
AU - Oren, Gal
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/5/5
Y1 - 2025/5/5
AB - The NAS Parallel Benchmarks (NPB) are widely used to evaluate parallel programming models, yet lack a native OpenMP offloading implementation for GPUs. This gap is significant given OpenMP’s emergence as a versatile standard for heterogeneous systems, offering broad compatibility with both current and future GPU architectures. Existing solutions, such as those that directly translate OpenACC to a binary executable, are limited by OpenACC’s stagnation and vendor-specific constraints, and they do not expose the OpenMP code they use internally as an intermediate representation. This work addresses this limitation by developing a source-level translation of the OpenACC-based NPB benchmarks into OpenMP5 offloading code. The translation combines an automated source-to-source tool with manual optimization to ensure efficient execution across various GPU architectures. Performance evaluations indicate that the translated OpenMP versions deliver results comparable to the original OpenACC implementations, validating their reliability for GPU-based computations. Additionally, comparisons between the GPU-accelerated OpenMP implementations and traditional CPU-based benchmarks reveal significant performance gains, especially in computationally intensive workloads. These findings highlight OpenMP’s potential as a unified programming model, offering superior portability and optimization capabilities across diverse hardware platforms. The sources of this work are available at our repository.
KW - GPU benchmarking
KW - heterogeneous systems
KW - NAS Parallel Benchmarks
KW - OpenACC
KW - OpenMP offloading
KW - performance optimization
UR - https://www.scopus.com/pages/publications/105007287141
U2 - 10.1145/3720555.3721989
DO - 10.1145/3720555.3721989
M3 - Conference contribution
AN - SCOPUS:105007287141
T3 - Proceedings of 2025 4th International Workshop on Extreme Heterogeneity Solutions, ExHET 2025
SP - 10
EP - 18
BT - Proceedings of 2025 4th International Workshop on Extreme Heterogeneity Solutions, ExHET 2025
PB - Association for Computing Machinery, Inc
Y2 - 2 March 2025
ER -