Performance Portability on HPC Accelerator Architectures with Modern Techniques:
The ParFlow Blueprint
A Use Case by
Short description
Rapidly changing heterogeneous supercomputer architectures pose a great challenge to scientific communities that want to leverage the latest technology in high-performance computing. Software development techniques that deliver both good performance and developer productivity, while keeping the codebase adaptable and maintainable in the long term, are therefore of high importance. ParFlow, a widely used hydrologic model based on C, achieves these attributes by using Unified Memory with a pool allocator and by abstracting the architecture-dependent code behind preprocessor macros (the ParFlow eDSL). The implementation shows very good weak scaling, with up to 26x speedup from NVIDIA A100 GPUs, over hundreds of nodes, each containing 4 GPUs, on the new JUWELS Booster system at the Jülich Supercomputing Centre.
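To illustrate the pool allocator idea mentioned above, the following is a minimal sketch in C and not ParFlow's actual implementation. All names (`Pool`, `pool_alloc`, `pool_free`, `pool_destroy`, `backing_alloc`) are hypothetical. The sketch assumes that the backing allocation is the expensive step (as Unified Memory allocation typically is) and uses plain `malloc` as a stand-in so that it compiles without a GPU toolchain. Freed blocks are cached on a free list and handed out again on later requests instead of going back to the backing allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backing allocator. In a CUDA build this is where a Unified
   Memory allocation would go; malloc/free are stand-ins (assumption). */
static void *backing_alloc(size_t bytes) { return malloc(bytes); }
static void backing_free(void *ptr) { free(ptr); }

/* Header stored directly in front of every payload handed to the caller. */
typedef struct PoolBlock {
    struct PoolBlock *next; /* next cached block on the free list */
    size_t size;            /* usable payload size in bytes */
} PoolBlock;

typedef struct {
    PoolBlock *free_list;   /* blocks released by the caller, ready for reuse */
    size_t backing_calls;   /* how often the backing allocator was invoked */
} Pool;

/* Return a payload of at least `bytes` bytes, reusing a cached block if one
   is large enough (first fit), otherwise asking the backing allocator. */
void *pool_alloc(Pool *pool, size_t bytes)
{
    assert(pool != NULL && bytes > 0);
    PoolBlock **link = &pool->free_list;
    for (PoolBlock *b = *link; b != NULL; link = &b->next, b = b->next) {
        if (b->size >= bytes) {
            *link = b->next;        /* unlink the block from the free list */
            b->next = NULL;
            return (void *)(b + 1); /* payload starts right after the header */
        }
    }
    PoolBlock *fresh = (PoolBlock *)backing_alloc(sizeof(PoolBlock) + bytes);
    if (fresh == NULL) return NULL;
    fresh->next = NULL;
    fresh->size = bytes;
    pool->backing_calls++;
    return (void *)(fresh + 1);
}

/* Cache the block for later reuse instead of releasing it immediately. */
void pool_free(Pool *pool, void *ptr)
{
    if (ptr == NULL) return;
    PoolBlock *b = (PoolBlock *)ptr - 1; /* step back to the header */
    b->next = pool->free_list;
    pool->free_list = b;
}

/* Release every cached block back to the backing allocator. */
void pool_destroy(Pool *pool)
{
    while (pool->free_list != NULL) {
        PoolBlock *b = pool->free_list;
        pool->free_list = b->next;
        backing_free(b);
    }
}
```

In this sketch, a buffer that is allocated, freed, and allocated again with an equal or smaller size is served from the free list, so the backing allocator is called only once. A production allocator would additionally need to consider alignment, thread safety, and fragmentation.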
Results & Achievements
GPU support has been successfully implemented for ParFlow using either CUDA or Kokkos (two separate lightweight implementations that can be used independently of each other). The Kokkos implementation in particular constitutes a big leap in performance portability. Up to 26x speedup is achieved when using GPUs compared with using only CPUs.
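The idea of hiding architecture-dependent code behind preprocessor macros, so that one loop body can be compiled for different backends, can be sketched as follows. This is a simplified illustration, not the ParFlow eDSL itself: the macro `BOX_LOOP_3D`, the switch `SKETCH_USE_OPENMP`, and the function `scale_field` are hypothetical names. Only CPU backends are shown so that the sketch compiles with any C99 compiler. In a GPU build, the same macro would be expected to expand to a CUDA kernel launch or a Kokkos parallel dispatch instead, while the calling code stays unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time backend switch (assumption): the same macro name
   expands differently depending on which backend is selected at build time. */
#if defined(SKETCH_USE_OPENMP)
  #define SKETCH_LOOP_PRAGMA _Pragma("omp parallel for collapse(3)")
#else
  #define SKETCH_LOOP_PRAGMA /* serial fallback: no pragma */
#endif

/* Loop over a 3D box of nx * ny * nz cells. The loop body is passed as the
   trailing (variadic) macro arguments, so it may contain commas. */
#define BOX_LOOP_3D(i, j, k, nx, ny, nz, ...)      \
    SKETCH_LOOP_PRAGMA                             \
    for (int k = 0; k < (nz); ++k)                 \
        for (int j = 0; j < (ny); ++j)             \
            for (int i = 0; i < (nx); ++i) {       \
                __VA_ARGS__                        \
            }

/* Application code is written once against the macro and contains no
   backend-specific constructs. */
void scale_field(double *field, int nx, int ny, int nz, double factor)
{
    assert(field != NULL);
    BOX_LOOP_3D(i, j, k, nx, ny, nz,
        /* x is the fastest-varying index in this flattened layout */
        size_t idx = ((size_t)k * ny + j) * nx + i;
        field[idx] *= factor;
    );
}
```

Because the backend choice is confined to the macro definition, adding another backend under this scheme means adding one more expansion of `BOX_LOOP_3D` rather than touching every loop in the application.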
Objectives
The objective is to achieve performance portability by developing and applying the ParFlow eDSL, including support for coupled simulations in which ParFlow is used in combination with other independently developed terrestrial models, such as land-surface and atmospheric models. Multi-vendor support (i.e. performance portability) is preferred to guarantee compatibility with different supercomputer architectures.
Technologies
ParFlow, CUDA, Kokkos
Use Case Owner
FZ-Juelich IBG3
Collaborating Institutions
LLNL, FZ-Juelich