EXCELLERAT: Enabling parallel mesh adaptation with Treeadapt

Short description

CFD remains highly dependent on mesh quality. Advanced meshing software is generally limited to sequential or shared-memory architectures. As a result, generating highly complex grids takes tens of hours and depends strongly on user experience. Refinement zones are also restricted to standard geometrical shapes. To bypass these bottlenecks, codes have turned to mesh adaptation, but massively parallel mesh adaptation workflows remain scarce and require efficient load balancing, interpolation and remeshing techniques.

Results & Achievements

So far, simulations using AVBP have relied on static meshes generated with commercial software. This tied the quality of the results to the experience and insight of the user performing the simulation. Furthermore, these mesh generation tools are mostly non-parallel and require days to mesh complex cases.

Recent work on efficient load balancing within the EPEEC project allowed CERFACS to use the open-source library Treepart to create a new tool/library called Treeadapt that allows massively parallel mesh adaptation. Treeadapt uses its mesh partitioning and load balancing algorithms, based on the ZOLTAN library, to decompose the domain efficiently while taking into account the architecture of the system it is running on. These load balancing algorithms, coupled with MMG, allow faster and more efficient remeshing.

Treeadapt was then used to improve the simulation of a 42-injector rocket engine demonstrator called BKD, as part of the PRACE project RockDyn (prediction of combustion instabilities in liquid rocket engines). In this simulation, carried out in collaboration with EM2C Centrale Supelec (T. Schmitt), the mesh reached one billion tetrahedra in less than 30 minutes using 4,096 AMD Epyc 2 cores, compared to 70 hours with a standard meshing tool.

Objectives

Building on previous experience with the INRIA MMG library and on a partnership with the EPEEC project, during which an efficient mesh partitioning and load balancing library called Treepart was developed, EXCELLERAT aims to develop a new application/library called Treeadapt for massively parallel mesh adaptation. Currently operating on fully tetrahedral grids, Treeadapt generates a first partitioned domain in which MMG can be called independently on each partition while the parallel interfaces are frozen. An iterative rebalancing and adaptation process then runs until the whole domain is within a user-provided tolerance with regard to the error estimator (for example, the gradient of density or the Hessian of the velocity).
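The iterative process described above can be illustrated with a deliberately small 1D sketch. It is not Treeadapt's implementation and does not call MMG or Treepart: the density profile, the jump-based error estimator, the equal-count partitioning and all function names are assumptions made for this example, and splitting a cell at its midpoint stands in for the remesher. The point is only to show why freezing the interfaces requires several rebalance-and-adapt passes.

```python
def density(x):
    """Hypothetical density profile, used only for this toy example."""
    return x * x


def cell_error(nodes, i):
    """Toy error estimator: density jump across cell i (a proxy for h * |gradient|)."""
    return abs(density(nodes[i + 1]) - density(nodes[i]))


def interface_nodes(n_cells, n_parts):
    """Node indices separating n_parts contiguous partitions of equal cell count."""
    return {k * n_cells // n_parts for k in range(1, n_parts)}


def adapt_pass(nodes, n_parts, tol):
    """Rebalance, then refine every over-tolerance cell that is not frozen."""
    n_cells = len(nodes) - 1
    frozen = interface_nodes(n_cells, n_parts)  # rebalancing moves the interfaces
    refined = [nodes[0]]
    for i in range(n_cells):
        # Cell i spans nodes i and i + 1; it is frozen if either is an interface.
        touches_interface = i in frozen or (i + 1) in frozen
        if cell_error(nodes, i) > tol and not touches_interface:
            # Midpoint split: a stand-in for a real remesher call.
            refined.append(0.5 * (nodes[i] + nodes[i + 1]))
        refined.append(nodes[i + 1])
    return refined


def adapt(nodes, n_parts, tol, max_passes=50):
    """Repeat rebalance + adapt until every cell meets the tolerance.

    max_passes guards against the case where an interface does not move
    and a frozen cell can never be refined (possible in this toy model).
    """
    for n_done in range(max_passes):
        if all(cell_error(nodes, i) <= tol for i in range(len(nodes) - 1)):
            return nodes, n_done
        nodes = adapt_pass(nodes, n_parts, tol)
    return nodes, max_passes


initial = [i / 8 for i in range(9)]  # 8 uniform cells on [0, 1]
mesh, passes = adapt(initial, n_parts=2, tol=0.1)
```

With these toy inputs, the cells next to the interface at x = 0.5 are skipped in the first pass and are refined in later passes, once rebalancing has moved the interface away from them.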

Technologies

AVBP code
Treeadapt

Use Case Owner

Collaborating Institutions

Full airplane simulations on Heterogeneous Architectures

A solution based on Dynamic Load Balancing

Short description

Many of the future Exascale systems will be heterogeneous and include accelerators such as GPUs. With the explosion of parallelism, we also expect the performance of the various computing devices to be more variable and, therefore, the performance of the system components to be less predictable. Leading-edge engineering simulation codes need to be malleable enough to adapt to this new environment. This use case relies on Alya, one of only two CFD codes in the Unified European Applications Benchmark Suite (UEABS) and in the Accelerator Benchmark Suite of PRACE. Alya, EXCELLERAT’s reference code, is used for modelling complex systems such as full airplanes. For such simulations, dynamic load balancing mechanisms are required to adjust the workload distribution to the measured performance of each component of the system.

Results & Achievements

The EXCELLERAT software, based on a Space Filling Curve (SFC) partitioning method, can partition a 250-million-element airplane mesh in 0.08 seconds using 128 nodes (6,144 CPU cores) of the MareNostrum V supercomputer. Consequently, mesh partitions can be recomputed at runtime for load balancing without significant overhead. This approach was applied to full airplane simulations on the heterogeneous POWER9 cluster installed at the Barcelona Supercomputing Center. On this cluster, we demonstrated a well-balanced co-execution using the CPUs and GPUs simultaneously. As a result, we obtained a 23% time reduction with respect to GPU-only execution. In practice, this represents a performance boost equivalent to attaching an additional GPU per node, and thus a much more efficient exploitation of the resources.
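To put the 23% figure in perspective, the short sketch below works through the arithmetic of an ideally balanced co-execution. It assumes a simple model in which run time is proportional to work divided by aggregate throughput, with no communication or imbalance overhead; the function names and the relative CPU throughput of 0.3 are illustrative assumptions, not measurements from the use case.

```python
def balanced_shares(rates):
    """Work fraction per device so that all devices finish at the same time."""
    total = sum(rates.values())
    return {device: rate / total for device, rate in rates.items()}


def coexecution_time_reduction(rates, baseline_device):
    """Relative time saved by using all devices instead of baseline_device alone.

    Assumes time = work / throughput and a perfectly balanced split.
    """
    t_baseline = 1.0 / rates[baseline_device]
    t_coexecution = 1.0 / sum(rates.values())
    return 1.0 - t_coexecution / t_baseline


def speedup_from_reduction(reduction):
    """Convert a relative time reduction into a speedup factor."""
    return 1.0 / (1.0 - reduction)


# Hypothetical relative throughputs: the GPUs of a node are normalised to 1.0,
# and the CPUs of the same node are assumed to deliver 0.3 of that.
rates = {"gpu": 1.0, "cpu": 0.3}
shares = balanced_shares(rates)
reduction = coexecution_time_reduction(rates, "gpu")
speedup = speedup_from_reduction(0.23)
```

Under this model, a 23% time reduction corresponds to a speedup of about 1.3 over GPU-only execution, which is what one would see if the CPUs contributed roughly 30% of the GPU throughput and received about 23% of the work.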

Objectives

In EXCELLERAT we use dynamic load balancing (DLB) to increase the parallel efficiency of airplane simulations by minimising the idle time of underloaded devices at synchronisation points. Alya has been equipped with a distributed-memory DLB mechanism, complementary to the node-level parallel performance strategy already in place. The core components of the method are an efficient in-house Space Filling Curve (SFC)-based mesh partitioner and an online redistribution module that migrates the simulation between two different partitions. These are used to correct the partition according to runtime measurements. We have focused on maximising the parallel performance of the mesh partitioning process to minimise the load balancing overhead.
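The two ingredients described above, an SFC-based partitioner and a correction driven by runtime measurements, can be sketched in a few lines. This is a serial 2D toy, not Alya's in-house partitioner: the choice of a Morton (Z-order) curve, the function names and the timing values are all assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def morton_key(ix, iy, bits=10):
    """Interleave the bits of (ix, iy) to get a position on a Z-order curve."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def sfc_partition(points, shares, bits=10):
    """Cut the curve-ordered points into chunks sized by the target shares.

    points: element centroids in [0, 1) x [0, 1)
    shares: target fraction of elements per partition
    Returns the partition id of each point.
    """
    scale = (1 << bits) - 1
    order = sorted(
        range(len(points)),
        key=lambda i: morton_key(int(points[i][0] * scale),
                                 int(points[i][1] * scale), bits),
    )
    cuts = list(accumulate(shares))[:-1]  # cumulative cut positions
    total = sum(shares)
    part = [0] * len(points)
    for rank, i in enumerate(order):
        position = (rank + 0.5) / len(points) * total
        part[i] = bisect_right(cuts, position)
    return part


def corrected_shares(counts, times):
    """New target shares proportional to measured speed (elements per second)."""
    speeds = [c / t for c, t in zip(counts, times)]
    total = sum(speeds)
    return [s / total for s in speeds]


# 8 x 8 grid of element centroids, first split evenly over 4 partitions.
points = [((i + 0.5) / 8, (j + 0.5) / 8) for i in range(8) for j in range(8)]
part = sfc_partition(points, [0.25] * 4)
counts = [part.count(p) for p in range(4)]

# Hypothetical measurement: partition 0 took twice as long as the others.
times = [2.0, 1.0, 1.0, 1.0]
new_part = sfc_partition(points, corrected_shares(counts, times))
new_counts = [new_part.count(p) for p in range(4)]
```

Because repartitioning only requires computing keys, sorting and cutting, it is cheap enough to repeat at runtime. In this example the slow partition ends up with fewer elements after the correction, and the elements that change owner are the ones an online redistribution step would have to migrate.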

Technologies

Alya CFD code

Use Case Owner

Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS)

Collaborating Institutions

Barcelona Supercomputing Center (BSC)