High-Resolution Volcanic Ash Dispersal Forecast


Short description

Operational volcanic ash dispersal forecasts are routinely used to prevent aircraft encounters with volcanic ash clouds and to perform re-routings that avoid contaminated airspace. However, a gap exists between current operational forecast products (e.g. those issued by the Volcanic Ash Advisory Centers) and the requirements of the aviation sector and related stakeholders. Two aspects are particularly critical: 1) the time and space scales of current forecasts are coarse (for example, the current operational setup of the London VAAC at the UK Met Office outputs on a 40 km horizontal resolution grid with 6-hour time averages); and 2) forecasts are not quantitative, i.e. they do not provide airborne ash concentration values. Several studies (e.g. Kristiansen et al., 2012) have concluded that the main source of epistemic/aleatory uncertainty in ash dispersal forecasts comes from the quantification of the source term (eruption column height and strength), which very often is not fully constrained in real time. This limitation can be circumvented in part by integrating into the models ash cloud observations away from the source, typically satellite retrievals of fine ash column mass load (i.e. the vertical integration of concentration). Model data assimilation has the potential to improve ash dispersal forecasts through an efficient joint estimation of the (uncertain) volcanic source parameters and the state of the ash cloud.
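To make the assimilation step concrete, the sketch below applies a stochastic ensemble Kalman filter update to an augmented state vector (ash concentrations plus two source parameters), with column mass load as the observed quantity. It is a self-contained toy with synthetic data and illustrative dimensions, not the FALL3D/PDAF implementation; the array sizes and the observation operator are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: each ensemble member carries a flattened ash
# concentration field augmented with two uncertain source parameters
# (eruption column height, mass eruption rate).
n_state, n_params, n_ens, n_obs = 1000, 2, 32, 50

# Augmented ensemble matrix: one column per member.
X = rng.normal(size=(n_state + n_params, n_ens))

# Observation operator H: a column mass load retrieval is a vertical
# integral of concentration, i.e. a linear combination of state entries
# (sketched here as a sum over 10 randomly chosen "vertical levels").
H = np.zeros((n_obs, n_state + n_params))
for i in range(n_obs):
    H[i, rng.choice(n_state, size=10, replace=False)] = 1.0

y = rng.normal(size=n_obs)       # synthetic satellite retrievals
obs_err = 0.5                    # assumed retrieval error std. dev.
R = obs_err**2 * np.eye(n_obs)

# Stochastic EnKF update with the gain built from ensemble covariances.
# Because the source parameters are part of the augmented state, they
# are corrected jointly with the ash cloud state.
A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
HA = H @ A
K = (A @ HA.T / (n_ens - 1)) @ np.linalg.inv(HA @ HA.T / (n_ens - 1) + R)
for j in range(n_ens):
    y_pert = y + rng.normal(scale=obs_err, size=n_obs)  # perturbed obs
    X[:, j] += K @ (y_pert - H @ X[:, j])

print("posterior source-parameter means:", X[n_state:].mean(axis=1))
```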

Results & Achievements

Implementation of ensemble forecasting in FALL3D, allowing different ensemble members (realizations) to be executed within a single model run.

A new workflow component has been developed to retrieve ash (and SO2) cloud column mass from latest-generation satellite instrumentation.

A new satellite data assimilation module based on the Parallel Data Assimilation Framework (PDAF) has been implemented.

Objectives

Volcanic ash cloud forecasts are performed shortly before or during an eruption in order to predict expected fallout rates over the following hours and days and/or to prevent aircraft encounters with volcanic clouds. These forecasts constitute the main decision tool for flight cancellations and for airplane re-routings that avoid contaminated airspace. However, an important gap exists between current operational products and the actual requirements of the aviation industry and related stakeholders in terms of model resolution, forecast frequency, and quantification of airborne ash concentration. This pilot demonstrator is implementing an ensemble-based data assimilation system (workflow) that combines the FALL3D dispersal model with high-resolution geostationary satellite retrievals in order to furnish high-resolution forecasts.

Technologies

Workflow

The use case workflow includes the following components:

Download and pre-processing of the required meteorological data.

Download of raw satellite data and quantitative retrieval of cloud column mass (SEVIRI retrievals at 0.1° resolution, 1-hour frequency).

Execution of the ensemble forecast using the FALL3D model (i.e. the HPC component of the workflow).

No workflow management system (WMS) is available yet (work in progress); a minimal driver sketch of how these components chain together is given below.
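Since no WMS is in place yet, the following is a hypothetical sketch of how a simple driver could chain the three components; every function body, file name and command line is a placeholder rather than the actual use-case code.

```python
import subprocess
from datetime import datetime

# Hypothetical top-level driver mirroring the three workflow components
# listed above. Paths, commands and function bodies are placeholders.

def fetch_meteo(run_start: datetime) -> str:
    """Download and pre-process the driving meteorological data."""
    # ... retrieve forecast fields, crop to the model domain, regrid ...
    return "meteo.nc"

def retrieve_satellite(run_start: datetime) -> str:
    """Download raw SEVIRI scans and compute quantitative column mass
    retrievals (0.1 degree resolution, 1-hour frequency)."""
    # ... run the retrieval on each scan, write the mass loads ...
    return "obs_column_mass.nc"

def run_ensemble_forecast(meteo: str, obs: str, n_members: int) -> None:
    """HPC step: one parallel job that runs all ensemble members of the
    FALL3D model (illustrative launch line, not the real CLI)."""
    subprocess.run(
        ["mpirun", "-np", "512", "Fall3d.x", "config.inp", str(n_members)],
        check=True,
    )

if __name__ == "__main__":
    start = datetime.utcnow().replace(minute=0, second=0, microsecond=0)
    run_ensemble_forecast(fetch_meteo(start), retrieve_satellite(start),
                          n_members=32)
```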

Software involved


FALL3D code

Use Case Owner

Arnau Folch
Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS)

Collaborating Institutions

BSC
INGV
IMO

Mesoscale simulation of billion-atom complex systems using thousands of GPGPUs


Short description

In collaboration with the UKRI STFC Daresbury Laboratory, E-CAM has developed a highly efficient version of DL_MESO, a software package for mesoscale simulations developed at the UKRI STFC [1]. This distributed GPU acceleration development is an extension of the DL_MESO package to MPI+CUDA that exploits the computational power of the latest NVIDIA cards on hybrid CPU–GPU architectures. The need to port DL_MESO to massively parallel computing platforms arose because real systems often consist of millions of particles, and small clusters are usually not sufficient to obtain results in a reasonable time. Moreover, with the advent of hybrid architectures, updating the code has become an important software engineering step to allow scientists to continue their work on such systems.

Results & Achievements

The current multi-GPU version of DL_MESO scales with an 85% parallel efficiency up to 4096 GPUs (equivalent to almost 20 petaflops of raw double-precision performance) (see Fig. 1) [2]. This allows the simulation of very large systems, such as a phase mixture with 1.8 billion particles (Fig. 2).
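As a back-of-the-envelope check of these figures (only the GPU count and the 85% efficiency come from the text; the per-card FP64 peak is an assumed P100-class value):

```python
# Sanity check of the scaling figures quoted above. The per-card FP64
# peak is an assumption (~4.7 Tflop/s, a P100-class figure); the GPU
# count and the 85% parallel efficiency come from the text.
n_gpus = 4096
per_gpu_tflops = 4.7          # assumed FP64 peak per card
efficiency = 0.85             # parallel efficiency from the text

raw_pflops = n_gpus * per_gpu_tflops / 1000.0
sustained_pflops = raw_pflops * efficiency
print(f"raw: ~{raw_pflops:.1f} Pflop/s, sustained: ~{sustained_pflops:.1f}")
# raw: ~19.3 Pflop/s -> consistent with "almost 20 petaflops"
```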

For improved load balancing, E-CAM's load-balancing library ALL [3], developed at the Juelich Supercomputing Centre, has been implemented in the multi-GPU version of DL_MESO (DPD). The intention is to allow for better performance when modelling complex systems, such as large proteins or lipid bilayers, by redistributing the workload across the GPUs; a schematic illustration of the underlying boundary-shift idea follows below. The Kokkos library [4] is also being incorporated into DL_MESO (DPD), enabling the execution of DL_MESO_DPD on NVIDIA GPUs as well as on other GPUs or architectures (many-core hardware like KNL), allowing performance portability as well as separation of concerns between computational science and HPC.
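The following toy sketch illustrates, in 1-D, the boundary-shift idea behind such load-balancing libraries: domain boundaries migrate toward the heavier side until the work per rank evens out. It is only a schematic illustration; ALL's actual C++/Fortran API and its balancing schemes are not shown here.

```python
# Schematic 1-D illustration of boundary-shift load balancing: domain
# boundaries move into the heavier neighbouring domain so that per-rank
# work (e.g. particles per GPU in DPD) evens out over a few iterations.

def work_per_domain(bounds, density):
    """Integrate a work density (e.g. particles per unit length) over
    each domain with a simple midpoint rule."""
    n_sub = 100
    works = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        h = (hi - lo) / n_sub
        works.append(sum(density(lo + (k + 0.5) * h)
                         for k in range(n_sub)) * h)
    return works

def shift_boundaries(bounds, works, relax=0.5):
    """Move each interior boundary toward the heavier side."""
    new = bounds[:]
    for i in range(1, len(bounds) - 1):
        w_l, w_r = works[i - 1], works[i]
        width = min(bounds[i] - bounds[i - 1], bounds[i + 1] - bounds[i])
        new[i] += relax * width * (w_r - w_l) / (w_l + w_r)
    return new

density = lambda x: 1.0 + 4.0 * x     # more particles toward x = 1
bounds = [0.0, 0.25, 0.5, 0.75, 1.0]  # four equal domains to start
for _ in range(30):
    bounds = shift_boundaries(bounds, work_per_domain(bounds, density))
print([round(w, 3) for w in work_per_domain(bounds, density)])
# -> roughly equal work per domain
```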

Objectives

The rewrite of the DL_MESO code allows the simulation of billion-atom complex systems on thousands of GPGPUs. This is necessary to simulate surfactants, key ingredients in personal care products, dish soaps, laundry detergents, etc. At the microscopic level, surfactants are very long chains of molecules, known as polymers. Realistic polymers are typically very large macromolecules, and modelling them in industrial manufacturing processes (e.g. to predict complex material properties) is a very challenging task.

EXCELLERAT: Enabling parallel mesh adaptation with Treeadapt


Short description

CFD remains highly dependent on mesh quality, yet advanced meshing software is generally limited to sequential or shared-memory architectures. As a consequence, generating highly complex grids takes tens of hours and depends strongly on user experience. Refinement zones are also bounded by standard geometrical shapes. To bypass these bottlenecks, codes have turned to mesh adaptation as a solution, but massively parallel mesh adaptation workflows remain scarce and require efficient load balancing, interpolation and remeshing techniques.

Results & Achievements

So far, simulations using AVBP have relied on static meshes generated with commercial software. This tied the quality of the results to the experience and insight of the user performing the simulation. Furthermore, these mesh generation tools are mostly non-parallel and require days to mesh complex cases.

Recent work on efficient load balancing within the EPEEC project allowed CERFACS to use the open-source library Treepart to create a new tool/library called Treeadapt that enables massively parallel mesh adaptation. Treeadapt uses mesh partitioning and load balancing algorithms based on the ZOLTAN library to decompose the domains efficiently, taking into account the architectural intricacies of the system it is running on. These load balancing algorithms, coupled with MMG, allow faster and more efficient remeshing.

Treeadapt has since been used to improve the simulation of a 42-injector rocket engine demonstrator called BKD within the PRACE project on the prediction of combustion instabilities in liquid rocket engines (RockDyn). This simulation, carried out in collaboration with EM2C Centrale Supelec (T. Schmitt), reached a mesh of one billion tetrahedra in less than 30 minutes using 4,096 AMD EPYC 2 cores (compared to 70 hours with a standard meshing tool).

Objectives

Building on previous experience with the INRIA MMG library, and on a partnership with the EPEEC project during which an efficient mesh partitioning and load balancing library called Treepart was developed, EXCELLERAT is developing a new application/library called Treeadapt for massively parallel mesh adaptation. Currently operating on fully tetrahedral grids, Treeadapt generates a first partitioned domain in which MMG can be called independently on each subdomain while freezing the parallel interfaces. An iterative rebalancing and adaptation process then runs until the whole domain is within a user-provided tolerance with respect to the error estimator (for example, the gradient of density or the Hessian of the velocity); a toy sketch of this loop is given below.
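The loop structure can be illustrated with a runnable 1-D toy, where "cells" are intervals, the error estimator is a solution jump across a cell, and contiguous chunks of cells play the role of partitions remeshed independently with their end points (the parallel interfaces) frozen. This is a schematic analogue, not Treeadapt's tetrahedral implementation:

```python
# Runnable 1-D analogue of the adapt/rebalance loop described above.
import math

def solution(x):                       # toy field with a sharp front
    return math.tanh(20.0 * (x - 0.5))

def cell_error(lo, hi):                # gradient-based error estimate
    return abs(solution(hi) - solution(lo))

def adapt(cells, tol, n_parts=4, max_iter=20):
    for _ in range(max_iter):
        if all(cell_error(lo, hi) <= tol for lo, hi in cells):
            break                      # whole domain within tolerance
        # "Partition": split the cell list into contiguous chunks.
        chunk = max(1, len(cells) // n_parts)
        parts = [cells[i:i + chunk] for i in range(0, len(cells), chunk)]
        # "Local remeshing" with interfaces frozen: refine inside each
        # chunk only; chunk end points never move.
        new_cells = []
        for part in parts:
            for lo, hi in part:
                if cell_error(lo, hi) > tol:
                    mid = 0.5 * (lo + hi)
                    new_cells += [(lo, mid), (mid, hi)]
                else:
                    new_cells.append((lo, hi))
        cells = new_cells              # "rebalance" happens on re-chunking
    return cells

mesh = [(i / 10, (i + 1) / 10) for i in range(10)]
mesh = adapt(mesh, tol=0.05)
print(f"{len(mesh)} cells after adaptation")  # refined near x = 0.5
```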

Technologies

AVBP code
Treeadapt

Use Case Owner

Collaborating Institutions

Full airplane simulations on Heterogeneous Architectures

A solution based on Dynamic Load Balancing


Short description

Many future Exascale systems will be heterogeneous and include accelerators such as GPUs. With the explosion of parallelism, we also expect the performance of the various computing devices to be more variable and, therefore, the performance of the system components to be less predictable. Leading-edge engineering simulation codes need to be malleable enough to adapt to this new environment. This use case employs Alya, EXCELLERAT's reference code and one of only two CFD codes in both the Unified European Applications Benchmark Suite (UEABS) and the PRACE accelerator benchmark suite. When Alya is used for modelling complex systems, such as full airplane simulations, dynamic load balancing mechanisms are required to adjust the workload distribution to the measured performance of each component of the system.

Results & Achievements

The EXCELLERAT software, based on the Space-Filling-Curve (SFC) method described under Objectives, can partition a 250-million-element airplane mesh in 0.08 seconds using 128 nodes (6,144 CPU cores) of the MareNostrum 4 supercomputer. Consequently, mesh partitions can be recomputed at runtime for load balancing without incurring significant overhead. This approach was applied to perform full airplane simulations on the heterogeneous POWER9 cluster installed at the Barcelona Supercomputing Center, where we demonstrated a well-balanced co-execution using both the CPUs and the GPUs simultaneously. As a result, we obtained a 23% time reduction with respect to the GPU-only execution. In practice, this represents a performance boost equivalent to attaching an additional GPU per node and thus a much more efficient exploitation of the resources.

Objectives

In EXCELLERAT we use dynamic load balancing (DLB) to increase the parallel efficiency of airplane simulations, minimising the idle time of underloaded devices at synchronisation points. Alya has been provisioned with a distributed-memory DLB mechanism, complementary to the node-level parallel performance strategy already in place. The kernel parts of the method are an efficient in-house Space-Filling-Curve (SFC)-based mesh partitioner and an online redistribution module that migrates the simulation between two different partitions. Together these are used to correct the partition according to runtime measurements; a toy illustration follows below. We have focused on maximising the parallel performance of the mesh partitioning process in order to minimise the load balancing overhead.
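The following toy illustrates the principle: elements ordered along a space-filling curve are cut into contiguous chunks sized by the measured speed of each device, so repartitioning reduces to moving cut positions along the curve. The Z-order curve, grid size and speed values are illustrative assumptions, not Alya's implementation.

```python
import numpy as np

def z_order_key(ix, iy, bits=10):
    """Interleave the bits of (ix, iy) -> Morton/Z-order index."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b) | ((iy >> b) & 1) << (2 * b + 1)
    return key

# A 64x64 grid of "elements", ordered along the Z curve.
n = 64
elems = sorted(((ix, iy) for ix in range(n) for iy in range(n)),
               key=lambda e: z_order_key(*e))

# Measured relative speeds (e.g. 2 CPU partitions + 2 GPUs per node,
# illustrative numbers): each device gets a contiguous share of the
# SFC ordering proportional to its speed.
speeds = np.array([1.0, 1.0, 6.5, 6.5])
cuts = np.floor(np.cumsum(speeds / speeds.sum()) * len(elems)).astype(int)

parts, start = [], 0
for cut in cuts:
    parts.append(elems[start:cut])
    start = cut
print([len(p) for p in parts])  # elements per device, ~proportional to speed

# At runtime, `speeds` would be re-measured and the cuts recomputed:
# because the SFC ordering is fixed, repartitioning is just moving cut
# positions, which is why it is so cheap.
```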

Technologies

Alya CFD code

Use Case Owner

Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS)

Collaborating Institutions

Barcelona Supercomputing Center (BSC)