Software – Page 8

Scalasca

Scalasca is a software tool that supports the performance optimization of parallel programs by measuring and analyzing their runtime behavior. The analysis identifies potential performance bottlenecks – in particular those concerning communication and synchronization – and offers guidance in exploring their causes.

CoE: POP

Paraver

< Go Back

Paraver is a very flexible data browser that is part of the CEPBA-Tools toolkit. Its analysis power is based on two main pillars. First, its trace format has no semantics; extending the tool to support new performance data or new programming models requires no changes to the visualizer, just to capture such data in a Paraver trace. The second pillar is that the metrics are not hardwired on the tool but programmed. To compute them, the tool offers a large set of time functions, a filter module, and a mechanism to combine two time lines. This approach allows displaying a huge number of metrics with the available data. To capture the experts knowledge, any view or set of views can be saved as a Paraver configuration file. After that, re-computing the view with new data is as simple as loading the saved file. The tool has been demonstrated to be very useful for performance analysis studies, giving much more details about the applications behaviour than most performance tools. Some Paraver features are the support for:

Detailed quantitative analysis of program performance
Concurrent comparative analysis of several traces
Customizable semantics of the visualized information
Cooperative work, sharing views of the tracefile
Building of derived metrics

CoE: POP

Extrae

< Go Back

Extrae profiling tool, developed by BSC, can very quickly produce very large trace files, which can take several minutes to load into Paraver, the tool used to view the traces. These trace files can be kept to a more manageable size by using Extrae’s API to turn the tracing on and off as needed. For example, the user might only want to record data for two or three time steps. This API was previously only available for codes developed in C, C++ and Fortran but now it also supports Python codes using MPI.

CoE: POP

SpFFT

< Go Back

SpFFT is a 3D FFT library for sparse frequency domain data written in C++ with support for MPI, OpenMP, CUDA, and ROCm. It was originally intended for transforms of data with spherical cutoff in frequency domain, as required by some computational materials science codes. For distributed computations, SpFFT uses a slab decomposition in space domain and pencil decomposition in frequency domain (all sparse data within a pencil must be on one rank). The library is open-source (BSD 3-clause licence) and is freely available.

http://www.max-centre.eu/software/libraries#6

CoE: MaX

COSMA

< Go Back

COSMA is a parallel, high-performance, GPU-accelerated, matrix-matrix multiplication algorithm and library implementation that is communication-optimal for all combinations of matrix dimensions, number of processors and memory sizes, without the need for any parameter tuning. COSMA is written in C++11 with MPI, OpenMP and CUDA/ROCm programming models. The library is open-source (BSD 3-clause licence) and is freely available.

http://www.max-centre.eu/software/libraries#5

CoE: MaX

DBCSR

< Go Back

DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is MPI and OpenMP parallel and can exploit Nvidia and AMD GPUs via CUDA and HIP. The library is open-source (GPL v2.0) and is freely available.

http://www.max-centre.eu/software/libraries#4

CoE: MaX

Sirius

< Go Back

SIRIUS is a domain specific library for electronic structure calculations. It implements pseudopotential plane wave (PP-PW) and full potential linearized augmented plane wave (FP-LAPW) methods and is designed for GPU acceleration of popular community codes such as Exciting, Elk and Quantum ESPRESSO. SIRIUS is used in production at CSCS to enable QE on GPUs. The library is open-source (BSD 2-clause licence) and is freely available. SIRIUS is written in C++11 with MPI, OpenMP and CUDA/ROCm programming models. SIRIUS is organised as a collection of classes that abstract away the different building blocks of DFT self-consistency cycle.

http://www.max-centre.eu/software/libraries#3

CoE: MaX

LAXLib & FFTXLib

< Go Back

LAXLib & FFTXLib – One of the most important obstacles when keeping the codes up to date with hardware is the programming style based on old-style (i.e., non object-oriented) languages. Programming styles in community codes are often naive and lack the modularity and flexibility. From here, the need to disentangle such codes is essential for implementing new features or simply refactoring the application in order to efficiently run on the new architectures. Rewriting from scratch one of these codes is not an option because the communities behind these codes would be disrupted. One of the possible approaches that could permit to evolve the code is to progressively encapsulate the functions and subroutines, breaking up the main application in small (possibly weakly dependent) parts. This strategy was followed by Quantum ESPRESSO: two main types of kernels were isolated in the independent directories and proposed as candidates for the domain-specific libraries for third-party applications. The first library, called LAXlib, contains all the low-level linear algebra routines of Quantum ESPRESSO, and in particular those used by the Davidson solver (e.g., the Cannon algorithm for the matrix-matrix product). The LAXlib also contains a mini-app that permits to evaluate the features of a HPC interconnect measuring the linear algebra routines contained therein. The second library encapsulates all the FFT related functions, including the drivers for several different architectures. The FFTXlib library is self-contained and can be built without any dependencies on the remaining part of the Quantum ESPRESSO suite. Similarly, in the FFTXlib there is a mini-app that permits to mimic the FFT cycle for the SCF calculation of the charge density tuning the parallelization parameters of Quantum ESPRESSO. This mini-app has also been used to test the new implementation using the MPI-3 non blocking collectives.

http://www.max-centre.eu/software/libraries#2

CoE: MaX

CheSS

< Go Back

CheSS – One of the most important tasks in electronic structure codes is the calculation of the density matrix. If not handled properly, this task can easily lead to a bottleneck that limits the performance of the code or even renders big calculations prohibitively expensive. CheSS is a library that was designed with the goal of enabling electronic structure calculations for very big systems. It is capable of working with sparse matrices, which naturally arise if big systems are treated with a localized basis. Therefore, it is possible to calculate the density matrix with O(N), i.e., the computation cost only increases linearly with the system size. The CheSS solver uses an expansion based on Chebyshev polynomials to calculate matrix functions (such as the density matrix or the inverse of the overlap matrix), thereby exploiting the sparsity of the input and output matrices. It works best for systems with a finite HOMO-LUMO gap and a small spectral width. CheSS exhibits a two-level parallelization using MPI and OpenMP and can scale to many thousands of cores. It has been converted into a stand-alone library starting from the original codebase within BigDFT. At the moment, it is coupled to the two MAX flagship codes BigDFT and SIESTA. The performance of CheSS has been benchmarked against PEXSI and (Sca)LAPACK for the calculation of the density matrix and the inverse of the overlap matrix, respectively. CheSS is the most efficient method, as it is demonstrated with more details and performance figures in the publication “Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library”.

http://www.max-centre.eu/software/libraries#1

CoE: MaX

AiiDA

< Go Back

AiiDA (Automated Interactive Infrastructure and Database for computational science) is a Python materials’ informatics framework to manage, store, share, and disseminate the workload of high-throughput computational efforts, while providing an ecosystem for materials simulations where complex scientific workflows involving different codes and datasets can be seamlessly implemented, automated and shared.

http://www.max-centre.eu/codes-max/aiida

CoE: MaX

Category: Software

Scalasca

Paraver

Extrae

SpFFT

COSMA

DBCSR

Sirius

LAXLib & FFTXLib

CheSS

AiiDA

Pages

HPC Centres of Excellence

Service

Get in touch