COSMA is a parallel, high-performance, GPU-accelerated matrix-matrix multiplication algorithm and library implementation. It is communication-optimal for all combinations of matrix dimensions, number of processors and memory sizes, without the need for any parameter tuning. COSMA is written in C++11 using the MPI, OpenMP and CUDA/ROCm programming models. The library is open source (BSD 3-Clause licence) and freely available.
CoE: MaX
This website is operated as part of the CASTIEL 2 project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101102047. The JU receives support from the European Union's Digital Europe Programme and from Germany, Italy, Spain, France, Belgium, Austria and Estonia.