Tracing of GPU-Aware MPI Applications: First Benchmarks for the Angara Interconnect
The efficiency of data transfer is one of the most important issues in supercomputer development in the post-Moore era. The rise of heterogeneous computing systems introduces complicated data transfer patterns, such as those arising in GPU-aware MPI. The practical deployment of this technology in applications requires the development of dedicated system software as well as analysis tools for tracing the runtime behavior of the corresponding applied algorithms. In this work, we present the UCX API for a GPU-aware MPI implementation over the Angara interconnect and analyze the execution patterns of the rocHPL benchmark using the Score-P infrastructure. This analysis allows us to compare the GPU-aware MPI implementation for the Angara interconnect with the InfiniBand implementation.