Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink

Nedomolkin I., Konikov M., Fedorov I. et al., in: Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers (CCIS, Vol. 2891). Springer, 2026. P. 513–531.

Modern supercomputer systems play a crucial role in scientific and engineering research. To ensure their effectiveness, these fields require reliable methods for evaluating supercomputer performance. Although benchmarking is a fundamental tool, current ranking systems often inadequately represent real-world performance in high-performance computing (HPC) applications. As a result, employing actual scientific software packages provides a more ...

Added: May 19, 2026

Hardware-Software Complex for Network-on-Chip Prototyping Using Multiple FPGAs

Mikhail Y. Romashikhin, Aleksandr Y. Romanov, IEEE Access 2026 Vol. 14 P. 7921–7931

This paper presents a hardware-software multi-FPGA complex designed for hardware prototyping of networks-on-chip (NoCs). The rationale for the use of multiple FPGAs for NoC prototyping is given. The architecture of the complex and its components–the software part generating top-level files and configuration files describing the NoC for several FPGAs, hardware part consisting of interfacing switches ...

Added: January 22, 2026

GEMM Algorithm for Multi-GPU Platforms with Regular Uneven Data Transfer Links

Choi Y. R., Malkovsky S., Stegailov V., in: 11th Russian Supercomputing Days, RuSCDays 2025, Moscow, Russia, September 29–30, 2025, Revised Selected Papers. Springer, 2026. Ch. 3 P. 32–47.

Multi-GPU servers often exhibit uneven characteristics. For instance, in servers with IBM POWER9 processors, the data transfer bandwidth between four NVIDIA V100 GPUs can vary because NVLink connects these devices to a specific CPU, so communication between the other devices is comparatively slower. To address this issue, the Multi-GPU ...

Added: January 3, 2026

"Stroika" ("Builder"): A Computer Game for Introducing Parallel Programming

Voronova K. D., Plaksin M. A., in: Current Problems of Mathematics, Mechanics and Informatics 2022: Collection of Articles from the Student Conference (Perm, Perm State University, May 25 – June 10, 2022). Perm: Perm State University, 2022. P. 25–29.

The rapid development of parallel computing technologies makes it urgent to include propaedeutics of parallel computing in the school computer science course. Since this topic is not yet included in the school curriculum, it can be done through extracurricular activities, in particular, through Internet contests. Since 2013, parallel computing tasks have become a compulsory part ...

Added: February 29, 2024

Methods for Changing Parallelism in the Process of High-Level VLSI Synthesis

Ryzhenko I. N., Nepomnyaschy O. V., Legalov A. I. et al., Automatic Control and Computer Sciences 2023 Vol. 57 No. 7 P. 696–705

In this paper, methods for increasing the efficiency of VLSI development based on the method of architecture-independent design are proposed. The route of high-level VLSI synthesis is considered. The principle of constructing a VLSI hardware model based on the functional-flow programming paradigm is stated. The results of the development of methods and algorithms for the ...

Added: February 27, 2024

GPU-Accelerated Matrix Exponent for Solving 1D Time-Dependent Schrödinger Equation

Choi Y. R., Stegailov V., in: Supercomputing: 9th Russian Supercomputing Days, RuSCDays 2023, Moscow, Russia, September 25–26, 2023, Revised Selected Papers, Part I. Springer, 2023. P. 100–113.

Non-adiabatic electron-ion quantum dynamics is still an area of many unresolved problems, even for systems as simple as the H2+ molecular ion. Mathematical modelling based on the time-dependent Schrödinger equation (TDSE) is an important method that can provide a better understanding of these phenomena. In this work, we present a solution of the 1D TDSE that describes non-adiabatic electron-ion ...

Added: January 26, 2024

GPU-based molecular dynamics of fluid flows: Reaching for turbulence

Pavlov D., Galigerov V., Kolotinskii D. et al., International Journal of High Performance Computing Applications 2024 Vol. 38 No. 1 P. 34–49

Fluid dynamics is a ubiquitous problem that arises in different branches of science and industry. It is usually tackled by numerically solving differential equations on a finite grid. Molecular dynamics was not a feasible tool to approach fluid dynamics until very recently due to its disproportional computational complexity. In this paper we propose a new ...

Added: July 18, 2023

Multi-GPU GEMM Algorithm Performance Analysis for Nvidia and AMD GPUs Connected by NVLink and PCIe

Choi Y. R., Stegailov V., in: 22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers. Springer, 2022. Ch. 23 P. 281–292.

Modern multi-GPU servers combine up to 8 A100 GPUs connected by NVLink 3.0 links through NVSwitch. This connectivity provides unprecedented capabilities for multi-GPU algorithms. In this work, we analyze the performance of the matrix-matrix multiplication algorithm that we developed previously. Tuning principles and limits for maximum performance are discussed. Algorithm performance for much more ...

Added: December 26, 2022

Introduction to Parallel Computing as Part of the Distance Competition "TRIZformashka-2022"

Voronova K. D., Plaksin M. A., in: Distance Learning: The Educational Environment of the 21st Century: Proceedings of the XII International Scientific and Methodological Conference (Minsk, Republic of Belarus, May 26, 2022). Minsk: BSUIR, 2022. P. 163–163.

It is proposed to introduce schoolchildren and students to the basics of parallel computing using the distance competition "TRIZformashka". For the competition "TRIZformashka-2022" the computer game "Builder" was specially developed for teaching how to build parallel algorithms. A description of the game and a download link are given. ...

Added: October 31, 2022

Supercomputing: 7th Russian Supercomputing Days, RuSCDays 2021, Moscow, Russia, September 27–28, 2021, Revised Selected Papers

Springer, 2021.

Added: October 19, 2022

Tuning of a Matrix-Matrix Multiplication Algorithm for Several GPUs Connected by Fast Communication Links

Choi Y. R., Nikolskiy V., Stegailov V., in: Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers. Springer, 2022. Ch. 12 P. 158–171.

Added: August 11, 2022

Algorithm for Adaptive Mesh Redistribution in Lattice Boltzmann Simulations

Ziganurova L., Shchur L., Lobachevskii Journal of Mathematics 2022 Vol. 43 No. 2 P. 513–518

The lattice Boltzmann method (LBM) is an alternative approach to solving hydrodynamic equations. Two factors make it a popular choice today. First, LBM is intrinsically suited to parallel simulations owing to the linear structure of the algorithm. Second, and what makes LBM special for research, it is well suited to the simulations ...

Added: May 25, 2022

Methods for Transforming Parallelism in the Process of High-Level VLSI Synthesis

Ryzhenko I. N., Nepomnyaschy O. V., Legalov A. I. et al., Modeling and Analysis of Information Systems 2022 Vol. 29 No. 1 P. 60–72

In this paper, methods for increasing the efficiency of VLSI development based on the method of architecture-independent design are proposed. The route of high-level VLSI synthesis is considered. The principle of constructing a VLSI hardware model based on the functional-flow programming paradigm is stated. The results of the development of methods and algorithms for transformation functional-parallel ...

Added: March 18, 2022