Performance and portability of state-of-art molecular dynamics software on modern GPUs

P. 324–334.
Kuznetsov E., Kondratyuk N., Logunov M., Nikolskiy V., Stegailov V.

Classical molecular dynamics (MD) calculations account for a significant part of the utilization time of high-performance computing systems. The efficiency of such calculations depends on the interplay of software and hardware, both of which are now moving to hybrid GPU-based technologies. Several well-developed MD packages focused on GPUs differ both in their data management capabilities and in performance. In this paper, we present our results for porting the CUDA backend of LAMMPS to ROCm HIP, which shows considerable benefits for AMD GPUs compared with the existing OpenCL backend. We consider the efficiency of solving the same physical models using different software and hardware combinations. We analyze the performance of the LAMMPS, HOOMD, GROMACS and OpenMM MD packages with different GPU backends on modern Nvidia Volta and AMD Vega20 GPUs.

Language: English
Keywords: CUDA, OpenCL, Gromacs, AMD ROCm HIP, HOOMD, OpenMM
Publication based on the results of:
Methods for the analysis of the supercomputer efficiency, novel parallel algorithms for molecular dynamics calculations and modeling of transport processes in liquids and biomembranes (2020)

In book

PPAM 2019: Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science
Vol. 12043: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I. Springer, 2020.
Similar publications
GEMM Algorithm for Multi-GPU Platforms with Regular Uneven Data Transfer Links
Choi Y. R., Malkovsky S., Stegailov V., in: 11th Russian Supercomputing Days, RuSCDays 2025, Moscow, Russia, September 29–30, 2025, Revised Selected Papers. Springer, 2026. Ch. 3 P. 32–47.
Multi-GPU servers often exhibit uneven characteristics. For instance, the data transfer bandwidth between four NVIDIA V100 GPUs can vary due to the NVLink connecting these devices to a specific CPU in servers with IBM POWER 9 processors, which means that the communication bandwidth between other devices is comparably slower. To address this issue, the Multi-GPU ...
Added: January 3, 2026
GPU-based molecular dynamics of fluid flows: Reaching for turbulence
Pavlov D., Galigerov V., Kolotinskii D. et al., International Journal of High Performance Computing Applications 2024 Vol. 38 No. 1 P. 34–49
Fluid dynamics is a ubiquitous problem that arises in different branches of science and industry. It is usually tackled by numerically solving differential equations on a finite grid. Molecular dynamics was not a feasible tool to approach fluid dynamics until very recently due to its disproportional computational complexity. In this paper we propose a new ...
Added: July 18, 2023
Multi-GPU GEMM Algorithm Performance Analysis for Nvidia and AMD GPUs Connected by NVLink and PCIe
Choi Y. R., Stegailov V., in: 22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers. Springer, 2022. Ch. 23 P. 281–292.
Modern types of multi-GPU servers combine up to 8 A100 GPUs connected by NVLink 3.0 links through NVSwitch. This connectivity provides unprecedented capabilities for multi-GPU algorithms. In this work, we analyze the performance of matrix-matrix multiplication algorithm developed by us previously. Tuning principles and limits for maximum performance are discussed. Algorithm performance for much more ...
Added: December 26, 2022
Tuning of a Matrix-Matrix Multiplication Algorithm for Several GPUs Connected by Fast Communication Links
Choi Y. R., Nikolskiy V., Stegailov V., in: Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers. Springer, 2022. Ch. 12 P. 158–171.
Added: August 11, 2022
Algorithm for Adaptive Mesh Redistribution in Lattice Boltzmann Simulations
Ziganurova L., Shchur L., Lobachevskii Journal of Mathematics 2022 Vol. 43 No. 2 P. 513–518
The Lattice Boltzmann method (LBM) is an alternative approach to solving hydrodynamic equations. Two factors make it a popular approach nowadays. First, LBM is inherently suited to parallel simulations due to the linear structure of the algorithm. Second, what makes LBM special for research is that it is well applicable to the simulations ...
Added: May 25, 2022
GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP
Kondratyuk N., Nikolskiy V., Pavlov D. et al., International Journal of High Performance Computing Applications 2021 Vol. 35 No. 4 P. 312–324
Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management ...
Added: June 25, 2021
Algorithm for replica redistribution in an implementation of the population annealing method on a hybrid supercomputer architecture
Russkov A., Chulkevich R., Shchur L., Computer Physics Communications 2021 Vol. 261 P. 107786
The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible by efficiently redistributing replicas. ...
Added: December 28, 2020
Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink
Choi Y. R., Nikolskiy V., Stegailov V., in: 2020 Global Smart Industry Conference (GloSIC). IEEE, 2020. P. 354–361.
Added: December 3, 2020
Evaluating OpenMP, OpenACC and CUDA parallel programming models for the GPU: Performance Analysis
Timofeev A., Khalilov M., in: Parallel Computational Technologies (PCT'2020). Chelyabinsk, 2020. P. 40–51.
Modern supercomputers use GPUs as accelerators in computing nodes. GPUs allow scientific applications to greatly boost performance using fine-grained parallelism. The CUDA programming model is oriented toward taking advantage of the SIMT GPU architecture by writing low-level code. Contrary to this approach, OpenACC and OpenMP 4.5 represent a declarative model of parallel programming using compiler pragmas with support ...
Added: October 23, 2020
Algorithm for the replica redistribution in the implementation of parallel annealing method on the hybrid supercomputer architecture
Russkov A., Chulkevich R., Shchur L. / Series arXiv "math". 2020. No. 2006.00561.
The parallel annealing method is one of the promising approaches for large scale simulations as potentially scalable on any parallel architecture. We present an implementation of the algorithm on the hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible redistributing replicas and ...
Added: June 2, 2020
Performance of modern computing platforms in molecular dynamics calculations of protein-membrane systems
Nolde D., Krylov N., Telegin P. N. et al., Труды НИИСИ РАН 2018 Vol. 7 No. 4 P. 157–161
The performance of the molecular dynamics software package Gromacs was measured on various hardware: desktop computers, clusters based on x86_64 processors or many integrated core processors, and heterogeneous systems with gaming graphics cards or general-purpose GPU systems. The optimal choice of hardware for molecular dynamics simulations is discussed. ...
Added: February 10, 2020
Performance and Scalability of Materials Science and Machine Learning Codes on the State-of-Art Hybrid Supercomputer Architecture
Kondratyuk N., Smirnov G., Agarkov A. et al., in: Supercomputing. RuSCDays 2019. Communications in Computer and Information Science Vol. 1129: Supercomputing. RuSCDays 2019. Springer, 2019. P. 597–609.
8 of top 10 supercomputers of Top500 list published in November 2018 consist of computing nodes with hybrid architectures that require special programming techniques. 5 systems among these are based on Nvidia GPUs. In this paper, we consider the benchmark results of the brand new hybrid supercomputer installed in March 2019 in NRU HSE. This ...
Added: December 11, 2019
Implementation of an XSL block cipher with MDS-matrix linear transformation on NVIDIA CUDA
Fomin D., Математические вопросы криптографии 2015 Vol. 6 No. 2 P. 99–108
In this article we consider NVIDIA GPU implementation aspects of an XSL block cipher over the finite field with MDS-matrix linear transformation. We compare obtained results with some other block ciphers. ...
Added: May 4, 2019
A timing attack on CUDA implementations of an AES-type block cipher
Fomin D., Математические вопросы криптографии 2016 Vol. 7 No. 2 P. 121–130
A timing attack against an AES-type block cipher CUDA implementation is presented. Our experiments show that it is possible to extract a secret AES 128-bit key with complexity of 2^32 chosen plaintext encryptions. This approach may be applied to AES with other key sizes and, moreover, to any block cipher with a linear transform that is ...
Added: May 4, 2019
Parallel algorithms for reducing derivation time of distinguishing experiments for nondeterministic finite state machines
El-Fakih K., Barlas G., Ali M. et al., International Journal of Parallel, Emergent and Distributed Systems 2018 Vol. 33 No. 2 P. 197–210
Many approaches have been proposed for deriving tests from finite state machine (FSM) specifications with respect to some established coverage criteria. A fundamental core problem in FSM-based testing relates to the derivation of input sequences that can distinguish states of an FSM specification, aka distinguishing sequences. A major effort in the construction of these sequences ...
Added: October 31, 2018
Tools for analysis and development of efficient code for parallel architectures
Monakov A. V., Platonov V. A., Avetisyan A., Труды Института системного программирования РАН 2014 Vol. 26 No. 1 P. 357–374
The article proposes methods for supporting development of efficient programs for modern parallel architectures, including hybrid systems. First, specialized profiling methods designed for programmers tasked with parallelizing existing code are proposed. The first method is loop-based profiling via source-level instrumentation done with Coccinelle tool. The second method is memory reuse distance estimation via virtual memory ...
Added: March 22, 2017
Using CUDA technology in training a convolutional neural network for pollen grain recognition
Zamyatina E. B., Khanzhina N. E., in: High-Performance Computing on Graphics Processors: Proceedings of the III All-Russian Scientific and Practical Conference with International Participation and Elements of a Scientific School for Young People (ВВГП–2016). Perm: Perm State National Research University, 2016. P. 70–81.
In this work, we describe the problem of automated pollen recognition using images from a light microscope. Automated pollen recognition is related to such important tasks as honey quality control, air quality control to help asthma and allergy patients, paleopalynology, and forensic palynology. We describe a solution to the problem based on machine learning and CUDA. Extracted features and ...
Added: March 12, 2017
PRAND library: generation of parallel random number streams for Monte Carlo calculations using GPUs
Barash L. Yu., Shchur L., Cuda Альманах 2014 No. 3 P. 17
The libraries RNGSSELIB and PRAND for the parallel generation of pseudo-random numbers in Monte Carlo simulations were developed. The RNGSSELIB library contains an implementation based on the SSE extension in modern CPUs, and the PRAND library contains generators using CUDA version 5.0 and later. ...
Added: March 10, 2016