Tools for Analysis and Development of Efficient Code for Parallel Architectures

Proceedings of the Institute for System Programming of the RAS. 2014. Vol. 26. No. 1. P. 357–374.
Monakov A. V., Platonov V. A., Avetisyan A.

The article proposes methods for supporting the development of efficient programs for modern parallel architectures, including hybrid systems. First, it proposes specialized profiling methods for programmers tasked with parallelizing existing code. The first method is loop-based profiling via source-level instrumentation performed with the Coccinelle tool. The second method estimates memory reuse distance using the virtual memory protection mechanism and manual instrumentation. The third method estimates cache misses and false sharing: compiler instrumentation collects a partial trace of memory accesses, and cache behavior is then estimated in postprocessing from the trace and a cache model. Second, the article discusses the problem of automatic parallel code generation for hybrid architectures. Our approach is to generate OpenCL code from parallel loop nests using the GRAPHITE infrastructure in the GCC compiler. Finally, when achieving high efficiency on hybrid systems requires significant rework of data structures or algorithms, auto-tuning can be used to specialize for specific input data and hardware at run time. This is demonstrated on the problem of optimizing sparse matrix-vector multiplication for GPUs and its use for accelerating linear system solving in the OpenFOAM CFD package. We propose a variant of the "sliced ELLPACK" sparse matrix storage format with special treatment for small horizontal or diagonal blocks, in which the exact parameters of the matrix structure and GPU kernel launch are tuned automatically at run time for the specific matrix and GPU hardware.

Language: Russian
Keywords: CUDA, program analysis and optimization, OpenCL, OpenFOAM, profiling, sparse matrices
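
The abstract names the "sliced ELLPACK" storage format without showing its layout. The following is a minimal CPU-side sketch of the plain sliced ELLPACK idea only: rows are grouped into slices, and each slice is padded to the length of its own longest row rather than to the longest row of the whole matrix. It does not reproduce the article's variant (the special treatment of small horizontal or diagonal blocks and the GPU kernels are not shown). The function names, the input representation, and the use of padded storage size as a stand-in for a tuning criterion are all assumptions made for illustration.

```python
def to_sliced_ellpack(rows, slice_height):
    """Convert a sparse matrix to a sliced ELLPACK layout.

    rows: list of rows, each a list of (column, value) pairs (hypothetical
          input representation chosen for this sketch).
    slice_height: number of consecutive rows grouped into one slice.
    Returns a list of (first_row, width, cols, vals) tuples, one per slice.
    """
    slices = []
    for start in range(0, len(rows), slice_height):
        chunk = rows[start:start + slice_height]
        # Each slice is padded only to its own longest row.
        width = max((len(r) for r in chunk), default=0)
        cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in chunk]
        vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in chunk]
        slices.append((start, width, cols, vals))
    return slices


def spmv(slices, x, n_rows):
    """Compute y = A @ x for a matrix stored by to_sliced_ellpack."""
    y = [0.0] * n_rows
    for start, width, cols, vals in slices:
        for i in range(len(cols)):
            acc = 0.0
            # Padding entries have value 0.0, so they add nothing.
            for k in range(width):
                acc += vals[i][k] * x[cols[i][k]]
            y[start + i] = acc
    return y


def padded_size(slices):
    """Number of stored entries, padding included."""
    return sum(width * len(cols) for _, width, cols, _ in slices)


if __name__ == "__main__":
    rows = [
        [(0, 1.0), (2, 2.0)],
        [(1, 3.0)],
        [(0, 4.0), (1, 5.0), (2, 6.0)],
        [(2, 7.0)],
    ]
    x = [1.0, 2.0, 3.0]
    for h in (1, 2, 4):
        s = to_sliced_ellpack(rows, h)
        print(h, padded_size(s), spmv(s, x, len(rows)))
```

With a slice height equal to the number of rows, this degenerates to ordinary ELLPACK. Smaller slices reduce padding, but on a GPU the slice height also interacts with how work maps to threads, which is one reason such parameters are candidates for run-time tuning rather than fixed constants.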