Evaluating OpenMP, OpenACC and CUDA parallel programming models for the GPU: Performance Analysis
P. 40-51.
Timofeev A., Khalilov M.
Modern supercomputers use GPUs as accelerators in computing nodes. GPUs allow scientific applications to greatly boost performance using fine-grained parallelism. The CUDA programming model is designed to exploit the SIMT GPU architecture through low-level code. In contrast, OpenACC and OpenMP 4.5 offer a declarative model of parallel programming based on compiler pragmas with support for GPU offloading. This paper examines the efficiency of matrix multiplication implemented with these programming models. A comparative performance analysis of naive and hand-tuned matrix multiplication is carried out on Nvidia Tesla V100 and MX940 GPUs and on modern CPUs. An analysis of vendor-optimized BLAS libraries is also presented.
Fomin D., Математические вопросы криптографии 2016 Vol. 7 No. 2 P. 121-130
A timing attack against a CUDA implementation of an AES-type block cipher is presented. Our experiments show that it is possible to extract a secret 128-bit AES key with a complexity of 2^32 chosen-plaintext encryptions. This approach may be applied to AES with other key sizes and, moreover, to any block cipher with a linear transform that is ...
Added: May 4, 2019
Lebedev P. A., Математические вопросы криптографии 2013 Vol. 4 No. 2 P. 73-80
We present optimization guidelines and implementations of the cryptographic hash functions GOST R 34.11-94 and GOST R 34.11-2012. Results for x86_64 CPUs and NVIDIA CUDA-capable GPUs are provided for our implementations and several other well-known ones. It is shown that the new standard may be twice as fast as the old one on modern CPUs, but it ...
Added: April 1, 2013
Gostev I. M., in: Distributed Computing and Grid Technologies in Science and Education. Proceedings of the 5th International Conference, Dubna, July 16-21, 2012. Dubna: Joint Institute for Nuclear Research, 2012. P. 274-279.
Solving image processing and graphical pattern recognition problems usually relies on a technology that comprises a sequence of operations. This work studies the time spent on processing, which depends on the number and computational cost of these operations, the size of the input image, and the rate of data transfer between individual processing stages. ...
Added: July 19, 2013
Springer, 2020
This volume comprises the proceedings of the 13th International Conference on Parallel Processing and Applied Mathematics (PPAM 2019), which was held in Białystok, Poland, September 8–11, 2019. It was organized by the Department of Computer and Information Science of the Częstochowa University of Technology together with Białystok University of Technology, under the patronage of the Committee ...
Added: October 14, 2020
Oleg E. Bukharov, Dmitry P. Bogolyubov, Expert Systems with Applications 2015 Vol. 42 No. 15-16 P. 6177-6183
Given ever increasing information volume and complexity of engineering, social and economic systems, it has become more difficult to assess incoming data and manage such systems properly. Currently developed innovative decision support systems (DSS) aim to achieve optimum results while minimizing the risks of serious losses. The purpose of the DSS is to help the ...
Added: May 17, 2015
Choi Y. R., Nikolskiy V., Stegailov V., in: 2020 Global Smart Industry Conference (GloSIC). IEEE, 2020. P. 354-361.
Added: December 3, 2020
Choi Y. R., Nikolskiy V., Stegailov V., in: Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers. Springer, 2022. Ch. 12. P. 158-171.
Added: August 11, 2022
Russkov A., Chulkevich R., Shchur L., Cornell University. Series arXiv "math". 2020. No. 2006.00561.
The parallel annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible redistributing replicas and ...
Added: June 2, 2020
Choi Y. R., Stegailov V., in: 22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers. Springer, 2022. Ch. 23. P. 281-292.
Modern multi-GPU servers combine up to 8 A100 GPUs connected by NVLink 3.0 links through NVSwitch. This connectivity provides unprecedented capabilities for multi-GPU algorithms. In this work, we analyze the performance of the matrix-matrix multiplication algorithm that we developed previously. Tuning principles and limits for maximum performance are discussed. Algorithm performance for much more ...
Added: December 26, 2022
Bukharov O., Bogolyubov D., Системный администратор 2014 No. 9 P. 88-92
This paper describes aspects of the development of a decision support system based on neural networks and a genetic algorithm. We justify the use of general-purpose computing on graphics processing units (GPGPU) for our decision support system. An example of the successful application of CUDA to increase the computing performance of the system in question is presented. ...
Added: September 12, 2014
Lebedev P. A., Journal of Physics: Conference Series 2016 Vol. 681 No. 1 P. 012048-1-012048-6
We present results on the integration of two major GPGPU APIs with a reactor-based event processing model in C++ that utilizes coroutines. Given the current lack of a universally usable GPGPU programming interface that gives optimal performance, and the ongoing debates about the style of implementing asynchronous computing in C++, we present a working implementation that allows a uniform and seamless ...
Added: February 3, 2016
Бараш Л. Ю., Shchur L., CUDA Альманах 2014 No. 3 P. 17-17
The libraries RNGSSELIB and PRAND for the parallel generation of pseudo-random numbers in Monte Carlo simulations were developed. The RNGSSELIB library contains implementations based on the SSE extensions of modern CPUs, and the PRAND library contains generators that use CUDA version 5.0 and later. ...
Added: March 10, 2016
Монаков А. В., Платонов В. А., Avetisyan A., Труды Института системного программирования РАН 2014 Vol. 26 No. 1 P. 357-374
The article proposes methods for supporting the development of efficient programs for modern parallel architectures, including hybrid systems. First, specialized profiling methods designed for programmers tasked with parallelizing existing code are proposed. The first method is loop-based profiling via source-level instrumentation done with the Coccinelle tool. The second method is memory reuse distance estimation via virtual memory ...
Added: March 22, 2017
Fomin D., Математические вопросы криптографии 2015 Vol. 6 No. 2 P. 99-108
In this article we consider aspects of implementing an XSL block cipher over a finite field with an MDS-matrix linear transformation on NVIDIA GPUs. We compare the obtained results with those for some other block ciphers. ...
Added: May 4, 2019
Lebedev P. A., Вестник Московского государственного технического университета им. Н.Э. Баумана. Серия Естественные науки 2013 No. 1 (48) P. 50-60
An approach is described for implementing the Method of Four Russians for reducing dense matrices over GF(2) to row echelon form on the NVIDIA CUDA platform. Estimates of the algorithm's running time and recommendations on choosing its parameters are given. It is shown that the developed implementation is most effective in comparison ...
Added: April 1, 2013
Bukharov O., Mizikin A. A., Bogolyubov D., Промышленные АСУ и контроллеры 2013 No. 7 P. 37-45
In this article we substantiate some advantages of the evolutionary approach to developing decision support systems. The most popular methods of forecasting and of detecting dependences are considered. Advantages of using neural networks to forecast and to determine dependences between system parameters are given. Advantages of interval ...
Added: November 29, 2013
Nolde D., Krylov N., Телегин П. Н. et al., Труды НИИСИ РАН 2018 Vol. 7 No. 4 P. 157-161
The performance of the molecular dynamics software package Gromacs was measured on various hardware: desktop computers, clusters based on x86_64 processors or Many Integrated Core processors, and heterogeneous systems with gaming graphics cards or general-purpose GPUs. The optimal choice of hardware for molecular dynamics simulations is discussed. ...
Added: February 10, 2020
Kondratyuk N., Nikolskiy V., Pavlov D. et al., International Journal of High Performance Computing Applications 2021 Vol. 35 No. 4 P. 312-324
Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management ...
Added: June 25, 2021
Russkov A., Chulkevich R., Shchur L., Computer Physics Communications 2021 Vol. 261 P. 107786
The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible by efficiently redistributing replicas. ...
Added: December 28, 2020
Замятина Елена Борисовна, Ханжина Н. Е., in: High-Performance Computing on Graphics Processors: Proceedings of the 3rd All-Russian Scientific and Practical Conference with International Participation and Elements of a Scientific School for Young People (ВВГП–2016). Perm: Perm State National Research University, 2016. P. 70-81.
In this work, we describe the problem of automated pollen recognition using images from a light microscope. Automated pollen recognition is relevant to important tasks such as honey quality control, air quality control to help asthma and allergy patients, paleopalynology, and forensic palynology. We describe a solution to the problem based on machine learning and CUDA. Extracted features and ...
Added: March 12, 2017
Ziganurova L., Shchur L., Lobachevskii Journal of Mathematics 2022 Vol. 43 No. 2 P. 513-518
The Lattice Boltzmann method (LBM) is an alternative approach to solving hydrodynamic equations. Two factors make it a favored approach nowadays. First, LBM is intrinsically suited to parallel simulations due to the linear structure of the algorithm. Second, and what makes LBM special for research, it is well suited to the simulations ...
Added: May 25, 2022
Gostev I. M., Sibirtseva E. A., RUDN Journal of Mathematics, Information Sciences and Physics 2014 No. 4 P. 68-84
Low-cost gaze tracking systems are in great demand due to their wide range of applications. Commonly, extra devices are needed (for instance, head-mounted cameras); however, in this investigation gaze tracking is performed in real time based on the video stream from an infrared video camera. A comparative analysis of existing analogues was carried out and ...
Added: December 7, 2014
El-Fakih K., Barlas G., Ali M. et al., International Journal of Parallel, Emergent and Distributed Systems 2018 Vol. 33 No. 2 P. 197-210
Many approaches have been proposed for deriving tests from finite state machine (FSM) specifications with respect to some established coverage criteria. A fundamental core problem in FSM-based testing relates to the derivation of input sequences that can distinguish states of an FSM specification, aka distinguishing sequences. A major effort in the construction of these sequences ...
Added: October 31, 2018
Kuznetsov E., Kondratyuk N., Logunov M. et al., in: PPAM 2019: Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science. Vol. 12043: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I. Springer, 2020. P. 324-334.
Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed MD packages focused on GPUs differ both in their data management capabilities and ...
Added: October 14, 2020