Matrix-Matrix Multiplication Using Multiple GPUs Connected by Nvlink
P. 354-361.
Choi Y. R., Stegailov V., , in : 22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers. : Springer, 2022. Ch. 23. P. 281-292.
Modern multi-GPU servers combine up to 8 A100 GPUs connected by NVLink 3.0 links through NVSwitch. This connectivity provides unprecedented capabilities for multi-GPU algorithms. In this work, we analyze the performance of a matrix-matrix multiplication algorithm that we developed previously. Tuning principles and limits for maximum performance are discussed. Algorithm performance for much more ...
Added: December 26, 2022
Choi Y. R., Nikolskiy V., Stegailov V., , in : Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers. : Springer, 2022. Ch. 12. P. 158-171.
Added: August 11, 2022
Bogolyubov D., Чанкин А. А., Стемиковская К. В., Промышленные АСУ и контроллеры 2012 No. 10 P. 30-35
In this article we introduce a CUDA-based implementation of the Kohonen self-organizing map. We describe the software implementation and test results confirming that the performance gain over the serial version of the algorithm grows with the size of the neural network. ...
Added: February 13, 2013
Gostev I. M., Sibirtseva E. A., RUDN Journal of Mathematics, Information Sciences and Physics 2014 No. 4 P. 68-84
Low-cost gaze tracking systems are in great demand due to their wide range of applications. Commonly, extra devices are needed (for instance, head-mounted cameras); however, in this investigation gaze tracking is performed in real time based on the video stream from an infrared video camera. A comparative analysis of the existing analogues was carried out and ...
Added: December 7, 2014
Bukharov O., Bogolyubov D., Системный администратор 2014 No. 9 P. 88-92
This paper describes aspects of the development of a decision support system based on neural networks and a genetic algorithm. We justify the use of general-purpose computing on graphics processing units (GPGPU) for our decision support system. An example of the successful application of CUDA to increase the computing performance of the system in question is presented. ...
Added: September 12, 2014
Якушев В. Л., Симбиркин В. Н., Филимонов А. В. et al., Вестник Нижегородского университета им. Н.И. Лобачевского 2012 No. 4(1) P. 238-246
The results of numerical experiments using iterative methods are presented for the solution of ill-conditioned symmetric SLAE for a number of structural mechanics problems. The second order incomplete triangular factorization has been used to construct preconditioners. MPI and TBB technologies have been exploited to parallelize computations. Numerical experiments in different parallel regimes have been ...
Added: February 17, 2017
Barash L., Guskova M. S., Shchur L., Programming and Computer Software 2017 Vol. 43 No. 3 P. 145-160
Using the RNGAVXLIB random number generator library as an example, this paper considers some approaches to employing AVX vectorization for calculation speedup. The RNGAVXLIB library contains AVX implementations of modern generators and the routines allowing one to initialize up to 10^19 independent random number streams. The AVX implementations yield exactly the same pseudorandom sequences as ...
Added: March 24, 2017
Lebedev P. A., Вестник Московского государственного технического университета им. Н.Э. Баумана. Серия Естественные науки 2013 No. 1 (48) P. 50-60
An approach is described for implementing the Method of Four Russians for reducing dense matrices over GF(2) to row echelon form on the NVIDIA CUDA platform. Estimates of the algorithm running time and recommendations on choosing the algorithm parameters are given. It is shown that the developed implementation is most effective in comparison ...
Added: April 1, 2013
Leokhin, Y., Myagkov, A., Panfilov, P., , in : 26th DAAAM International Symposium on Intelligent Manufacturing and Automation 2015. Vol. 1.: NY : Curran Associates, Inc., 2015. P. 0656 - 0662.
In this paper, we present results of a computational evaluation of the goMapReduce parallel programming model approach for solving distributed data processing problems. In some applications, particularly data center problems, including text processing, the programming models can aggregate a significant number of parallel processes. We first discuss the implementation of these approaches using both Linux and Plan9 ...
Added: November 26, 2016
Cham : Springer, 2018
This book constitutes the refereed proceedings of the 12th International Conference on Parallel Computational Technologies, PCT 2018, held in Rostov-on-Don, Russia, in April 2018.
The 24 revised full papers presented were carefully reviewed and selected from 167 submissions. The papers are organized in topical sections on high performance architectures, tools and technologies; parallel numerical algorithms; supercomputer simulation. ...
Added: March 11, 2019
Salibekyan, S., Panfilov, P., Procedia Engineering 2015 Vol. 100C P. 977-986
Historically, a typical embedded system has been designed as a control-dominated system using only a state-oriented model, such as FSMs. However, the trend in embedded systems design in recent years has been towards highly distributed architectures with support for concurrency, data and control flow, and scalable distributed computations. This implies that a different approach is ...
Added: December 28, 2014
Kondratyuk N., Nikolskiy V., Pavlov D. et al., International Journal of High Performance Computing Applications 2021 Vol. 35 No. 4 P. 312-324
Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management ...
Added: June 25, 2021
Russkov A., Roman Chulkevich, Shchur L., / Cornell University. Series arXiv "math". 2020. No. 2006.00561.
The parallel annealing method is one of the promising approaches for large scale simulations, as it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on the hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible redistributing replicas and ...
Added: June 2, 2020
Sukhoroslov O. V., Journal of Parallel and Distributed Computing 2018 Vol. 118 No. 1 P. 177-188
The paper presents an approach to the design and implementation of web-based environments for practical exercises in parallel and distributed computing (PDC). The presented approach introduces minimal development and operational costs by relying on Everest, a general-purpose platform for building computational web services. The flexibility of the proposed service-oriented architecture enables the development of different types ...
Added: August 27, 2018
Timofeev A., Khalilov M., , in : Parallel Computational Technologies (PCT'2020). : Chelyabinsk : ., 2020. P. 40-51.
Modern supercomputers use GPUs as accelerators in computing nodes. GPUs allow scientific applications to greatly boost performance using fine-grained parallelism. The CUDA programming model is oriented toward exploiting the SIMT GPU architecture by writing low-level code. Contrary to this approach, OpenACC and OpenMP 4.5 represent a declarative model of parallel programming using compiler pragmas with support ...
Added: October 23, 2020
Gostev I. M., in : Distributed Computing and Grid Technologies in Science and Education. Proceedings of the 5th International Conference, Dubna, July 16-21, 2012. : Dubna : Joint Institute for Nuclear Research, 2012. P. 274-279.
Solving image processing and graphical pattern recognition problems usually relies on a technology comprising a sequence of operations. This work investigates the processing time, which depends on the number and complexity of these operations, the size of the input image, and the speed of data transfer between individual processing stages. ...
Added: July 19, 2013
Sidorenko V., Петров А. С., Информатизация образования и науки 2018 No. 2(38) P. 51-61
Logistics process planning (creating the planned schedule) can be viewed as a directed enumeration task, solved subject to the restrictions imposed on the functioning of the logistics system. Its solution requires considerable time. To speed up the evaluation of various options of the planned schedule, it is proposed to ...
Added: June 26, 2018
Furmanov K. K., Никольский И. М., Прикладная математика и информатика 2015 Vol. 49 P. 71-79
We consider the problem of finding shift points in the mean of a long time series. The series is assumed to be long (a million elements or more) and to be analyzed on a supercomputer, which calls for a corresponding parallel algorithm. An easily parallelizable method for detecting mean shifts is proposed. Its main idea is to split the series into short segments. Computational experiments showed good ...
Added: December 18, 2015
IOS Press, 2020
The year 2019 marked four decades of cluster computing, a history that began in 1979 when the first cluster systems using Components Off The Shelf (COTS) became operational. This achievement resulted in a rapidly growing interest in affordable parallel computing for solving compute intensive and large scale problems. It also directly led to the founding ...
Added: March 27, 2020
Кучев А. Д., Plaksin M. A., Информатика в школе 2016 Vol. 122 No. 9 P. 42-48
A computer game designed as a first introduction to parallel programming is described. A hyperlink for downloading the program and several game tasks is given. ...
Added: January 30, 2017
Bukharov O., Mizikin A. A., Bogolyubov D., Промышленные АСУ и контроллеры 2013 No. 7 P. 37-45
In this article we substantiate some advantages of the evolutionary approach to solving problems of decision support system development. The most popular methods of forecasting and dependence detection are considered. Advantages of using neural networks to forecast and to determine dependences between system parameters are given. Advantages of interval ...
Added: November 29, 2013
Plaksin M. A., Информатика в школе 2017 No. 4 P. 25-39
The concept of "resource allocation" is described. The speedup resulting from parallelizing work is demonstrated. As an example, an analysis of a task from the "TRIZformashka 2015" contest is given. ...
Added: October 22, 2017
Choi Y. R., Stegailov V., , in : Supercomputing: 9th Russian Supercomputing Days, RuSCDays 2023, Moscow, Russia, September 25–26, 2023, Revised Selected Papers, Part I. : Springer, 2023. P. 100-113.
Non-adiabatic electron-ion quantum dynamics is still an area of many unresolved problems even for such simple systems as the H2+ molecular ion. Mathematical modelling based on the time-dependent Schrödinger equation (TDSE) is an important method that can provide better understanding of these phenomena. In this work, we present a solution of the 1D TDSE that describes non-adiabatic electron-ion ...
Added: January 26, 2024
Bakanov V. M., Moscow : Moscow State University of Instrument Engineering and Computer Science Publishing House, 2014
The manual sets out the needs of science and industry that lead to the use of multicomputer and multiprocessor systems, which inevitably rely on the principle of parallel computing; covers the background and state of the art; and describes the main approaches to the organization of multiprocessor computer systems, development of parallel algorithms for the numerical solution of problems and ...
Added: February 3, 2015