Integrating GPGPU computations with CPU coroutines in C++
Journal of Physics: Conference Series. 2016. Vol. 681. No. 1. P. 012048-1-012048-6.
We present results on the integration of two major GPGPU APIs with a reactor-based event processing model in C++ that utilizes coroutines. Given the current lack of a universally usable GPGPU programming interface that gives optimal performance, and the ongoing debates about the style of implementing asynchronous computing in C++, we present a working implementation that allows a uniform and seamless approach to writing C++ code with continuations that can run on CPUs or on CUDA/OpenCL accelerators. Performance results show that, if corner cases are avoided, this approach has a negligible latency cost.
Research target: Computer Science
Priority areas: IT and mathematics
Expert Systems with Applications 2015 Vol. 42 No. 15-16 P. 6177-6183
Given ever increasing information volume and complexity of engineering, social and economic systems, it has become more difficult to assess incoming data and manage such systems properly. Currently developed innovative decision support systems (DSS) aim to achieve optimum results while minimizing the risks of serious losses. The purpose of the DSS is to help the ...
Added: May 17, 2015
Comparison of old and new cryptographic hash function standards of the Russian Federation on CPUs and NVIDIA GPUs
Математические вопросы криптографии 2013 Vol. 4 No. 2 P. 73-80
We present optimization guidelines and implementations of cryptographic hash functions GOST R 34.11-94 and GOST R 34.11-2012. Results for x86_64 CPUs and NVIDIA CUDA-capable GPUs are provided for our and several other well-known implementations. It is shown that the new standard may be twice as fast as the old one on modern CPUs, but it ...
Added: April 1, 2013
Вестник Московского государственного технического университета им. Н.Э. Баумана. Серия Естественные науки 2013 No. 1 (48) P. 50-60
We describe an approach to implementing the Method of Four Russians for reducing dense matrices over GF(2) to row echelon form on the NVIDIA CUDA platform. Estimates of the algorithm's running time and recommendations on choosing its parameters are given. It is shown that the developed implementation is most effective in comparison ...
Added: April 1, 2013
PRAND library: generation of parallel streams of random numbers for Monte Carlo calculations using GPUs
Cuda Альманах 2014 No. 3 P. 17-17
The RNGSSELIB and PRAND libraries for the parallel generation of pseudo-random numbers in Monte Carlo simulations were developed. The RNGSSELIB library contains an implementation based on the SSE extensions of modern CPUs, and the PRAND library contains generators using CUDA version 5.0 and later. ...
Added: March 10, 2016
A parallelized self-learning decision support system based on genetic algorithms and neural networks
Системный администратор 2014 No. 9 P. 88-92
This paper describes aspects of the development of a decision support system based on neural networks and a genetic algorithm. We justify the use of general-purpose computing on graphics processing units (GPGPU) for our decision support system. An example of the successful application of CUDA to increase the computing performance of the system in question is presented. ...
Added: September 12, 2014
Промышленные АСУ и контроллеры 2013 No. 7 P. 37-45
In this article we substantiate some advantages of the evolutionary approach to developing decision support systems. The most popular methods of forecasting and dependence detection are considered. Advantages of using neural networks to forecast and to determine dependences between system parameters are given. Advantages of interval ...
Added: November 29, 2013
Математические вопросы криптографии 2015 Vol. 6 No. 2 P. 99-108
In this article we consider aspects of implementing an XSL block cipher over a finite field with an MDS-matrix linear transformation on NVIDIA GPUs. We compare the results obtained with those for some other block ciphers. ...
Added: May 4, 2019
GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP
et al., International Journal of High Performance Computing Applications 2021 Vol. 35 No. 4 P. 312-324
Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As usual, the efficiency of such calculations is based on an interplay of software and hardware that are nowadays moving to hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management ...
Added: June 25, 2021
Промышленные АСУ и контроллеры 2012 No. 10 P. 30-35
In this article we introduce a CUDA-based implementation of the Kohonen self-organizing map. We describe the software implementation and test results confirming that the performance gain over the serial version of the algorithm grows with the size of the neural network. ...
Added: February 13, 2013
Algorithm for the replica redistribution in the implementation of parallel annealing method on the hybrid supercomputer architecture
Algorithm for the replica redistribution in the implementation of parallel annealing method on the hybrid supercomputer architecture / Cornell University. Series arXiv "math". 2020. No. 2006.00561.
The parallel annealing method is one of the promising approaches for large scale simulations as potentially scalable on any parallel architecture. We present an implementation of the algorithm on the hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible redistributing replicas and ...
Added: June 2, 2020
Parallel algorithms for reducing derivation time of distinguishing experiments for nondeterministic finite state machines
et al., International Journal of Parallel, Emergent and Distributed Systems 2018 Vol. 33 No. 2 P. 197-210
Many approaches have been proposed for deriving tests from finite state machine (FSM) specifications with respect to some established coverage criteria. A fundamental core problem in FSM-based testing relates to the derivation of input sequences that can distinguish states of an FSM specification, aka distinguishing sequences. A major effort in the construction of these sequences ...
Added: October 31, 2018
Algorithm for replica redistribution in an implementation of the population annealing method on a hybrid supercomputer architecture
Computer Physics Communications 2021 Vol. 261 P. 107786
The population annealing method is a promising approach for large-scale simulations because it is potentially scalable on any parallel architecture. We present an implementation of the algorithm on a hybrid program architecture combining CUDA and MPI. The problem is to keep all general-purpose graphics processing unit devices as busy as possible by efficiently redistributing replicas. ...
Added: December 28, 2020
Performance of modern computing platforms in molecular dynamics simulations of protein-membrane systems
et al., Труды НИИСИ РАН 2018 Vol. 7 No. 4 P. 157-161
The performance of the molecular dynamics software package Gromacs was measured on various hardware: desktop computers, clusters based on x86_64 processors or many-integrated-core processors, and heterogeneous systems with gaming graphics cards or general-purpose GPU systems. The optimal choice of hardware for molecular dynamics simulations is discussed. ...
Added: February 10, 2020
RUDN Journal of Mathematics, Information Sciences and Physics 2014 No. 4 P. 68-84
Low-cost gaze tracking systems are in great demand due to their wide range of application. Commonly, extra devices are needed (for instance, head mounted cameras); however, in this investigation gaze tracking is performed in real-time based on the video stream from an infrared video camera. A comparative analysis of the existing analogues was executed and ...
Added: December 7, 2014
Moscow: National Instruments Russia, 2017
The collection consists of papers presenting previously unpublished results of original research and technical solutions. We hope that it will prove useful to specialists working in various fields of science and technology, to a wide range of university teachers, graduate students, and undergraduates, and to teachers at secondary schools and technical colleges. ...
Added: May 10, 2017
Procedia Engineering 2015 Vol. 100 P. 1459-1468
Work solutions are proposed for problems of leader definition and role distribution in homogeneous groups of robots. It is shown that transition from a swarm to a collective of robots with hierarchical organization is possible using exclusively local interaction. The local revoting algorithm is central to the procedure for choice of leader while redistribution of roles can ...
Added: March 14, 2015
Algorithms and methods for solving scheduling problems and other extremum problems on large-scale graphs
et al., Journal of Mathematical Sciences 2005 Vol. 128 No. 6 P. 3487-3495
Added: January 27, 2014
et al., IEEE Transactions on Networking 2018 Vol. 26 No. 1 P. 342-355
Modern network elements are increasingly required to deal with heterogeneous traffic. Recent works consider processing policies for buffers that hold packets with different processing requirements (number of processing cycles needed before a packet can be transmitted out) but uniform value, aiming to maximize the throughput, i.e., the number of transmitted packets. Other developments deal with ...
Added: March 14, 2018
Inorganic Materials: Applied Research 2016 Vol. 7 No. 1 P. 34-39
A database (DB) on the bandgap of inorganic substances available via the Internet (http://bg.imetdb.ru) was developed for the information service of specialists in the sphere of inorganic chemistry and materials science. The DB is integrated with other information systems on the properties of inorganic substances and materials, which provides the search of a wide range ...
Added: February 23, 2016
et al., International Journal of Applied Mechanics 2016 Vol. 8 No. 2 P. 1650016-01-1650016-18
We present a method for evaluating elastic properties of a composite material produced by molding a resin filled with short elastic fibers. A flow of the filled resin is simulated numerically using a mesh-free method. After that, assuming that spatial distribution and orientation of fibers are not significantly changed during polymerization, effective elastic moduli of ...
Added: May 22, 2016
The complexity of the 3-colorability problem in the absence of a pair of small forbidden induced subgraphs
Discrete Mathematics 2015 Vol. 338 No. 11 P. 1860-1865
We completely determine the complexity status of the 3-colorability problem for hereditary graph classes defined by two forbidden induced subgraphs with at most five vertices. ...
Added: April 7, 2014
et al., ACM Transactions on Computation Theory 2018 Vol. 10 No. 2 P. 1-32
The H-free Edge Deletion problem asks, for a given graph G and integer k, whether it is possible to delete at most k edges from G to make it H-free—that is, not containing H as an induced subgraph. The H-free Edge Completion problem is defined similarly, but we add edges instead of deleting them. The study of these two problem families has recently been the subject of intensive studies from the point of ...
Added: October 30, 2018
Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 to June 1, 2019). Issue 18 (25)
Moscow: Russian State University for the Humanities Publishing Center, 2019
The collection includes 27 papers from the international conference on computational linguistics and intellectual technologies "Dialogue 2019" that were not included in the yearbook "Computational Linguistics and Intellectual Technologies" but were recommended by the Program Committee for presentation at the conference. Intended for specialists in theoretical and applied linguistics and intellectual technologies. ...
Added: December 10, 2019
Sixth All-Russian Scientific and Practical Conference on Simulation Modeling and Its Application in Science and Industry "Simulation Modeling. Theory and Practice". Conference Proceedings. Collected Papers
Kazan: Fen Publishing House of the Academy of Sciences of the Republic of Tatarstan, 2013
Proceedings and papers of the Sixth All-Russian Scientific and Practical Conference on Simulation Modeling and Its Application in Science and Industry. ...
Added: December 14, 2013