Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect

Agarkov A., Semenov A.

P. 92–101.

In this paper we consider an association problem with constraints for two dynamically enlarging tables. We take a full association algorithm as the baseline and propose a partial association algorithm that improves on its efficiency. We implement both algorithms in Apache Spark and evaluate them for a particular case on a cluster with the Angara interconnect.

Language: English

Keywords: performance evaluation, Apache Spark, association problem, dynamically enlarging tables, Angara interconnect

In book

Proceedings of the 3rd Ural Workshop on Parallel, Distributed, and Cloud Computing for Young Scientists. Vol. 1990. CEUR Workshop Proceedings, 2017.

Performance of Supercomputers Based on Angara Interconnect and Novel AMD CPUs/GPUs

Shamsutdinov A., Khalilov M., Ismagilov T. et al., in: 22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers. Springer, 2022. P. 401–416.

A low-latency, high-bandwidth interconnect that makes a unified system from a collection of nodes is the heart of any modern supercomputer. At the moment, Infiniband is the main commercially available type of interconnect, with no real competition worldwide. Proprietary interconnects are known to stand behind efficient supercomputer systems. Since 2016, the ...

Added: May 16, 2023

Study of Multi-Link Channel Access Without Simultaneous Transmit and Receive in IEEE 802.11be Networks

Korolev N., Levitsky I., Startsev I. et al., IEEE Access 2022 Vol. 10 P. 126339–126351

Native support for multi-link operation is a key novelty of the future Wi-Fi 7 technology defined by the IEEE 802.11be standard, which is currently under development. With the 6 GHz band recently granted for Wi-Fi operation, the novel multi-link feature enables simultaneous usage of multiple wide channels. Thus, it multiplies the capacity of Wi-Fi networks ...

Added: December 22, 2022

On the benefits of ray-based modeling for analyzing on-body MmWave systems

Ponomarenko-Timofeev A., Galinina O., Turlikov A. et al., in: 31st IEEE Annual International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2020. IEEE, 2020. Ch. 9217339 P. 1–6.

While optimizing the system-level performance in a network of advanced high-end wearables, millimeter-wave (mmWave) medium access protocols may benefit from leveraging the information on the spatial and temporal dynamics of the radio channel. In this paper, we aim to bridge the existing gap in mmWave on-body propagation studies by analyzing the channel structure based on ...

Added: October 31, 2022

Understanding join strategies in distributed systems

Tyryshkina Y., in: International Seminar on Electron Devices Design and Production, SED 2021. [s.n.], 2021.

In this paper, we consider the problem of reducing the cost of computer time by developing and implementing a method for accelerating the join of distributed data sets on a given criterion. The following tasks were solved: a study was conducted on the architecture of distributed data stores and parallel computing algorithms; on ...

Added: June 2, 2022

Knowledge Triangle Targeted Science, Technology and Innovation Policy

Meissner D., Gokhberg L., Kuzminov Y. et al., in: The Knowledge Triangle. Changing Higher Education and Research Management Paradigms. Switzerland: Springer, 2021. Ch. 1 P. 3–15.

During the last decade, the concept of the Knowledge Triangle (KT) in the form of change processes that foster greater interaction between education, research and innovation activities has left the academic community and diffused to the higher education and research policy arena. As a result, numerous policy measures have been developed and implemented aiming at ...

Added: January 17, 2022

Enabling the Internet of Things With Wi-Fi Halow—Performance Evaluation of the Restricted Access Window

Khorov E., Krotov A., Lyakhov A. et al., IEEE Access 2019 Vol. 7 P. 127402–127415

IEEE 802.11ah, a new amendment to the Wi-Fi standard, adapts Wi-Fi networks to the emerging Internet of Things (IoT). A key component of .11ah is the Restricted Access Window (RAW), a new channel access mechanism, which reduces contention when even thousands of IoT devices operate in the same area by assigning them different channel times. ...

Added: November 27, 2021

The Knowledge Triangle. Changing Higher Education and Research Management Paradigms

Aanstad S., Benner M., Borlaug S. B. et al., Switzerland: Springer, 2021.

Added: October 4, 2021

Legal Defects of the Principle of Efficient Use of Budgetary Funds

Vorontsov O. G., Journal of Russian Law (Zhurnal rossiiskogo prava) 2017 No. 2 (242) P. 110–119

The article deals with problems in implementing the principle of efficient use of budgetary funds. The paper concludes that budget law theory has no common understanding of the category of "efficiency". The author analyzes the legal formulation of the principle of efficient use of budgetary funds from the standpoint of external ...

Added: November 17, 2020

Methodology for Evaluating Participatory Budgeting Programs and Practices in the Constituent Entities of the Russian Federation

Vagin V. V., Gavrilova V., Financial Analytics: Problems and Solutions (Finansovaya analitika: problemy i resheniya) 2017 Vol. 10 No. 12 P. 1393–1406

The aim of the article is to develop tools for evaluating participatory budgeting programs and practices. At the current stage in the development of mechanisms for citizen participation in improving the public infrastructure of settlements and resolving issues of local significance, this question is becoming increasingly relevant. The study applies analysis and classification of various parameters of the implementation of participatory budgeting practices and of the social and economic changes they bring about. A logical model is proposed ...

Added: November 2, 2020

Biased performance evaluation in a model of career concerns: incentives versus ex-post optimality

Stepanov S., Journal of Economic Behavior and Organization 2020 No. 179 P. 589–607

I study a career concerns model in which the principal receives information about the agent’s performance from an intermediary (evaluator). I show that, in general, a biased evaluator is ex-ante optimal for the principal. The ex-ante optimal bias solves the tradeoff between ex-post optimality of the principal’s decisions about the agent and incentive provision. It ...

Added: October 28, 2020

An Algorithm to Satisfy the QoS Requirements in a Heterogeneous LoRaWAN Network

Bankov D., Khorov E., Lyakhov A., in: 2020 IEEE Symposium on Computers and Communications (ISCC). IEEE, 2020. P. 1–6.

LoRaWAN is a popular low power wide area network technology widely used in many scenarios, such as environmental monitoring and smart cities. Different applications demand various quality of service (QoS), and their service within a single network requires special solutions for QoS provision. We consider the problem of QoS provision in heterogeneous LoRaWAN networks that ...

Added: October 17, 2020

Association Algorithm for Two Dynamically Enlarging Tables Implemented in Apache Spark

Agarkov A., Semenov A., in: Proceedings of the 4th GraphHPC conference on large-scale graph processing using HPC systems. Vol. 1981. CEUR Workshop Proceedings, 2017. P. 10–15.

In this paper we consider an association problem with constraints for two dynamically enlarging tables. We consider an ordered set of rule groups that determine associations between entries from the first table and the second table. Each entry is associated with other entries from both tables, either directly or indirectly through other associations. In the problem ...

Added: October 30, 2019

Early Performance Evaluation of Supervised Graph Anomaly Detection Problem Implemented in Apache Spark

Mazeev A., Semenov A., Dmitry D. et al., in: Proceedings of the 3rd Ural Workshop on Parallel, Distributed, and Cloud Computing for Young Scientists. Vol. 1990. CEUR Workshop Proceedings, 2017. P. 84–91.

Apache Spark is one of the most popular Big Data frameworks. Performance evaluation of Big Data frameworks is a topic of interest due to the increasing number and importance of data analytics applications within the context of HPC and Big Data convergence. In this paper we present an early performance evaluation of a typical supervised graph ...

Added: October 30, 2019

Unsupervised Graph Anomaly Detection Algorithms Implemented in Apache Spark

Semenov A., Mazeev A., Dmitry D. et al., Lobachevskii Journal of Mathematics 2018 Vol. 39 No. 9 P. 1262–1269

The graph anomaly detection problem occurs in many application areas and can be solved by spotting outliers in unstructured collections of multi-dimensional data points, which can be obtained by graph analysis algorithms. We implement the algorithm for the small community analysis and the approximate LOF algorithm based on Locality-Sensitive Hashing, apply the algorithms to a ...

Added: June 10, 2019

Mathematical model of LoRaWAN channel access with capture effect

Bankov D., Khorov E., Lyakhov A., in: 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC). IEEE, 2017. P. 1–5.

LoRaWAN is a promising low-power, long-range wireless communications technology for the Internet of Things. An important feature of LoRaWAN gateways is the so-called capture effect: under some conditions the gateway may correctly receive a frame even if it overlaps with other frames. In this paper, we develop a pioneering mathematical model of ...

Added: October 9, 2018

Toward trusted, social-aware D2D connectivity: bridging across the technology and sociality realms

Ometov A., Orsino A., Militano L. et al., IEEE Wireless Communications 2016 Vol. 23 No. 4 P. 103–111

Driven by the unprecedented increase of mobile data traffic, D2D communications technology is rapidly moving into the mainstream of the 5G networking landscape. While D2D connectivity originally emerged as a technology enabler for public safety services, it is likely to remain at the heart of the 5G ecosystem by spawning a wide diversity of proximate ...

Added: March 13, 2018

A novel security-centric framework for D2D connectivity based on spatial and social proximity

Ometov A., Orsino A., Militano L. et al., Computer Networks 2016 No. 107 P. 327–338

Device-to-device (D2D) communication is one of the most promising innovations in the next-generation wireless ecosystem, which improves the degrees of spatial reuse and creates novel social opportunities for users in proximity. As standardization behind network-assisted D2D technology takes shape, it becomes clear that security of direct connectivity is one of the key concerns on the ...

Added: March 13, 2018

Creating Apache Spark Virtual Clusters in Cloud Environments Using Orchestration Systems

Borisenko O. D., Pastukhov R. K., Kuznetsov S. D., Proceedings of the Institute for System Programming of the RAS 2016 Vol. 28 No. 6 P. 111–120

Apache Spark is a framework that provides fast computations on Big Data using the MapReduce model. Cloud environments make Big Data processing more flexible, since they allow virtual clusters to be created on demand. One of the most powerful open-source cloud environments is Openstack. The main goal of this project is to provide an ability to create virtual ...

Added: January 25, 2018

Implementing a Service for Running Apache Spark Jobs and Creating Apache Spark Clusters Based on Openstack Sahara

Kuznetsov S., Borisenko O. D., Aleksiyants A. V. et al., Proceedings of the Institute for System Programming of the RAS 2015 Vol. 27 No. 5 P. 35–48

This paper discusses the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark. It describes existing methods for creating Apache Spark clusters, as well as the implemented solution for building Apache Spark clusters and executing Apache Spark jobs in an Openstack environment. The ...

Added: January 23, 2018

Automatic Creation of Apache Spark Virtual Clusters in the Openstack Cloud Environment

Kuznetsov S. D., Turdakov D. Y., Borisenko O. D., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 No. 4 P. 33–44

This article is dedicated to automating cluster creation and management for the Apache Spark MapReduce implementation in Openstack environments. As a result of this project, an open-source (Apache 2.0 license) implementation of a toolchain for on-demand virtual cluster creation in Openstack environments was presented. The article contains an overview of existing solutions for clustering automation in cloud ...

Added: November 26, 2017