Understanding join strategies in distributed systems

Tyryshkina Y.

This paper addresses the problem of reducing compute time by developing and implementing a method that accelerates the joining of distributed datasets by a given criterion. The following tasks were completed: the architecture of distributed data stores and parallel computing algorithms was studied; based on this study, the limiting stages that slow down processing were identified; a method that eliminates these limiting stages was developed; based on this method, an algorithm and a utility were created that extend the functionality of the selected software product; and experimental studies were carried out.

Language: English
Keywords: MapReduce, Apache Spark

In book

International Seminar on Electron Devices Design and Production, SED 2021
Сигов А. С. [s.n.], 2021.
Similar publications
Triclustering in Big Data Setting
Egurnov D., Точилкин Д. С., Ignatov D. I., in: Complex Data Analytics with Formal Concept Analysis. Springer, 2022. P. 239–258.
In this paper, we describe versions of triclustering algorithms adapted for efficient calculations in distributed environments with the MapReduce model or the parallelisation mechanisms provided by modern programming languages. The OAC family of triclustering algorithms shows good parallelisation capabilities due to the independent processing of triples of a triadic formal context. We provide time and space complexity of the ...
Added: November 1, 2022
Accelerating join of distributed datasets by a given criterion
Tyryshkina Y., in: Proceedings of 2022 IEEE Moscow Workshop on Electronic and Networking Technologies (MWENT). M.: IEEE, 2022.
Added: May 31, 2022
Method for accelerating the operation of joining distributed datasets by a given criterion
Tyryshkina Y., in: International Scientific and Practical Conference "Information Innovative Technologies", 2022. [s.n.], 2022.
This paper addresses the problem of reducing compute time by developing and implementing a method that accelerates the joining of distributed datasets by a given criterion. The following tasks were completed: the architecture of distributed data stores and parallel computing algorithms was studied; on ...
Added: May 31, 2022
Performance Evaluation of Large Table Association Problem Implemented in Apache Spark on Cluster with Angara Interconnect
Agarkov A., Semenov A., in: Proceedings of the 3rd Ural Workshop on Parallel, Distributed, and Cloud Computing for Young Scientists. Vol. 1990. CEUR Workshop Proceedings, 2017. P. 92–101.
In this paper we consider an association problem with constraints for two dynamically enlarging tables. We consider a base full association algorithm and propose a partial association algorithm that improves the efficiency of the base algorithm. We implement and evaluate the algorithms in Apache Spark for a particular case on a cluster with the Angara interconnect. ...
Added: October 30, 2019
Simplified Mapreduce Mechanism for Large Scale Data Processing
Ahmed Munna M. T., International Journal of Engineering and Technology 2018 Vol. 7 No. 8 P. 16–21
MapReduce has become a popular programming model for processing and running large-scale data sets with a parallel, distributed paradigm on a cluster. Hadoop MapReduce is needed especially for large scale data like big data processing. In this paper, we work to modify the Hadoop MapReduce Algorithm and implement it to reduce processing time. ...
Added: October 29, 2019
Distributed horizontally scalable solutions for data management
С.Д. Кузнецов, Посконин А. В., Proceedings of the Institute for System Programming of the RAS 2013 Vol. 24 P. 327–258
Many modern applications (such as large-scale Web-sites, social networks, research projects, business analytics, etc.) have to deal with very large data volumes (also referred to as “big data”) and high read/write loads. These applications require underlying data management systems to scale well in order to accommodate data growth and increasing workloads. High throughput, low latencies ...
Added: January 30, 2018
Creating virtual Apache Spark clusters in cloud environments using orchestration systems
Борисенко О. Д., Пастухов Р. К., С.Д. Кузнецов, Proceedings of the Institute for System Programming of the RAS 2016 Vol. 28 No. 6 P. 111–120
Apache Spark is a framework providing fast computations on Big Data using the MapReduce model. Cloud environments make Big Data processing more flexible because they allow virtual clusters to be created on demand. One of the most powerful open-source cloud environments is Openstack. The main goal of this project is to provide an ability to create virtual ...
Added: January 25, 2018
Implementing a service for running Apache Spark jobs and creating Apache Spark clusters based on Openstack Sahara
S. Kuznetsov, Борисенко О. Д., Алексиянц А. В. et al., Proceedings of the Institute for System Programming of the RAS 2015 Vol. 27 No. 5 P. 35–48
This paper discusses the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark. It describes existing methods for creating Apache Spark clusters, as well as the implemented solution for building Apache Spark clusters and executing Apache Spark jobs in an Openstack environment. The ...
Added: January 23, 2018
Automatic creation of virtual Apache Spark clusters in the Openstack cloud environment
Kuznetsov S. D., Turdakov D. Y., Борисенко О. Д., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 No. 4 P. 33–44
This article is dedicated to automating cluster creation and management for the Apache Spark MapReduce implementation in Openstack environments. As a result of this project, an open-source (Apache 2.0 license) implementation of a toolchain for on-demand virtual cluster creation in Openstack environments was presented. The article contains an overview of existing solutions for clustering automation in cloud ...
Added: November 26, 2017
Big data: modern approaches to storage and processing
Клеменков П. А., Kuznetsov S. D., Proceedings of the Institute for System Programming of the RAS 2012 Vol. 23 P. 143–158
Big data has challenged traditional storage and analysis systems in several new ways. In this paper we try to figure out how to overcome these challenges and why it is not possible to do so efficiently, and we describe three modern approaches to big data handling: NoSQL, MapReduce and real-time stream processing. The first section of the paper is ...
Added: October 31, 2017
Gomapreduce parallel computing model implementation on a cluster of plan9 virtual machines
Leokhin, Y., Myagkov, A., Panfilov, P., in: 26th DAAAM International Symposium on Intelligent Manufacturing and Automation 2015. Vol. 1. NY: Curran Associates, Inc., 2015. P. 0656–0662.
In this paper, we present results of a computational evaluation of the goMapReduce parallel programming model approach for solving distributed data processing problems. In some applications, particularly data center problems, including text processing, the programming models can aggregate a significant number of parallel processes. We first discuss the implementation of these approaches using both Linux and Plan9 ...
Added: November 26, 2016
Implementing Apache Spark jobs execution and Apache Spark cluster creation for Openstack Sahara
Turdakov D., Aleksiyants A., Borisenko O. et al., Proceedings of the Institute for System Programming of the RAS 2015 Vol. 27 No. 5 P. 35–48
This paper discusses the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark. Both clouds and MapReduce models are popular nowadays for several reasons: low cost and efficient big data analysis, respectively. For these reasons, having an open source solution for building clusters is ...
Added: September 13, 2016
Applying MapReduce to Conformance Checking
Shugurov I., Mitsyuk A. A., Proceedings of the Institute for System Programming of the RAS 2016 Vol. 28 No. 3 P. 103–122
Process mining is a relatively new research field offering methods for analysing and improving business processes, based on studying their execution history (event logs). Conformance checking is one of the main sub-fields of process mining. Conformance checking algorithms aim to assess how well a given process model, typically represented by a Petri ...
Added: September 12, 2016
Putting OAC-triclustering on MapReduce
Зудин С., Gnatyshak D. V., Ignatov D. I., in: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, Clermont-Ferrand, France, October 13-16, 2015. Vol. 1466. Clermont-Ferrand: CEUR Workshop Proceedings, 2015. P. 47–58.
In our previous work an efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) was proposed. This algorithm is a modified version of the basic algorithm for the OAC-triclustering approach; it has linear time and memory complexities. In this paper we parallelise it via the map-reduce framework in order to make it suitable for big datasets. The results of ...
Added: October 23, 2015