Implementing Big Data Processing Workflows Using Open Source Technologies
In our implementation research, we apply a workflow approach to the modeling and development of a Big Data processing pipeline using open source technologies. The data processing workflow is a set of interrelated steps, each of which launches a particular job such as a Spark job, a shell job, or a PostgreSQL command. All workflow steps are chained to form an integrated process that imitates the data load from the staging storage area to the datamart storage area. We performed an experimental workflow-based implementation of a data processing pipeline that moves data through the different storage areas and uses an actual industrial KPI dataset of about 30 million records. Evaluation of the implementation results provides evidence that the proposed workflow is applicable to other application domains and datasets, provided they satisfy the data format expected at the input stage of the workflow.
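The chained-step structure described above can be sketched in pure Python as a minimal illustration; the step names (stage_load, spark_transform, load_datamart) and the data shapes are assumptions for illustration only, not the paper's actual jobs, which in practice would be submitted to a workflow engine and to Spark and PostgreSQL:

```python
# Minimal sketch of a linear data-processing workflow: each step is a
# callable, and the steps are chained so one step's output feeds the next,
# imitating the load from the staging area to the datamart area.

def stage_load(records):
    """Imitate loading raw records into the staging area (drop bad rows)."""
    return [r for r in records if r is not None]

def spark_transform(records):
    """Imitate a Spark job: aggregate a KPI value per key."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals

def load_datamart(aggregates):
    """Imitate a SQL step: produce ordered rows for the datamart table."""
    return sorted(aggregates.items())

def run_workflow(data, steps):
    """Chain the steps into one integrated process."""
    for step in steps:
        data = step(data)
    return data

raw = [("kpi_a", 10), None, ("kpi_b", 5), ("kpi_a", 7)]
result = run_workflow(raw, [stage_load, spark_transform, load_datamart])
print(result)  # [('kpi_a', 17), ('kpi_b', 5)]
```

Each step only has to accept the format produced by the previous one, which mirrors the abstract's requirement that other datasets satisfy the data format at the workflow's input stage.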