Creating Apache Spark Virtual Clusters in Cloud Environments Using Orchestration Systems

Borisenko O. D.; Pastukhov R. K.; Kuznetsov S. D.

doi:10.15514/ISPRAS-2016-28(6)-8

Proceedings of the Institute for System Programming of the RAS. 2016. Vol. 28. No. 6. P. 111-120.

Apache Spark is a framework for fast computation on Big Data using the MapReduce model. Cloud environments make Big Data processing more flexible because they allow virtual clusters to be created on demand. One of the most powerful open-source cloud environments is Openstack. The main goal of this project is to make it possible to create virtual clusters with Apache Spark and other Big Data tools in Openstack. There are three approaches to this. The first is to use the Openstack REST APIs to create instances and then deploy the environment. The Apache Spark core team uses this approach to create clusters in the proprietary Amazon EC2 cloud, and almost the same method has been implemented for Openstack environments. However, because the Openstack API changes frequently, this solution has been deprecated since the Kilo release. The second approach is to integrate virtual cluster creation into Openstack as a built-in service. ISP RAS has provided several patches implementing a universal Spark Job engine for Openstack Sahara and Openstack Swift integration with Apache Spark as a drop-in replacement for Apache Hadoop. This approach makes Spark clusters available as a service in the PaaS service model. Because Openstack releases are less frequent than Apache Spark releases, this approach may be inconvenient for developers who use the latest releases. The third solution uses Ansible for orchestration. It is implemented in a loosely coupled way, which makes it possible to add any auxiliary tool or even to use another cloud environment. It also lets the user choose which Apache Spark and Apache Hadoop versions to deploy in virtual clusters. All the listed approaches are available under the Apache 2.0 license.

Research target: Computer Science

Priority areas: IT and mathematics

Language: Russian

Keywords: cloud computing, big data, Apache Spark, Openstack, Amazon EC2, HDFS, virtual clusters, Map-Reduce, Apache Ignite

Automated Creation of Apache Spark Virtual Clusters in the Openstack Cloud Environment

Kuznetsov S. D., Turdakov D. Y., Borisenko O. D., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 No. 4 P. 33-44

This article is dedicated to the automation of cluster creation and management for the Apache Spark MapReduce implementation in Openstack environments. As a result of this project, an open-source (Apache 2.0 license) implementation of a toolchain for on-demand virtual cluster creation in Openstack environments was presented. The article contains an overview of existing solutions for clustering automation in cloud ...

Added: November 26, 2017

Development of a Scalable Software Infrastructure for Data Storage and Processing in Computational Biology Problems

Kuznetsov S. D., Turdakov D. Y., Borisenko O. D. et al., Proceedings of the Institute for System Programming of the RAS 2014 Vol. 26 No. 4 P. 45-54

This article is an overview of a scalable infrastructure for storing and processing genome data in genetics problems. The overview describes the technologies used and the organization of unified access to the genome processing APIs of the different underlying services. The article also covers methods for supporting scalable and cloud computing technologies. The first service in virtual genome ...

Added: November 26, 2017

A Methodology for Quantitative Information Security Risk Assessment for an Organization's Cloud Infrastructure

Tsaregorodtsev A. V., Makarenko E. V., National Interests: Priorities and Security 2014 No. 44 P. 30-41

Almost all of the technologies that are now part of the cloud paradigm existed before, but until recently the market had no offerings that brought these technologies together in a single commercially attractive solution. However, in the last decade public cloud services have appeared, through which these technologies have, on the one hand, become available to ...

Added: March 26, 2015

Building a Hybrid Secure Cloud Environment for an Organization's IT Infrastructure

Tsaregorodtsev A. V., Los A., Sorokin A., Industrial Automated Control Systems and Controllers 2015 No. 11 P. 26-31

Cloud computing is becoming one of the most common IT technologies for deploying applications, thanks to its key features: flexible solutions, availability on request, and a good price/performance ratio. Migrating to a cloud-based architecture allows organizations to reduce the total cost of implementing and supporting infrastructure, and reduce development time for new business ...

Added: October 20, 2015

Building Goal Trees to Identify Security Requirements for Cloud Computing Environments

Tsaregorodtsev A. V., National Security / nota bene 2013 No. 5 P. 51-68

The need to fundamentally improve the efficiency of information security management in cloud environments leads to the area of multidimensional properties of "systematicity". Applying the technology and methods of formal structural synthesis of information security management systems (ISMS) in the cloud, connecting requirement hierarchies of different structure, would more ...

Added: March 17, 2014

A Methodology for Building Secure Information and Telecommunication Systems Based on a Hybrid Cloud Environment

Tsaregorodtsev A. V., Mukhin I. N., Bely A. F., Information and Security 2015 Vol. 18 No. 3 P. 404-407

The widespread use of cloud computing calls for adaptation and refinement of existing approaches to the construction of information and telecommunication systems. Migrating data to a cloud-based architecture reduces the total cost of implementing and maintaining infrastructure and shortens development time for new business applications. At the same time, the question of information security remains open. ...

Added: March 15, 2016

An Approach to Building an Organization's Information Infrastructure Based on a Hybrid Cloud Environment

Tsaregorodtsev A. V., Mukhin I. N., Boridko S. I., Information and Security 2015 Vol. 18 No. 3 P. 400-403

Because cloud computing brings new challenges in the field of information security, it is imperative for an organization to control the process of information security management in the cloud. The level of confidence in the services provided can vary significantly depending on the goals of the organization, the structure of ...

Added: March 15, 2016

An Integrated Approach to Building Secure Information and Telecommunication Systems Based on a Hybrid Cloud Environment

Tsaregorodtsev A. V., Los A., Sorokin A., National Security / nota bene 2015

The article addresses information security issues in cloud computing. Information and telecommunication systems based on cloud computing technology have recently become increasingly widespread because of the ever-growing need to process and store large volumes of data, which confirms the relevance of the issues considered in the article. The key point in using cloud computing ...

Added: October 20, 2015

Basic Principles of Building an Information Security Goal Tree for a Cloud Computing Environment

Tsaregorodtsev A. V., Ermoshkin G. N., National Security / nota bene 2013 No. 5 P. 69-79

The changing security perimeter and the move of organizations' critical assets out from under internal control, followed by the migration of these assets to the cloud environment, have brought to the fore the problem of managing the information security of corporate systems based on cloud computing technology. All this ...

Added: March 26, 2015

Data Security Risk Assessment in Cloud-Based Information and Telecommunication Systems

Tsaregorodtsev A. V., Lavrinenko M. M., Lapenkova N. V., Information Technology Security 2014 No. 1 P. 36-40

Cloud computing will be one of the most common IT technologies to deploy applications, due to its key features: on-demand network access to a shared pool of configurable computing resources, flexibility and good quality/price ratio. Migrating to cloud architecture enables organizations to reduce the overall cost of implementing and maintaining the infrastructure and reduce development ...

Added: March 26, 2015

An Approach to Building a Hybrid Secure Cloud Environment

Tsaregorodtsev A. V., Kachko A. K., Lavrinenko M. M., Information Technology Security 2014 No. 1 P. 22-27

In response to the ever-growing need to store and process data, information and telecommunication systems based on cloud computing have come to occupy a leading position. Here, the key issue in using cloud computing is information security. This article is primarily intended to cover the ...

Added: March 26, 2015

Implementing a Service for Running Apache Spark Jobs and Creating Apache Spark Clusters Based on Openstack Sahara

S. Kuznetsov, Borisenko O. D., Aleksiyants A. V. et al., Proceedings of the Institute for System Programming of the RAS 2015 Vol. 27 No. 5 P. 35-48

This paper discusses the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark. It describes existing methods for creating Apache Spark clusters, as well as the implemented solution for building Apache Spark clusters and executing Apache Spark jobs in the Openstack environment. The ...

Added: January 23, 2018

A Formalized Security Model for Workflows of Information and Telecommunication Systems Based on Cloud Computing Technology

Tsaregorodtsev A. V., Nonlinear World 2013 Vol. 11 No. 9 P. 610-621

Use of cloud computing applications and services requires review and adaptation of existing formal models for informational telecommunication systems security. It is necessary to consider the benefits of cloud deployment models and provide the procedure for allocating process among components of cloud computing environment for achieving confidentiality and data protection. ...

Added: March 26, 2015

Numerical optimization for Artificial Retina Algorithm

Borisyak M., Ustyuzhanin A., Derkach D. et al., Journal of Physics: Conference Series 2017 Vol. 898 No. 3 P. 1-6

High-energy physics experiments rely on reconstruction of the trajectories of particles produced at the interaction point. This is a challenging task, especially in the high track multiplicity environment generated by p-p collisions at the LHC energies. A typical event includes hundreds of signal examples (interesting decays) and a significant amount of noise (uninteresting examples). This ...

Added: February 25, 2018

Ensuring Information Security in Cloud Computing

Isaev E., Dumsky D., Samodurov V. et al., Mathematical Biology and Bioinformatics 2015 Vol. 10 No. 2 P. 567-579

The rapid development of information technology in today's society dictates new requirements for data security technologies, for methods of remote access and data processing, and for an overall reduction in the cost of working with information. In recent years, the concept of cloud computing has been widely suggested as the ideal solution to all these problems. This ...

Added: December 19, 2015

Service-Oriented Computing

Berlin, Heidelberg : Springer, 2013

The proceedings of the 11th International Conference on Service-Oriented Computing (ICSOC 2013), held in Berlin, Germany, December 2–5, 2013, contain high-quality research papers that represent the latest results, ideas, and positions in the field of service-oriented computing. Since the first meeting more than ten years ago, ICSOC has grown to become the premier international forum ...

Added: March 21, 2014

An Information Security Risk Assessment Model for Cloud-Based Information Systems

Tsaregorodtsev A. V., Ermoshkin G. N., National Security / nota bene 2013 No. 6 P. 46-54

Widespread acceptance and adoption of cloud computing calls for adaptation and development of existing risk assessment models of information systems. The approach suggested in this article can be used to assess the risks of information systems based on cloud computing technology and to assess the effectiveness of security measures. ...

Added: March 17, 2014

Proceedings 2018 Global Smart Industry Conference (GloSIC)

Chelyabinsk : IEEE, 2018

The 2018 Global Smart Industry Conference is organized in order to exchange experience, promote discussion and presentation of research papers, and summarize results in development of innovative models, methods and technologies for the digital industry in universities, scientific and industrial associations of the Russian Federation as well as in foreign companies, and the experience of ...

Added: November 25, 2019

A Method for Modeling Distribution Routes for Critical Data Processing in a Hybrid Cloud Computing Environment Based on Modified Petri Nets

Tsaregorodtsev A. V., Derbin E. A., Mukhin I. N., Information and Security 2015 Vol. 18 No. 3 P. 408-411

Using cloud computing to build an organization's IT infrastructure means that the organization gives up direct control over security aspects. The problem of data privacy therefore needs to be solved when designing an architecture based on cloud computing technology. In the article the simulation method of data processing using Petri ...

Added: March 15, 2016

On the Existence of Provably Secure Cloud Computing Systems

Zakharov V., Varnovsky N. P., Shokurov A. V., Moscow University Bulletin. Series 15: Computational Mathematics and Cybernetics 2016 No. 2 P. 32-38

We study a formal model of cloud computing systems supplied with auxiliary cryptoservers. Assuming the existence of a secure threshold somewhat homomorphic public-key cryptosystem, we show how to build a secure cloud computing system in the framework of this model. ...

Added: October 13, 2016

2020 Global Smart Industry Conference (GloSIC)

IEEE, 2020

Added: December 3, 2020

An Approach to Information Security Risk Assessment in Cloud Environments

Tsaregorodtsev A. V., Malyuk A. A., Makarenko E. V., Information Technology Security 2014 No. 4 P. 68-74

Because cloud computing brings new challenges in the field of information security, it is imperative for organizations to control the process of information risk management in the cloud. This paper proposes a risk assessment approach for assessing the potential damage from the attack on the implementation of components of ...

Added: March 26, 2015

Building a Hybrid Secure Cloud Environment for an Organization's IT Infrastructure

Tsaregorodtsev A. V., Los A., Sorokin A., Industrial Automated Control Systems and Controllers 2015 No. 11 P. 26-31

Added: March 15, 2016

A Methodology for Quantitative Risk Assessment in the Information Security of an Organization's Cloud Infrastructure

Tsaregorodtsev A. V., Makarenko E. V., Digest Finance 2015 No. 1(233) P. 56-67

Added: March 15, 2016