Distributed data replication and access optimization for LHCb storage system - A Position Paper

Hushchyn M., Charpentier P., Ustyuzhanin A.


This paper presents how machine learning algorithms and statistical methods can be applied to data management in hybrid data storage systems. Such systems use two different storage types. Keeping infrequently used data on cheap, slow storage of the first type and frequently used data on fast, expensive storage of the second type helps achieve an optimal performance/cost ratio for the system. We use classification algorithms to estimate the probability that the data will be frequently used in the future. Then, using risk analysis, we decide where the data should be stored. We show how to estimate the optimal number of replicas of the data using regression algorithms and a Hidden Markov Model. Based on the probability, the risks, and the optimal number of data replicas, our recommendation system finds the optimal data distribution in the hybrid data storage system. We present the results of implementing our method in the LHCb hybrid data storage.
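The risk-based placement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `choose_storage`, the two cost parameters, and their default values are hypothetical, and the probability `p_hot` is assumed to come from some already trained classifier.

```python
def choose_storage(p_hot, cost_hot_on_slow=5.0, cost_cold_on_fast=1.0):
    """Pick a storage tier by comparing expected losses (illustrative only).

    p_hot             -- assumed classifier output: probability that the
                         dataset will be frequently used in the future
    cost_hot_on_slow  -- hypothetical penalty for keeping hot data on slow storage
    cost_cold_on_fast -- hypothetical penalty for keeping cold data on fast storage
    """
    # Expected loss of placing the data on slow storage:
    # we pay only if the data turns out to be hot.
    risk_slow = p_hot * cost_hot_on_slow
    # Expected loss of placing the data on fast storage:
    # we pay only if the data turns out to be cold.
    risk_fast = (1.0 - p_hot) * cost_cold_on_fast
    # Choose the tier with the smaller expected loss.
    return "fast" if risk_fast < risk_slow else "slow"


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, choose_storage(p))
```

With these assumed costs the decision flips to fast storage once `p_hot` exceeds 1/6, so changing the cost ratio moves the threshold rather than the classifier.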

Language: English

Keywords: distributed systems; algorithms and data structures

In book

International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

[s.n.], 2015

22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers

Springer, 2022

This book constitutes selected and revised papers from the 22nd International Conference on Mathematical Modeling and Supercomputer Technologies, MMST 2022, held in Nizhny Novgorod, Russia, in November 2022. The 20 full papers and 5 short papers presented in the volume were thoroughly reviewed and selected from the 48 submissions. They are organized in topical sections on computational methods ...

Added: December 26, 2022

Measurements of mobile blockchain execution impact on smartphone battery

Bardinova Y., Zhidanov K., Bezzateev S. et al., Data 2020 Vol. 5(3) P. 66

This is a data descriptor paper for a set of the battery output data measurements during the turned on display discharge process caused by the execution of modern mobile blockchain projects on Android devices. The measurements were executed for Proof-of-Work (PoW) and Proof-of-Activity (PoA) consensus algorithms. In this descriptor, we give examples of Samsung Galaxy ...

Added: October 8, 2020

Array DBMS: Past, Present, and (Near) Future

Rodriges Zalipynis R. A., PROCEEDINGS OF THE VLDB ENDOWMENT 2021 Vol. 14 No. 12 P. 3186-3189

Array DBMSs strive to be the best systems for managing, processing, and even visualizing big N-d arrays. The last decade blossomed with R&D in array DBMS, making it a young and fast-evolving area. We present the first comprehensive tutorial on array DBMS R&D. We start from past impactful results that are still relevant today, then ...

Added: June 4, 2021

Method for accelerating the operation of joining distributed datasets by a given criterion

Tyryshkina Y., in : International Scientific and Practical Conference "Information Innovative Technologies", 2022. : [s.n.], 2022.

In this paper, we consider the problem of reducing the cost of computer time by developing and implementing a method for accelerating the operation of connecting distributed data arrays according to a given criterion. The following tasks were solved: a study was conducted on the architecture of distributed data storages and parallel computing algorithms; on ...

Added: May 31, 2022

ChronosDB: Distributed, File Based, Geospatial Array DBMS

Rodriges Zalipynis R. A., in : Proceedings of the VLDB Endowment. Vol. 11. Issue 10.: VLDB Endowment, 2018. P. 1247-1261.

An array DBMS streamlines large N-d array management. A large portion of such arrays originates from the geospatial domain. The arrays often natively come as raster files while standalone command line tools are one of the most popular ways for processing these files. Decades of development and feedback resulted in numerous feature-rich, elaborate, free and ...

Added: September 24, 2017

24th International Conference on Principles of Distributed Systems (OPODIS2020)

Dagstuhl Publishing, 2021

The papers in this volume were presented at the 24th International Conference on Principles of Distributed Systems (OPODIS 2020), held on December 14–16, 2020. Originally planned to be held in Strasbourg, France, the conference was held online due to the COVID19 pandemic. OPODIS is an open forum for the exchange of state-of-the-art knowledge about distributed ...

Added: October 14, 2021

A Combined Toolset for the Verification of Real-Time Distributed Systems

Zakharov V.A., Volkanov D. Y., Zorin D. A. et al., Programming and Computer Software 2015 Vol. 41 No. 6 P. 325-335

Checking the correctness of distributed systems is one of the most difficult and urgent problems in software engineering. A combined toolset for the verification of real-time distributed systems (RTDS) is described. RTDSs are specified as statecharts in the Unified Modeling Language (UML). The semantics of statecharts is defined by means of hierarchical timed automata. The ...

Added: October 13, 2015

Data placement management in distributed systems with microservice architecture

Breyman A., Прикаспийский журнал: управление и высокие технологии 2018 № 1 (41)

One of the promising approaches to the development of distributed systems is to divide the system into a number of independently deployable modules (microservices) that use messaging for data interchange. As a rule, each individual microservice implements some business function, completely hiding its implementation details, including the method of data persistence – be it tables in the ...

Added: February 26, 2018

Compositional Process Model Synthesis based on Interface Patterns

Roman A. Nesterov, Irina A. Lomazova, in : Tools and Methods of Program Analysis: 4th International Conference, TMPA 2017, Moscow, Russia, March 3-4, 2017, Revised Selected Papers. Vol. 779: Communications in Computer and Information Science.: Springer, 2018. P. 151-162.

Coordination of several distributed system components is an error-prone task, since interaction of several simple components can generate rather sophisticated behavior. Verification of such systems is very difficult or even impossible because of the so-called state space explosion problem, when the size of the system reachability set grows exponentially on the number of interacting agents. ...

Added: October 7, 2017

Comparative analysis of data structures for approximate nearest neighbor search

Ponomarenko A., Avrelin N., Найдан Б. С. et al., Алгоритмы, методы и системы обработки данных 2015 Vol. 4 No. 33 P. 91-106

Similarity search is widely used in various areas of computer science. Many methods have been proposed for solving the problem in its exact formulation; however, all of them suffer from the "curse" of dimensionality and are ineffective for high-dimensional data. Approximate algorithms partly make it possible to cope with the "curse". However, because of their complex stochastic nature, theoretical estimates are lacking for most approximate algorithms. Moreover, on ...

Added: September 27, 2016

Operating system Plan9 as the implementation of the GRID ideology

Gostev I. M., Севастьянов Л. А., Королькова А. В. et al., in : Distributed Computing and Grid-technologies in Science and Education 2016. Vol. 1787.: CEUR-WS, 2017. P. 230-234.

When we organize parallel computations on a cluster system, the computer system structure is not hidden from the user, and should be taken into account while writing parallel programs. GRID ideology introduces an additional level of abstraction and makes it possible to link together heterogeneous computing systems. In fact, the inability to control the operating ...

Added: February 15, 2017

Digital Ecosystem-Based KPI-Driven Railway Communication Network Reporting System

Panfilov, P., Suleykin, A., ElDarawany, A., in : MEDES '21: Proceedings of the 13th International Conference on Management of Digital EcoSystems. : NY : Association for Computing Machinery (ACM), 2021. P. 163-166.

This research is focused on architectural and modeling issues of design and development of digital reporting system aimed at the railway communication network infrastructure. Our approach to these problems is based on digital ecosystem paradigm and open-source Big Data technologies. It also aims at methodology for KPIs data preparation and collection in railway communication networks. ...

Added: January 15, 2022

Convergence of Array DBMS and Cellular Automata: A Road Traffic Simulation Case

Rodriges Zalipynis R. A., in : SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data. : NY : ACM, 2021. P. 2399-2403.

Array DBMSs manage big N-d arrays, are not yet widely known, but are experiencing an R&D surge due to the rapid growth of array volumes. Cellular automata (CA) operate on a discrete lattice of cells that can be modeled by an N-d array. CA are successfully applied to model fire spread, land cover change, road ...

Added: April 28, 2021

Distributed integrated navigation systems for planetary defense against asteroids

Krobka N., Aksenov S. A., Bober S. A. et al., Gyroscopy and Navigation 2016 Vol. 7 No. 3 P. 296-310

The main objectives of this paper are to give an interdisciplinary overview of the current status of the research on planetary defense against asteroids, which is a real challenge, and consider technical proposals on the development of a multilevel planetary defense system based on modern space technologies, providing for the application of projectile asteroids to ...

Added: October 8, 2016

Automatic construction of distributed component systems from nested Petri net models

Л. В. Дворянский, И. А. Ломазова, Программирование 2016 No. 5 P. 49-67

Multi-agent systems (MAS) with many levels and dynamic hierarchical structure are widely used in telecommunication, transport, social, and other fields. Assuring correctness of such systems is an important and topical issue. In this paper we consider modeling MAS with dynamic structure with the help of Nested Petri nets (NPNs). NPN is an extension of Petri nets within ...

Added: December 4, 2015

SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data

NY : ACM, 2021

The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. The conference includes a fascinating technical program with research and industrial talks, tutorials, demos, and focused workshops. It also hosts a poster session to learn about innovative ...

Added: April 28, 2021

Distributed real-time energy monitoring system based on IoT technology

Kychkin A., Артемов С. А., Белоногов А. В., Датчики и системы 2017 No. 8-9 P. 49-55

The article considers the approach to constructing a multi-level architecture of a distributed system of energy monitoring based on Internet of Things (IoT) technology. The technology is implemented on a basis of controllers, remote access to which is carried out via the Internet. Such a network has a large number of nodes. The routing of ...

Added: November 21, 2017

Concurrency, Specification & Programming. 24th International Workshop, CS&P 2015. Rzeszow, Poland, September 28-30, 2015. Proceedings

University of Rzeszow, 2015

Added: October 11, 2015

Database schema evolution in distributed systems with microservice architecture

Breyman A., Прикаспийский журнал: управление и высокие технологии 2018 № 2 (42)

In distributed systems designed using microservice architecture, business functions are encapsulated in independently deployable modules that asynchronously access one another using messaging. The choice of the method of functions assignment to microservices being designed, and corresponding data placement, can be carried out both heuristically and using a specialized methodology based on domain-oriented design and relational ...

Added: February 26, 2018

Coordination of microservice interaction in distributed systems

Breyman A., Прикаспийский журнал: управление и высокие технологии 2018 № 3 (43)

In a service-oriented architecture, orchestration is the primary way to ensure services collaboration. Orchestration might be performed by a centralized or distributed component, such as a message broker or enterprise service bus. List of orchestration’s advantages includes increased flexibility of connecting and disconnecting services, universal way to control the routing and transformation of requests, etc. ...

Added: February 27, 2018

Supercomputing: 7th Russian Supercomputing Days, RuSCDays 2021, Moscow, Russia, September 27–28, 2021, Revised Selected Papers. Communications in Computer and Information Science

Springer, 2021

This book constitutes the refereed post-conference proceedings of the 7th Russian Supercomputing Days, RuSCDays 2021, held in Moscow, Russia, in September 2021. The 37 revised full papers and 3 short papers presented were carefully reviewed and selected from 99 submissions. The papers are organized in the following topical sections: supercomputer simulation; HPC, BigData, AI: architectures, technologies, ...

Added: January 18, 2022

Proceedings of Machine Learning Research

Kovalev D., Shulgin E., Richtarik P. et al., PMLR, 2021

We propose ADOM – an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the ...

Added: October 31, 2021

Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers

Springer, 2022

This book constitutes the refereed proceedings of the 16th International Conference on Parallel Computational Technologies, PCT 2022, held in Dubna, Russia, during March 29–31, 2022. The 22 full papers included in this book were carefully reviewed and selected from 60 submissions. They were organized in topical sections as follows: high performance architectures, tools and technologies; parallel ...

Added: August 10, 2022

In-situ processing of big raster data with command line tools

Rodriges Zalipynis R. A., in : Proceedings of the international Conference "Russian Supercomputing Days 2016". : M. : Moscow Lomonosov University, 2016. P. 20-25.

Explosive growth of raster data volumes in numerical simulations, remote sensing and other fields stimulates the development of new efficient data processing techniques. For example, the in-situ approach queries data in diverse file formats, avoiding a time-consuming import phase. However, after data are read from file, their further processing always takes place with code developed almost from ...

Added: October 5, 2016