Algebra of Information in Big Data Processing

P. Golubtsov

Ch. 4. P. 1–15.

In big data problems, data are usually collected at many sites, have a huge volume, and new pieces of data are constantly generated. It is often impossible to collect all the data needed for a research project on one computer, and it is even impractical, since one computer would not be able to process them in a reasonable time. An appropriate data analysis algorithm should, working in parallel on many computers, extract some intermediate compact “information” from each set of raw data, gradually combine and update it, and finally use the accumulated information to produce the result. When new data appear, the algorithm must extract information from them, add it to the accumulated information, and update the result. We consider several examples of a suitable transformation of processing algorithms and discuss specific features of the emerging information spaces, in particular their algebraic properties. We also show that the information space can often be equipped with an order relation that reflects the “quality” of the information.
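
The extract / combine / finalize pattern described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the goal is the mean and variance of scalar data; the summary `Info` (count, sum, sum of squares) and the functions `extract`, `combine`, and `finalize` are hypothetical names, not taken from the paper. The combination operation is associative and commutative, with `Info()` as its neutral element, which is the kind of algebraic property the abstract refers to.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Info:
    """Hypothetical compact summary of a data fragment."""
    n: int = 0        # number of observations
    s: float = 0.0    # sum of values
    ss: float = 0.0   # sum of squared values

    def combine(self, other: "Info") -> "Info":
        # Associative and commutative; Info() is the neutral element.
        return Info(self.n + other.n, self.s + other.s, self.ss + other.ss)


def extract(chunk: Iterable[float]) -> Info:
    """Extract the compact summary from one fragment of raw data."""
    xs = list(chunk)
    return Info(len(xs), sum(xs), sum(x * x for x in xs))


def finalize(info: Info) -> Tuple[float, float]:
    """Turn accumulated information into the result: (mean, population variance)."""
    mean = info.s / info.n
    return mean, info.ss / info.n - mean * mean


# Fragments that would live on different sites.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
acc = reduce(Info.combine, map(extract, chunks), Info())

# New data arrive: extract and merge them without revisiting old data.
acc = acc.combine(extract([7.0]))
mean, var = finalize(acc)
print(mean, var)  # 4.0 4.0
```

Because summaries can be merged in any order and grouping, fragments may be processed on different machines and combined in any tree shape. The variance formula used here is the simple textbook one and can lose precision on large data sets with a large mean.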

Language: English

Keywords: information space; parallel processing; distributed systems of data collecting and processing; forms of information representation; algebra of information; quality of information

In book

INTERNATIONAL CONFERENCE INFORMATION SYSTEMS 2017 SPECIAL INTEREST GROUP ON BIG DATA PROCEEDINGS

Association of Information Systems Electronic Library (AISeL), 2017.

The evolution of the role of the media in the information security doctrines of the Russian Federation of 2000 and 2016

Kochkin A. V., Communications. Media. Design 2023 Vol. 8 No. 3 P. 80–104

The article addresses the topical problem of determining the role of the media in the strategic documents of Russian information policy. In particular, it analyzes two information security doctrines, adopted sixteen years apart, which makes it possible to trace the evolution of the state’s understanding of the role of the media in the ...

Added: October 7, 2023

CYBERSPACE PROTECTION IN LATIN AMERICAN COUNTRIES

Kosevich E., Polis. Political Studies 2022 No. 3 P. 108–123

The adoption of a national cybersecurity strategy testifies to a country’s awareness of the importance of protecting cyber infrastructure, the digital economy and the business environment, on which information and economic well-being are already highly dependent. However, only a few Latin American countries have developed and adopted their own national cybersecurity strategy. This article analyzes ...

Added: May 25, 2022

THE INTERNET AS A MEANS OF SHAPING THE PUBLIC IMAGE OF THE INTERNATIONAL OLYMPIC MOVEMENT

Istyagina-Eliseeva E., Topical Issues of Physical Culture and Sport 2000 Vol. 3 P. 222–226

Added: December 8, 2020

CYBERSECURITY STRATEGIES OF LATIN AMERICAN COUNTRIES

Kosevich E., Iberoamérica 2020 No. 1 P. 137–159

The active development of information technology and the creation of a global information space based on the Internet have opened new opportunities to the world. However, they have also led to new types of dangers: cybercrime, cyber attacks and cyber warfare. The need to counter them, ...

Added: March 26, 2020

Scalability and Parallelization of Sequential Processing: Big Data Demands and Information Algebras

Golubtsov P., in: Advances in Intelligent Systems, Computer Science and Digital Economics (Advances in Intelligent Systems and Computing book series, Vol. 1127). Switzerland: Springer, 2020. P. 274–298.

Procedures of sequential updating of information are important for “big data streams” processing because they avoid accumulating and storing large data sets. As a model of information accumulation, we study the Bayesian updating procedure for linear experiments. Analysis and gradual transformation of the original processing scheme in order to increase its efficiency lead to certain ...

Added: March 16, 2020

Specific Features of Big Data Processing and the Concept of Information

Golubtsov P., in: Proceedings of the Russian-French Workshop in Big Data and Applications, October 12–13, 2017, Moscow. M.: Higher School of Economics Publishing House, 2018. P. 45–66.

Data in “big data” sets, as a rule, have a huge volume, are distributed among numerous sites, and are constantly replenished. As a result, even the simplest analysis of big data faces serious difficulties. To apply traditional processing, all the relevant data have to be collected in one place and arranged in the form ...

Added: January 23, 2019

Information spaces: optimizing sequential and parallel processing in big data

Golubtsov P., in: 7th International Conference “Problems of Mathematical Physics and Mathematical Modelling” (2018): Book of Abstracts. M.: National Research Nuclear University “MEPhI”, 2018. P. 173–176.

The process of Bayesian information update is essentially sequential: as a result of an observation, prior information is transformed into posterior information, which is later interpreted as the prior for the next observation, and so on. It is shown that this procedure can be unified and parallelized by converting both the measurement results and the original prior ...
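
A minimal sketch of this idea for the simplest case, assuming a scalar unknown x with a Gaussian prior and linear measurements y = a·x + e with Gaussian noise. These assumptions, and all names below (`GaussInfo`, `from_prior`, `from_measurement`, `to_posterior`, `sequential_update`), are illustrative and not necessarily the formulation used in the work. Both the prior and each measurement are converted into the same additive “information” form (precision, precision-weighted mean), so the posterior can be obtained by summing the contributions in any order, and it agrees with the classical step-by-step prior-to-posterior update.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class GaussInfo:
    """Information form of a scalar Gaussian belief (hypothetical representation)."""
    precision: float = 0.0  # inverse variance; 0.0 means "no information"
    shift: float = 0.0      # precision-weighted mean

    def combine(self, other: "GaussInfo") -> "GaussInfo":
        return GaussInfo(self.precision + other.precision, self.shift + other.shift)


def from_prior(mean: float, var: float) -> GaussInfo:
    return GaussInfo(1.0 / var, mean / var)


def from_measurement(a: float, y: float, noise_var: float) -> GaussInfo:
    # Model assumption: y = a * x + e, with e ~ N(0, noise_var).
    return GaussInfo(a * a / noise_var, a * y / noise_var)


def to_posterior(info: GaussInfo) -> Tuple[float, float]:
    var = 1.0 / info.precision
    return info.shift * var, var  # (posterior mean, posterior variance)


def sequential_update(mean, var, a, y, noise_var):
    """Classical prior -> posterior step (Kalman-gain form), for comparison."""
    gain = var * a / (a * a * var + noise_var)
    return mean + gain * (y - a * mean), (1.0 - gain * a) * var


prior = (0.0, 4.0)
measurements = [(1.0, 2.1, 1.0), (2.0, 3.9, 0.5), (1.0, 1.8, 2.0)]

# Sequential: each posterior becomes the prior for the next observation.
seq = prior
for a, y, r in measurements:
    seq = sequential_update(*seq, a, y, r)

# Parallel: convert everything to information, sum in any order, then finalize.
parts = [from_prior(*prior)] + [from_measurement(*m) for m in measurements]
par = to_posterior(reduce(GaussInfo.combine, reversed(parts), GaussInfo()))
print(seq, par)
```

The sequential loop and the order-independent sum produce the same posterior up to rounding; only the second form lets prior and measurements be processed on separate machines.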

Added: January 23, 2019

The Concept of Information in Big Data Processing

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 1 P. 38–43

The need to transform existing algorithms in Big Data Systems is considered. The transformation must allow independent and parallel processing of separate fragments of data. The characteristic aspects of a well-organized intermediate compact form of information and its natural algebraic properties are studied and an illustrative example is provided. ...

Added: January 23, 2019

The Linear Estimation Problem and Information in Big-Data Systems

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 2 P. 73–79

This paper addresses the problem of transforming the optimal linear estimation procedure in such a way that separate fragments of initial data are processed individually and concurrently. A representation of intermediate information is proposed that allows an algorithm to concurrently extract this information from each initial data set, combine it, and use it for estimation. ...
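
For weighted least squares, one well-known representation with this property is the pair J = sum of a_i a_i^T / s_i^2 and b = sum of a_i y_i / s_i^2 (the normal-equation, or “information”, form), from which the estimate solves J x = b. The Python sketch below assumes a two-parameter model y_i = a_i · x + e_i with known noise variances s_i^2. It illustrates the general approach and is not necessarily the representation proposed in the paper; `LinInfo`, `extract`, and `estimate` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class LinInfo:
    """Hypothetical information about x = (x1, x2): symmetric J and vector b."""
    j11: float = 0.0
    j12: float = 0.0
    j22: float = 0.0
    b1: float = 0.0
    b2: float = 0.0

    def combine(self, o: "LinInfo") -> "LinInfo":
        # Information from independent fragments simply adds up.
        return LinInfo(self.j11 + o.j11, self.j12 + o.j12, self.j22 + o.j22,
                       self.b1 + o.b1, self.b2 + o.b2)


def extract(fragment: Iterable[Tuple[float, float, float, float]]) -> LinInfo:
    """fragment: rows (a1, a2, y, noise_var) for y = a1*x1 + a2*x2 + e."""
    info = LinInfo()
    for a1, a2, y, s2 in fragment:
        w = 1.0 / s2
        info = info.combine(LinInfo(w * a1 * a1, w * a1 * a2, w * a2 * a2,
                                    w * a1 * y, w * a2 * y))
    return info


def estimate(info: LinInfo) -> Tuple[float, float]:
    """Solve J x = b for the 2x2 case; J must be nonsingular."""
    det = info.j11 * info.j22 - info.j12 * info.j12
    if det == 0:
        raise ValueError("insufficient information: J is singular")
    x1 = (info.j22 * info.b1 - info.j12 * info.b2) / det
    x2 = (info.j11 * info.b2 - info.j12 * info.b1) / det
    return x1, x2


# Fit y = x1 + x2 * t from noiseless data generated by x1 = 1, x2 = 2,
# split between two hypothetical sites with different noise variances.
site_a = [(1.0, 0.0, 1.0, 1.0), (1.0, 1.0, 3.0, 1.0)]
site_b = [(1.0, 2.0, 5.0, 0.5), (1.0, 3.0, 7.0, 2.0)]

total = reduce(LinInfo.combine, map(extract, [site_a, site_b]), LinInfo())
x1, x2 = estimate(total)
print(x1, x2)
```

Each site only ships five numbers instead of its raw rows, and the result equals what a single machine would compute from all rows at once.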

Added: January 23, 2019

The Transition from A Priori to A Posteriori Information: Bayesian Procedures in Distributed Large-Scale Data Processing Systems

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 4 P. 203–213

The procedure of transition from a priori to a posteriori information for a linear experiment in the context of Big Data systems is considered. At first glance, this process is fundamentally sequential, namely: as a result of observation, a priori information is transformed into a posteriori information, which is later interpreted as a priori for ...

Added: January 18, 2019

PROCEEDINGS 46th International Conference on Parallel Processing Workshops ICPPW 2017

Piscataway: IEEE Computer Society, 2017.

The papers in this book comprise the proceedings of the 46th International Conference on Parallel Processing Workshops (ICPPW 2017), 14 August 2017, Bristol, United Kingdom. ...

Added: January 30, 2018

Reducing the total cost of ownership of an information-analytical system by creating a data integration system

Lychagin K., Pozin B., Open Education 2011 No. 2 P. 238–242

The paper describes categories of costs which can be included in total cost of ownership (TCO) of information-analytical system and which are connected with the data integration system (DIS). Some problems of creating DIS are described. The streaming architecture of DIS, which helps to solve problems of stability and scalability of DIS and to lower ...

Added: November 13, 2016

Information space as a new (geo)political space: the role and place of states

Kabanov Y., Comparative Politics 2014 No. 4 P. 54–59

Information space as a new (geo)political space exerts an increasing influence on global processes. Despite its specific nature, its connection with physical space is growing; furthermore, states increasingly describe themselves as leading actors in the information space. Although the approaches differ from one another, information space is being generally perceived in ...

Added: November 9, 2015

“Clip thinking”: psychological deficits and alternatives (a spatial focus)

Isaeva A. N., Malakhova S. A., The World of Psychology. Scientific and Methodological Journal 2015 Vol. 84 No. 4 P. 177–191

The article is devoted to the transformation of modern thinking caused by the development of the virtual space of life. Clip thinking is considered a phenomenon of the 21st-century «digital personality», which is constituted and shaped by the media space. The paper seeks to understand the deficits and potentials of «clip thinking». Potential of symbolic thinking that ...

Added: November 5, 2015

The technology of dialogue in the modern communication space

Dzyaloshinsky I. M., Pilgun M. A., Theoretical and Practical Issues of Journalism 2014 No. 5 P. 42–54

The article describes the social, psychological, linguistic and organizational conditions for an effective dialogue, as well as features of dialogic communication in the modern media. It is shown that an open and productive dialogue is characteristic of a consciously and actively implemented social partnership. The article analyzes the position that modern media is undergoing ...

Added: January 4, 2015

A study of the efficiency of a software implementation of a multithreaded image scaling algorithm based on bilinear interpolation

Vnukov A., Egorov I. V., Mining Informational and Analytical Bulletin (scientific and technical journal) 2014 No. 9 P. 264–271

The article compares the efficiency of sequential and parallel approaches to digital image zooming in a software implementation. The method of bilinear interpolation is chosen as a sample. For the study, a test routine was written in C#. In the tests, the running times of sequential and parallel processing of the same images were compared. ...

Added: July 26, 2014

Parallel averaging of size is possible but range-limited: A reply to Marchant, Simons, and De Fockert

Utochkin I. S., Tiurina N., Acta Psychologica 2014 Vol. 146 P. 7–18

In their recent paper, Marchant, Simons, and De Fockert (2013) claimed that the ability to average between multiple items of different sizes is limited by small samples of arbitrarily attended members of a set. This claim is based on a finding that observers are good at representing the average when an ensemble includes only two ...

Added: October 25, 2013

The spatial organization of the «information society» as a subject of system analysis and an object of state regulation

Shvetsov A. N., Region: Economics and Sociology 2012 No. 4 P. 45–66

The article covers aspects of Russian informatization in its spatial dimension. The spatial disproportions of informatization are considered as a particular theme of scientific research and a subject of governmental policy on interregional leveling. The author analyzes a methodological approach to estimating regional differences in informatization, and the best practices ...

Added: April 7, 2013

Copyright aspects of preserving and developing the Russian-language information space

Fedotov M., Law. Journal of the Higher School of Economics 2009 No. 2 P. 31–56

The article tracks the preservation and development of the Russian-language information space as a social and cultural phenomenon. The author studies various ways to overcome its disintegration and looks at issues connected with the necessity of harmonizing two principles: freedom of information and the protection of intellectual property rights. He presents proposals about methods of legal regulation of ...

Added: October 15, 2012

Knowledge management infrastructure: types and characteristics

Viktorov E. G., Problems of Theory and Practice of Management 2010 No. 9 P. 23–29

The author investigates issues related to the methodology and technology of applying a knowledge management system in an organization, and describes the types of infrastructure needed to successfully implement a complex project of introducing an organizational, social and technological system of knowledge management. ...

Added: October 10, 2012