Specific Features of Big Data Processing and the Concept of Information

P. Golubtsov

Chapter

P. 45–66.

Data in “big data” sets are, as a rule, huge in volume, distributed among numerous sites, and constantly replenished. As a result, even the simplest analysis of big data faces serious difficulties. Traditional processing requires that all relevant data be collected in one place and arranged into convenient structures; only then does the corresponding algorithm process these structures and produce the result of the analysis. For big data, collecting all relevant data on one computer may be impossible, and would in any case be impractical, since a single computer could not process them in a reasonable time. An appropriate data analysis algorithm should instead work in parallel on many computers: extract some compact intermediate “information” from each set of raw data, gradually combine and update it, and finally use the accumulated information to produce the result. When new pieces of data arrive, it should be able to add them to the accumulated information and, eventually, update the result. We discuss the specific features of such a well-arranged intermediate form of information, reveal its natural algebraic properties, and present several examples. We also show that in many important data processing problems the appropriate information space can be equipped with an ordering that reflects the “quality” of the information. It turns out that such an intermediate representation in some sense reflects the very essence of the information contained in the data. This leads to a completely new, ‘practical’ approach to the notion of information.
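The extract, combine, and update cycle described in the abstract can be illustrated with a deliberately simple sketch. It is not the author's construction: the summary type `Info`, the functions `extract` and `result`, and the choice of count, sum, and sum of squares as the intermediate "information" are all assumptions made for illustration. The example only shows the algebraic shape of the idea: combination is associative and commutative and has a neutral element, so fragments can be processed on separate machines and merged in any order.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Info:
    """Hypothetical compact summary of a data fragment (illustrative only)."""
    n: int = 0        # number of observations
    s: float = 0.0    # sum of observations
    ss: float = 0.0   # sum of squared observations

    def combine(self, other: "Info") -> "Info":
        # Associative and commutative; Info() acts as the neutral element.
        return Info(self.n + other.n, self.s + other.s, self.ss + other.ss)


def extract(fragment) -> Info:
    """Extract the summary from one raw fragment; can run on any worker."""
    xs = list(fragment)
    return Info(len(xs), sum(xs), sum(x * x for x in xs))


def result(info: Info):
    """Turn accumulated information into (mean, population variance)."""
    if info.n == 0:
        raise ValueError("no data accumulated")
    mean = info.s / info.n
    return mean, info.ss / info.n - mean * mean


# Three fragments that could live on three different sites.
fragments = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
total = reduce(Info.combine, map(extract, fragments), Info())
mean, var = result(total)

# New data arrive later: fold them in without revisiting the old fragments.
updated = total.combine(extract([7.0]))
```

In this toy setting the count `n` gives one crude way to compare two summaries by "quality" (more observations, more information); the ordering discussed in the chapter may well be defined differently. The sum-of-squares formula for the variance is also known to be numerically fragile for large or poorly scaled data, so it is used here only because it keeps the combination rule to a single line.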

Language: English

Keywords: information space; parallel processing; big data systems; linear estimation; canonical information; distributed data collection and processing systems; algebra of information; quality of information; information representation

In book

Proceedings of the Russian-French Workshop in Big Data and Applications. October 12–13, 2017, Moscow

M.: Higher School of Economics Publishing House, 2018.

Cybersecurity Strategies of Latin American Countries

Kosevich E., Iberoamérica 2020 No. 1 P. 137–159.

Thanks to the active development of information technology and the creation of a global information space built on the Internet, new opportunities have opened up to the world. However, this has also led to the emergence of new types of dangers: cybercrime, cyber attacks and cyber warfare. The need to counter them, ...

Added: March 26, 2020

Spatial Organization of the "Information Society" as a Subject of Systems Analysis and an Object of State Regulation

Швецов А. Н., Регион: Экономика и Социология 2012 No. 4 P. 45–66.

The article covers the aspects of the Russian informatization in its spatial dimensions. The spatial disproportions of informatization are considered as a particular theme of scientific research and a subject of the governmental policy on interregional leveling. The author analyzes a methodical approach to the estimation of regional differences in informatization, and the best practices ...

Added: April 7, 2013

Evolution of the Role of Media in the Information Security Doctrines of the Russian Federation of 2000 and 2016

Кочкин А. В., Коммуникации. Медиа. Дизайн 2023 Vol. 8 No. 3 P. 80–104.

The article addresses the topical problem of determining the role of the media in the strategic documents of Russian information policy. In particular, two doctrines of information security, adopted sixteen years apart, are analyzed, which makes it possible to identify the evolution of the state’s understanding of the role of the media in the ...

Added: October 7, 2023

Scalability and Parallelization of Sequential Processing: Big Data Demands and Information Algebras

Golubtsov P., in: Advances in Intelligent Systems, Computer Science and Digital Economics. Advances in Intelligent Systems and Computing book series, Vol. 1127. Switzerland: Springer, 2020. P. 274–298.

Procedures of sequential updating of information are important for “big data streams” processing because they avoid accumulating and storing large data sets. As a model of information accumulation, we study the Bayesian updating procedure for linear experiments. Analysis and gradual transformation of the original processing scheme in order to increase its efficiency lead to certain ...

Added: March 16, 2020

Technology of Dialogue in the Modern Communication Space

Dzyaloshinsky I. M., Pilgun M. A., Вопросы теории и практики журналистики 2014 No. 5 P. 42–54.

The article describes the social, psychological, linguistic and organizational conditions for an effective dialogue, as well as features of dialogic communication in the modern media. It is shown that an open and productive dialogue is characteristic of a consciously and actively implemented social partnership. The article analyzes the position that modern media is undergoing ...

Added: January 4, 2015

Copyright Aspects of Preserving and Developing the Russian-Language Information Space

Fedotov M., Право. Журнал Высшей школы экономики 2009 No. 2 P. 31–56.

The article tracks the preservation and development of the Russian-language information space as a social and cultural phenomenon. The author studies various ways to overcome its disintegration and looks at issues connected with the necessity of harmonizing two principles: freedom of information and securing intellectual property rights. He presents proposals about methods of legal regulation of ...

Added: October 15, 2012

A Study of the Efficiency of a Software Implementation of a Multithreaded Scaling Algorithm Using Bilinear Interpolation

Vnukov A., Егоров И. В., Горный информационно-аналитический бюллетень (научно-технический журнал) 2014 No. 9 P. 264–271.

The article compares the efficiency of sequential and parallel approaches to digital image zooming in a software implementation. The method of bilinear interpolation is chosen as a sample. For the purposes of the study, a test routine was written in C#. In the tests, the running time of sequential and parallel processing of the same images was compared. ...

Added: July 26, 2014

Knowledge Management Infrastructure: Types and Characteristics

Викторов Е. Г., Проблемы теории и практики управления 2010 No. 9 P. 23–29.

The author investigates issues related to the methodology and technology of applying a knowledge management system in an organization and describes the infrastructure types needed to successfully carry out a complex project of introducing an organizational, social and technological system of knowledge management. ...

Added: October 10, 2012

Cyberspace Protection in Latin American Countries

Kosevich E., Полис. Политические исследования 2022 No. 3 P. 108–123.

The adoption of a national cybersecurity strategy testifies to a country’s awareness of the importance of protecting cyber infrastructure, the digital economy and the business environment, on which information and economic well-being are already highly dependent. However, only a few Latin American countries have developed and adopted their own national cybersecurity strategy. This article analyzes ...

Added: May 25, 2022

Parallel averaging of size is possible but range-limited: A reply to Marchant, Simons, and De Fockert

Utochkin I. S., Tiurina N., Acta Psychologica 2014 Vol. 146 P. 7–18.

In their recent paper, Marchant, Simons, and De Fockert (2013) claimed that the ability to average between multiple items of different sizes is limited by small samples of arbitrarily attended members of a set. This claim is based on a finding that observers are good at representing the average when an ensemble includes only two ...

Added: October 25, 2013

The Linear Estimation Problem and Information in Big-Data Systems

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 2 P. 73–79.

This paper addresses the problem of transforming the optimal linear estimation procedure in such a way that separate fragments of initial data are processed individually and concurrently. A representation of intermediate information is proposed that allows an algorithm to concurrently extract this information from each initial data set, combine it, and use it for estimation. ...

Added: January 23, 2019

Algebra of Information in Big Data Processing

Golubtsov P., in: INTERNATIONAL CONFERENCE INFORMATION SYSTEMS 2017 SPECIAL INTEREST GROUP ON BIG DATA PROCEEDINGS. Association of Information Systems Electronic Library (AISel), 2017. Ch. 4 P. 1–15.

In big data problems the data usually are collected on many sites, have a huge volume, and new pieces of data are constantly generated. It is often impossible to collect all the data needed for a research project on one computer, and even impractical, since one computer would not be able to process it in ...

Added: January 23, 2019

The Concept of Information in Big Data Processing

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 1 P. 38–43.

The need to transform existing algorithms in Big Data Systems is considered. The transformation must allow independent and parallel processing of separate fragments of data. The characteristic aspects of a well-organized intermediate compact form of information and its natural algebraic properties are studied and an illustrative example is provided. ...

Added: January 23, 2019

The Transition from A Priori to A Posteriori Information: Bayesian Procedures in Distributed Large-Scale Data Processing Systems

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 4 P. 203–213.

The procedure of transition from a priori to a posteriori information for a linear experiment in the context of Big Data systems is considered. At first glance, this process is fundamentally sequential, namely: as a result of observation, a priori information is transformed into a posteriori information, which is later interpreted as a priori for ...

Added: January 18, 2019

The Internet as a Means of Shaping the Public Image of the International Olympic Movement

Istyagina-Eliseeva E., Актуальные вопросы физической культуры и спорта 2000 Vol. 3 P. 222–226.

Added: December 8, 2020

"Clip Thinking": Psychological Deficits and Alternatives (Spatial Focus)

Isaeva A. N., Малахова С. А., Мир психологии. Научно-методический журнал 2015 Vol. 84 No. 4 P. 177–191.

The article is devoted to the transformation of modern thinking caused by the development of the virtual space of life. Clip thinking is considered as a phenomenon of the 21st century’s "digital personality", which is constituted and formatted by the media space. The paper is devoted to understanding the deficits and potentials of "clip thinking". Potential of symbolic thinking that ...

Added: November 5, 2015

Information Space as a New (Geo)political Space: The Role and Place of States

Kabanov Y., Сравнительная политика 2014 No. 4 P. 54–59.

Information space as a new (geo)political space exerts an increasingly strong influence on global processes. Despite its specific nature, its connection with physical space is growing; furthermore, states increasingly describe themselves as leading actors of the information space. Although the approaches differ from one another, information space is being generally perceived in ...

Added: November 9, 2015

PROCEEDINGS 46th International Conference on Parallel Processing Workshops ICPPW 2017

Piscataway: IEEE Computer Society, 2017.

The papers in this book comprise the proceedings of the 46th International Conference on Parallel Processing Workshops — ICPPW 2017 — 14 August 2017 Bristol, United Kingdom. ...

Added: January 30, 2018

Intellectual Potential of Society, Information Space, Criminal Law

Panchenko P. N., Вопросы правоведения 2011 No. 3 P. 99–113.

The article defines the concept, structure and contents of the intellectual potential of society and specifies the limits of the information space in which various crimes infringe on this potential. It also outlines the range of the said crimes and describes ways to enhance the efficiency of criminal law to counteract them. The author emphasizes ...

Added: September 21, 2012

Reducing the Total Cost of Ownership of an Information-Analytical System by Creating a Data Integration System

Lychagin K., Pozin B., Открытое образование 2011 No. 2 P. 238–242.

The paper describes categories of costs which can be included in total cost of ownership (TCO) of information-analytical system and which are connected with the data integration system (DIS). Some problems of creating DIS are described. The streaming architecture of DIS, which helps to solve problems of stability and scalability of DIS and to lower ...

Added: November 13, 2016
