Big Data Normalization for Massively Parallel Processing Databases

N. Golov; Rönnbäck L.

doi:10.1007/978-3-319-25747-1_16

Advances in Conceptual Modeling. 2015. No. 9382 of the series Lecture Notes in Computer Science. P. 154-163.

Golov N., Rönnbäck L.

High performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing databases. At one extreme, a database can be set up to provide the results of a single known query so that the use of available resources is maximized and response time minimized, but at the cost of all other queries being suboptimally executed. At the other extreme, when no query is known in advance, the database must provide the information without such optimization, normally resulting in inefficient execution of all queries. This paper introduces a novel technique, highly normalized Big Data using Anchor modeling, that provides a very efficient way to store information and utilize resources, thereby providing ad-hoc querying with high performance for the first time in massively parallel processing databases. A case study of how this approach has been used for a Data Warehouse at Avito over two years' time, with estimates for and results of real data experiments carried out in HP Vertica, an MPP RDBMS, is also presented.

Research target: Computer Science

Priority areas: business informatics

Keywords: big data big data analytics MPP Database Normalization Ad-hoc Querying Performance Modeling

Big Data Normalization for Massively Parallel Processing Databases

Golov N., Rönnbäck L., Computer Standards and Interfaces 2017 Vol. 54 No. P2 P. 86-93

High performance querying and ad-hoc querying are commonly viewed as mutually exclusive goals in massively parallel processing databases. Furthermore, there is a contradiction between ease of extending the data model and ease of analysis. The modern 'Data Lake' approach promises extreme ease of adding new data to a data model; however, it is prone to ...

Added: March 13, 2017

Anticipating Future Innovation Pathways Through Large Data Analysis

Netherlands : Springer, 2016

This book aims to identify promising future developmental opportunities and applications for Tech Mining. Specifically, the enclosed contributions will pursue three converging themes: the increasing availability of electronic text data resources relating to Science, Technology & Innovation (ST&I); the multiple methods that are able to treat this data effectively and incorporate means to tap into human expertise ...

Added: June 20, 2016

Data compression in a large graph storage

Polyakov I. V., Chepovskiy A., Chepovskiy A., Фундаментальная и прикладная математика 2016 Vol. 21 No. 4 P. 125-132

The article considers data compression methods for storing large graphs. Algorithms are proposed for preprocessing a graph of a special structure in order to increase data recording density and improve the efficiency of basic graph operations. ...

Added: December 23, 2017

The Russian agro-industrial complex and IT: a look into the future

Dorofeev A., Продовольственная безопасность 2015 No. 3 P. 100-103

Modern trends in the IT perspective can contribute to the efficient development of Russian agriculture. Examples from various countries show that in adverse economic conditions information technology is a powerful tool for overcoming the crisis in the agricultural industry. ...

Added: December 17, 2015

Business Intelligence. Third European Summer School, eBISS 2013, Dagstuhl Castle, Germany, July 7-12, 2013, Tutorial Lectures

Springer, 2014

To large organizations, business intelligence (BI) promises the capability of collecting and analyzing internal and external data to generate knowledge and value, thus providing decision support at the strategic, tactical, and operational levels. BI is now impacted by the “Big Data” phenomenon and the evolution of society and users. In particular, BI applications must cope ...

Added: October 17, 2014

SQL query optimization for highly normalized Big Data

Golov N., Ronnback L., Business Informatics 2015 No. 3

This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a ...

Added: August 17, 2015

PROSPECTS OF TRANSFERRING THE LARGE VOLUMES OF RADIO ASTRONOMY DATA

Isaev E., Tarasov P. A., Odessa Astronomical Publications 2014 Vol. 27 No. 2 P. 72-73

Added: November 24, 2014

Synthesis of an information system for managing the technical support subsystems of intelligent buildings

Vikentyeva O., Deryabin A. I., Shestakova L. V. et al., Вестник Московского государственного строительного университета 2017 Vol. 12 No. 10 P. 1191-1201

Subject: smart house maintenance requires taking into account a number of factors: resource conservation, reduction of operating expenditures, safety enhancement, and ensuring comfort of leisure and operation. Automation of engineering system networks such as illumination, climate control, security and communication may be achieved through the use of contemporary technologies (e.g. IoT – Internet of Things). However, storing ...

Added: November 21, 2017

International Conference Information Systems 2016 Special Interest Group on Big Data Proceedings

Baroková A., Kryvinska N., Strauss C. et al., Dublin : Association of Information Systems Electronic Library (AISel), 2016

Moving to the cloud is one such business process enabling Big Data. The cloud makes it possible to process Big Data while paying only for the time used. This makes Big Data processing cost-effective in terms of both operational expenses (OpEx) and capital expenses (CapEx). ICIS 2016 SIG on BDA Proceedings include innovative techniques that companies use in business processes to deal ...

Added: March 15, 2017

The use of Big Data: A Russian perspective of personal data security

Zharova A. K., Elin V., Computer Law & Security Review 2017 Vol. 33 No. 4 P. 482-501

This article examines the impact of Big Data technology on Russian citizens' constitutional rights to a private life. There are several laws in the Russian Federation covering data privacy and protection, but these are proving inadequate to protect the citizens' rights in the face of the ever-increasing use of massive data sets and their analysis ...

Added: June 12, 2017

Proceedings of the Russian-French Workshop in Big Data and Applications. October 12–13, 2017, Moscow

M. : Higher School of Economics Publishing House, 2018

Added: January 23, 2019

INTERNATIONAL CONFERENCE INFORMATION SYSTEMS 2017 SPECIAL INTEREST GROUP ON BIG DATA PROCEEDINGS

Association of Information Systems Electronic Library (AISel), 2017

The International Conference on Information Systems (ICIS) is the major annual meeting of the Association for Information Systems (AIS), which has over 4,000 members representing universities in over 95 countries worldwide. It is the most prestigious gathering of academics and practitioners in the IS discipline, and provides a forum for networking and sharing of latest ...

Added: February 1, 2018

MEDES '20: Proceedings of the 12th International Conference on Management of Digital EcoSystems

NY : Association for Computing Machinery (ACM), 2020

Nowadays, we live in a world in which independent entities such as individuals, organizations, services, software, and applications, sharing one or several missions and focusing on the interactions and inter-relationships among them, are strongly connected. This situation gives rise to an intelligent environment, namely a «digital ecosystem». The latter includes different producers of data (the Web, ...

Added: February 5, 2021

Decomposing Petri Nets for Process Mining: A Generic Approach

van der Aalst W., Distributed and Parallel Databases 2013 Vol. 31 No. 4 P. 471-507

The practical relevance of process mining is increasing as more and more event data become available. Process mining techniques aim to discover, monitor and improve real processes by extracting knowledge from event logs. The two most prominent process mining tasks are: (i) process discovery: learning a process model from example behavior recorded in an event ...

Added: November 14, 2013

Proceedings of the XVIII International Conference DAMDID/RCDL’2016, October 11-14, 2016, Ershovo, Moscow Region, Russia

NRNU MEPhI, 2016

In 2016 the International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2016) was held on October 11 – 14 in the Holiday Center, Ershovo (Moscow region). By tradition the “Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research, promoting ...

Added: January 26, 2017

Efficient Exact Algorithm for Count Distinct Problem

Golov N., Filatov A., Bruskin S., in: 21st International Conference on Computer Algebra in Scientific Computing (CASC-2019). Springer, 2019. Ch. 11661. P. 67-77.

This paper describes and analyses optimization approaches that make possible the exact calculation of millions of hierarchical count distinct measures over hundreds of billions of data rows. The described approach evolved over several years, in parallel with the growth of tasks from a fast growing internet company, and was finally implemented as a PEAPM (Pipelined Exact Accumulation ...

Added: July 1, 2019

Human Rights on the Internet: Legal Frames and Technological Implications: Compendium on Internet governance. Volume 3

M. : Higher School of Economics Publishing House, 2014

This compendium comprises transcripts of the two workshops on 'Empowering displaced people and migrants through online services' and 'Free Software and Human Rights on the Internet' organized by the Higher School of Economics at the 8th Internet Governance Forum (Bali, Indonesia, 22–25 October, 2013) and relevant articles on legal and technological issues of Internet Governance ...

Added: January 30, 2015

Big Data technologies and infrastructure

Radchenko I., Nikolaev I. N., St. Petersburg : ITMO University, 2018

The textbook concisely presents the basic principles, approaches and directions of Big Data technologies and infrastructure. The authors give a brief survey of approaches and definitions, provide an overview of the Big Data ecosystem and cover the topic of Big Data management systems. The textbook also presents a brief overview of Big Data application areas and the architecture of a Big Data processing system. Separately, it describes ...

Added: September 29, 2018

Big data in bioinformatics

Nazipova N. N., Isaev E., Kornilov V. et al., Математическая биология и биоинформатика 2017 Vol. 12 No. 1 P. 102-119

Sequencing of the human genome began in 1994. It took 10 years of work by many research teams to obtain a draft sequence of human DNA. Modern sequencing technologies make it possible to obtain the genome of a particular person in a few days. The paper discusses the successes of modern bioinformatics associated with the emergence of high-throughput sequencing platforms, which not only helped to expand the capabilities of various areas of biology and other related ...

Added: March 3, 2017

2020 IEEE International Conference on Big Data (Big Data 2020)

IEEE, 2020

The IEEE BigData conference series is sponsored by the IEEE Computer Society and attracts high-quality original research papers on various aspects of big data. This year, we received 535 full paper submissions by 2049 authors and co-authors from 58 countries. After a rigorous peer review process undertaken by the program committee members, 84 regular papers ...

Added: April 16, 2021

Transmission, storage and processing of large volumes of scientific data

Grigorev A., Isaev E., Tarasov P. A., M. : INFRA-M, 2020

This tutorial discusses large scientific projects and the volumes of data generated by them, provides an overview of scientific computer networks that allow high-speed transmission of large amounts of data for these projects; computing systems offered by leading manufacturers of computer equipment for processing large amounts of data, and providing both the ability to ...

Added: November 10, 2019

The Transition from A Priori to A Posteriori Information: Bayesian Procedures in Distributed Large-Scale Data Processing Systems

Golubtsov P., Automatic Documentation and Mathematical Linguistics 2018 Vol. 52 No. 4 P. 203-213

The procedure of transition from a priori to a posteriori information for a linear experiment in the context of Big Data systems is considered. At first glance, this process is fundamentally sequential, namely: as a result of observation, a priori information is transformed into a posteriori information, which is later interpreted as a priori for ...

Added: January 18, 2019

Supplementary Proceedings ICFCA 2019 Conference and Workshops

CEUR Workshop Proceedings, 2019

Added: October 31, 2019

INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS 2019 SPECIAL INTEREST GROUP ON BIG DATA PROCEEDINGS

[s.n.], 2019

Added: February 12, 2020