Core Data Analysis: Summarization, Correlation, and Visualization

B. Mirkin

doi:10.1007/978-3-030-00271-8

Springer, 2019.

Mirkin B.

This book concentrates on in-depth explanation of a few methods that address core issues, rather than presenting a multitude of methods that are popular among scientists. An added value of this edition is that I am trying to address two features of the brave new world that materialized after the first edition was written in 2010. These features are the emergence of “Data science” and changes in student cognitive skills in the process of global digitalization.
The birth of Data science gives me more opportunities to delineate the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) as synonymous. There are, however, at least two differences between the two. First comes the difference in perspectives. ML is to equip computers with methods and rules to see through regularities of the environment and behave accordingly. DA is to enhance conceptual understanding. These goals are indeed not inconsistent, which explains a huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent.
Regarding the current students’ cognitive habits, I came to the conclusion that they prefer to immediately get into the “thick of it”. Therefore, I streamlined the presentation of multidimensional methods. These methods are now organized in four Chapters, one of which presents correlation learning (Chapter 3). Three other Chapters present summarization methods, both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 relates to finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 relates to hierarchical and separative cluster structures. Using the encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods that are used for addressing such practical issues as the analysis of mixed scale data, data standardization, the number of clusters, cluster interpretation, etc. An obvious bias towards summarization against correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction, and, second, by my personal preferences. Categorical summarization, that is, clustering, is considered not just a method of DA but rather a model of classification as a concept in knowledge engineering. Also, in this edition, I somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go.
Chapter 1 presents the author’s view on the DA mainstream, or core, as well as on a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not a well-formed part of knowledge yet but rather a piece of science-in-the-making.
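As a rough illustration of the K-means clustering and data standardization mentioned in the abstract, the following minimal sketch uses only the Python standard library. It is not taken from the book: the function names (`standardize`, `kmeans`), the choice of centering by the mean and scaling by the range, the random initialization, and the toy data are all assumptions made for this example.

```python
import random


def standardize(rows):
    """Center each feature at its mean and scale it by its range (an assumed choice)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    ranges = [(max(c) - min(c)) or 1.0 for c in cols]  # guard against constant features
    return [[(x - m) / r for x, m, r in zip(row, means, ranges)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100, seed=0):
    """Batch K-means: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as initial centers
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:  # assignments are stable, so the centers are too
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Hypothetical toy data: two features measured on very different scales.
data = [[1.0, 200.0], [1.2, 220.0], [0.9, 210.0],
        [8.0, 900.0], [8.3, 950.0], [7.9, 920.0]]
labels, centers = kmeans(standardize(data), k=2)
print(labels)
```

Without the standardization step the second feature would dominate the distances because of its larger scale; after rescaling, both features contribute comparably and the first three rows end up in one cluster and the last three in the other.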

Research target: Computer Science

Priority areas: IT and mathematics

Language: English

Keywords: regression analysis, data analysis, Data Science, Singular Value Decomposition, Cluster analysis

Publication based on the results of:

A study of models of decision-making and analysis of complex structured data (2019)

EVIDENCE FOR QUASI-ADIABATIC MOTION OF CHARGED PARTICLES IN STRONG CURRENT SHEETS IN THE SOLAR WIND

Malova H. V., V. Yu. Popov, Grigorenko E. E. et al., Astrophysical Journal 2017 Vol. 834 No. 34 P. 1–9

We investigate quasi-adiabatic dynamics of charged particles in strong current sheets (SCSs) in the solar wind, including the heliospheric current sheet (HCS), both theoretically and observationally. A self-consistent hybrid model of an SCS is developed in which ion dynamics is described at the quasi-adiabatic approximation, while the electrons are assumed to be magnetized, and their ...

Added: February 15, 2017

Measurement of the $B^0_s \to \mu^+\mu^-$ branching fraction and effective lifetime and search for $B^0 \to \mu^+\mu^-$ decays

Borisyak M., Ustyuzhanin A., Ratnikov F. et al., Physical Review Letters 2017 Vol. 118 No. 19 P. 191801-1–191801-11

A search for the rare decays $B^0_s \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ is performed at the LHCb experiment using data collected in pp collisions corresponding to a total integrated luminosity of 4.4 fb$^{-1}$. An excess of $B^0_s \to \mu^+\mu^-$ decays is observed with a significance of 7.8 standard deviations, representing the first observation of this decay in a single experiment. The ...

Added: October 22, 2017

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings

Springer, 2021

This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic. The 27 full papers and 4 short papers presented ...

Added: October 7, 2020

Integrated database system on the properties of inorganic substances and materials and its use for the computer-aided design of new compounds

Kiseleva N. N., Dudarev V. A., Stolyarenko A. V., Vestnik Kazanskogo tekhnologicheskogo universiteta 2014 Vol. 17 No. 19 P. 372–376

An integrated system (IS) of databases (DBs) on the properties of inorganic substances and materials, which includes DBs developed by the Baikov Institute (Russia) and NIMS (Japan), was created. A service-oriented architecture based on Web services, supporting interaction between heterogeneous information systems, was used to integrate the DBs. The special metabase is used for ...

Added: January 20, 2016

14th International Conference on Formal Concept Analysis - Supplementary Proceedings

University Rennes 1, 2017

This volume is the supplementary volume of the 14th International Conference on Formal Concept Analysis (ICFCA 2017), held from June 13th to 16th 2017, at IRISA, Rennes. The ICFCA conference series is one of the major venues for researchers from the field of Formal Concept Analysis and related areas to present and discuss their recent ...

Added: June 19, 2017

IOP CONFERENCE SERIES: MATERIALS SCIENCE AND ENGINEERING. 1st International Conference on Innovative Informational and Engineering Technologies (IIET-2020) 28-29 May 2020, Stavropol, Russian Federation

Sakhnyuk P. A., Bristol: IOP Publishing, 2020

The article discusses the possibilities of studying the state of the social sphere according to the repository of the Moscow Government open data portal by administrative districts and city districts using Business Intelligence Platforms and Data Science and Machine Learning Platforms intellectual technologies. Opportunities are presented for using machine learning technologies for business analytics platforms ...

Added: December 8, 2020

A density-based statistical analysis of graph clustering algorithm performance

Miasnikof P., Shestopaloff A. Y., Bonner A. J. et al., Journal of Complex Networks 2020 Vol. 8 No. 3 P. 1–33

We introduce graph clustering quality measures based on comparisons of global, intra- and inter-cluster densities, an accompanying statistical significance test and a step-by-step routine for clustering quality assessment. Our work is centred on the idea that well-clustered graphs will display a mean intra-cluster density that is higher than global density and mean inter-cluster density. We ...

Added: August 4, 2020
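
The density comparison described in the abstract above (mean intra-cluster density versus global density and mean inter-cluster density) can be sketched as follows. This is only an illustration under assumed definitions: density is taken here as the number of observed edges divided by the number of possible node pairs in an undirected simple graph, and the function name `density_summary` and the toy graph are hypothetical. The paper's exact measures and its significance test are not reproduced.

```python
from itertools import combinations


def density_summary(edges, labels):
    """Return (global, mean intra-cluster, mean inter-cluster) edge densities.

    edges  : iterable of (u, v) pairs of an undirected simple graph
    labels : dict mapping every node to its cluster label
    """
    # Store edges as unordered pairs and ignore self-loops.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    n = len(labels)
    global_density = len(edge_set) / (n * (n - 1) / 2)

    clusters = {}
    for node, lab in labels.items():
        clusters.setdefault(lab, []).append(node)

    # Density inside each cluster: observed edges over possible pairs.
    intra = []
    for members in clusters.values():
        possible = len(members) * (len(members) - 1) / 2
        if possible == 0:
            continue  # a singleton cluster has no internal pairs
        inside = sum(1 for u, v in combinations(members, 2)
                     if frozenset((u, v)) in edge_set)
        intra.append(inside / possible)

    # Density between each pair of clusters: observed edges over |A| * |B|.
    inter = []
    for a, b in combinations(list(clusters.values()), 2):
        between = sum(1 for u in a for v in b if frozenset((u, v)) in edge_set)
        inter.append(between / (len(a) * len(b)))

    mean_intra = sum(intra) / len(intra) if intra else 0.0
    mean_inter = sum(inter) / len(inter) if inter else 0.0
    return global_density, mean_intra, mean_inter


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
global_d, intra_d, inter_d = density_summary(edges, labels)
print(global_d, intra_d, inter_d)
```

For this toy graph the global density is 7/15, the mean intra-cluster density is 1, and the mean inter-cluster density is 1/9, so the ordering expected of a well-clustered graph holds.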

Semblance filtering in the processing of acoustic waveform logging records

Akhmetsafina R., Akhmetsafin R. D., Izvestiya vysshikh uchebnykh zavedenii. Priborostroenie 2019 Vol. 62 No. 6 P. 503–510

Semblance, or slowness time coherence, is a measure of the coherence of energy distribution between signals recorded at the antenna array receivers of an acoustic wave logging probe, in the coordinates "the reduced time of the wave path from the middle of the antenna array" versus "interval time". Several semblance filtering methods are proposed to allow for ...

Added: October 1, 2019

Search for the decays $B^0_s \to \tau^+\tau^-$ and $B^0 \to \tau^+\tau^-$

Borisyak M., Ratnikov F., Ustyuzhanin A., Physical Review Letters 2017 Vol. 118 No. 25 P. 251802–251812

A search for the rare decays $B^0_s \to \tau^+\tau^-$ and $B^0 \to \tau^+\tau^-$ is performed using proton–proton collision data collected with the LHCb detector. The data sample corresponds to an integrated luminosity of 3 fb$^{-1}$ collected in 2011 and 2012. The $\tau$ leptons are reconstructed through the decay $\tau^- \to \pi^-\pi^+\pi^-\nu_\tau$. Assuming no contribution from $B^0 \to \tau^+\tau^-$ decays, an upper limit is set ...

Added: October 22, 2017

Core concepts in data analysis: summarization, correlation, visualization (Undergraduate topics in Computer Science)

Mirkin B., L.: Springer, 2011

This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting ...

Added: January 11, 2014

Traffic and Granular Flow '11

Berlin: Springer, 2013

This book continues the biannual series of conference proceedings, which has become a classical reference resource in traffic and granular research alike. It addresses new developments at the interface between physics, engineering and computational science. Complex systems, where many simple agents, be they vehicles or particles, give rise to surprising and fascinating phenomena. The contributions collected ...

Added: March 17, 2015

Proceedings of the 2016 Future Technologies Conference

IEEE, 2017

Added: November 20, 2017

Abstracts of the 11th Conference on Intelligent Data Processing

Moscow: Torus Press, 2016

This proceedings contains the abstracts of papers accepted to IDP-11 ...

Added: November 12, 2016

Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI at IJCAI 2013)

Beijing: CEUR Workshop Proceedings, 2013

This is the second edition of the FCA4AI workshop, the first edition being associated to the ECAI 2012 Conference, held in Montpellier, in August 2012 (see http://www.fca4ai.hse.ru/). In particular, the first edition of the workshop showed that there are many AI researchers interested in FCA. Based on that, the three co-editors decided to organize a ...

Added: October 26, 2014

Book of Abstracts of the 15th Applied Stochastic Models and Data Analysis International Conference (ASMDA 2013), 25-28 June 2013, Mataro (Barcelona), Spain

ISAST: International Society for the Advancement of Science and Technology, 2013

Added: September 9, 2013

SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data

NY: ACM, 2021

The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. The conference includes a fascinating technical program with research and industrial talks, tutorials, demos, and focused workshops. It also hosts a poster session to learn about innovative ...

Added: April 28, 2021

Description of statistical data analysis in original articles. Typical mistakes

Rebrova O., Rossiiskaya rinologiya 2018 Vol. 26 No. 1 P. 65–68

Description of the statistical analysis of the data contained in original articles. Typical mistakes ...

Added: October 2, 2018

MLDev: Data Science Experiment Automation and Reproducibility Software

Khritankov Anton, Pershin N., Uhov N. et al., arXiv.org. Series arXiv:2107.12322 "Computer Science > Machine Learning". 2021.

In this paper we explore the challenges of automating experiments in data science. We propose an extensible experiment model as a foundation for integration of different open source tools for running research experiments. We implement our approach in a prototype open source MLDev software package and evaluate it in a series of experiments yielding promising ...

Added: October 6, 2021

Using process mining for the analysis of an e-trade system: A case study

Alexey Mitsyuk, Anna Kalenkova, Sergey A. Shershakov et al., Business Informatics 2014 Vol. 29 No. 3 P. 15–27

E-trade systems are widely used to automate sales processes. Inefficiencies and bottlenecks in the sales processes lead to business losses. Conventional approaches to identifying problems require much time and result in subjective conclusions. This paper proposes an approach for the analysis of e-trade system processes based on the application of process mining techniques. Process mining ...

Added: August 29, 2014

Book of Abstracts of the 17th Applied Stochastic Models and Data Analysis International Conference with Demographics workshop (ASMDA 2017), 6-9 June 2017, London, UK

ISAST: International Society for the Advancement of Science and Technology, 2017

Added: September 5, 2017

Measurement of the $\Lambda^0_b \to J/\psi\Lambda$ angular distribution and the $\Lambda^0_b$ polarisation in pp collisions

The LHCb Collaboration, Boldyrev A., Derkach D. et al., Journal of High Energy Physics 2020 Vol. 2020 No. 6 Article 110

This paper presents an analysis of the $\Lambda^0_b \to J/\psi\Lambda$ angular distribution and the transverse production polarisation of $\Lambda^0_b$ baryons in proton-proton collisions at centre-of-mass energies of 7, 8 and 13 TeV. The measurements are performed using data corresponding to an integrated luminosity of 4.9 fb$^{-1}$, collected with the LHCb experiment. The polarisation is determined in a fiducial region ...

Added: September 17, 2020

Constructing an Efficient Machine Learning Model for Tornado Prediction

Aleskerov F. T., Demin S., Richman M. et al., International Journal of Information Technology and Decision Making 2020 Vol. 19 No. 5 P. 1177–1187

Tornado prediction variables are analyzed using machine learning and decision analysis techniques. A model based on several choice procedures and the superposition principle is applied for different methods of data analysis. The constructed model has been tested on a database of tornadic events. It is shown that the tornado prediction model developed herein is more ...

Added: October 22, 2019

Overlapping community detection in networks based on link partitioning and partitioning around medoids

Ponomarenko A., Pitsoulis L., Shamshetdinov M., Plos One 2021 Vol. 16 No. 8 Article e0255717

In this paper, we present a new method for detecting overlapping communities in networks with a predefined number of clusters called LPAM (Link Partitioning Around Medoids). The overlapping communities in the graph are obtained by detecting the disjoint communities in the associated line graph employing link partitioning and partitioning around medoids which are ...

Added: December 9, 2020

Analysis of the time to reach consensus in the work of technical committees (TC)

Maksimova O., Aronov I. Z., Zazhigalkin A. V., Standarty i kachestvo 2015 No. 7 P. 16–18

• What factors influence the achievement of consensus? • How does the degree of authoritarianism of the members of the Technical Committee for Standardization (TC) affect the time to reach consensus? • What should be taken into account to improve the performance management of the TC? ...

Added: June 30, 2015