A Comparison of the Missing-Indicator Method and Complete Case Analysis in Case of Categorical Data
The global trend toward democratization in recent decades has attracted the interest of many researchers and politicians. While a significant part of the research on the validity of democracy indices concentrates on the measurements used in broad cross-country surveys, Stein Ringen, a professor at the University of Oxford, draws attention to the insufficient consideration of the systemic characteristics of democracy and suggests investigating how individual people perceive the level of democracy in their own country (although he does not offer an empirical database). In this study, the author uses regression and correlation analysis to conclude that not all indices of democracy can be called valid; in particular, the least valid is the widely used Polity IV index.
It is commonly the case in multi-modal pattern recognition that certain modality-specific object features are missing in the training set. We address here the missing data problem for kernel-based Support Vector Machines, in which each modality is represented by the respective kernel matrix over the set of training objects, such that the omission of a modality for some object manifests itself as a blank in the modality-specific kernel matrix at the relevant position. We propose to fill the blank positions in the collection of training kernel matrices via a variant of the Neutral Point Substitution (NPS) method, where the term “neutral point” stands for the locus of points defined by the “neutral hyperplane” in the hypothetical linear space produced by the respective kernel. The current method crucially differs from the previously developed neutral point approach in that it is capable of treating missing data in the training set on the same basis as missing data in the test set. It is therefore of potentially much wider applicability. We evaluate the method on the Biosecure DS2 data set.
The authors analyzed the quality of life of the population in several regions of the Russian Federation using multivariate statistical analysis. They found that in the Volga Federal District, quality of life, and in particular life expectancy, can be improved by adjusting demographic indicators, cash income, and the development of health care and of social and environmental security. In the municipalities of the Republic of Mari El, by contrast, growth in employment, wages, migration, natural population growth, the number of doctors, and the number of newly commissioned houses, together with a reduced share of dilapidated housing and reduced mortality, improved the quality of life and increased fertility.
This book concentrates on an in-depth explanation of a few methods that address core issues, rather than on presenting a multitude of methods that are popular among scientists. An added value of this edition is that I try to address two features of the brave new world that materialized after the first edition was written in 2010. These features are the emergence of “Data science” and changes in student cognitive skills in the process of global digitalization. The birth of Data science gives me more opportunities for delineating the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) as synonymous. There are, however, at least two differences between the two. First comes the difference in perspectives. ML is to equip computers with methods and rules to see through regularities of the environment and behave accordingly. DA is to enhance conceptual understanding. These goals are indeed not inconsistent, which explains the huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent. Regarding current students’ cognitive habits, I came to the conclusion that they prefer to get into the “thick of it” immediately. Therefore, I streamlined the presentation of multidimensional methods. These methods are now organized in four Chapters, one of which presents correlation learning (Chapter 3). The three other Chapters present summarization methods, both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 relates to finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 relates to hierarchical and separative cluster structures.
Using the encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods that are used for addressing such practical issues as the analysis of mixed scale data, data standardization, the number of clusters, cluster interpretation, etc. An obvious bias towards summarization against correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction, and, second, by my personal preferences. Categorical summarization, that is, clustering, is considered not just a method of DA but rather a model of classification as a concept in knowledge engineering. Also, in this edition, I somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go. Chapter 1 presents the author’s view on the DA mainstream, or core, as well as on a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not a well-formed part of knowledge yet but rather a piece of science-in-the-making.
The article explores the procedural aspects of constructing structural and logical typologies with the aim of creating an innovation index that captures workers’ attitudes guiding innovation and innovation-related behavior in the workplace.
This paper presents a preliminary analysis of hotel room prices in several European cities based on data from the Booking.com website. The main question raised in the study is whether early booking is indeed advantageous and, if so, how early one should book. First, a script was developed to download more than 600 thousand hotel offers for reservations from 25 March 2013 to 17 March 2014. Then an attempt to discover more details concerning the early booking effect was made via basic statistics, graphical data representation and hedonic pricing analysis. It was revealed that making reservations in advance can indeed be advantageous, although more data and research are needed to measure the exact effect, as it depends at least on seasonality and city.
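To make the idea of estimating an early booking effect concrete, the following is a minimal sketch of a log-linear price regression, a heavily simplified stand-in for a hedonic model. It is not the authors' script or specification: the function name `fit_log_price`, the single regressor (days booked in advance), and the synthetic noise-free offers are all assumptions made for illustration. A real hedonic model would also include hotel characteristics such as city, season and star rating.

```python
import math


def fit_log_price(days_ahead, prices):
    """Ordinary least squares fit of log(price) = a + b * days_ahead.

    Returns (a, b). The slope b approximates the relative price change
    per additional day of advance booking. Hypothetical helper, not
    taken from the paper.
    """
    ys = [math.log(p) for p in prices]
    n = len(ys)
    mean_x = sum(days_ahead) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days_ahead)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days_ahead, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, noise-free offers generated from known parameters.
# These are NOT data from the study; they only check that the fit works.
TRUE_BASE_PRICE = 100.0   # assumed price when booking on the day of arrival
TRUE_SLOPE = -0.002       # assumed change in log price per day in advance
days = [0, 7, 30, 90, 180, 360]
prices = [TRUE_BASE_PRICE * math.exp(TRUE_SLOPE * d) for d in days]

intercept, slope = fit_log_price(days, prices)
print(f"estimated change in log price per day in advance: {slope:.4f}")
print(f"estimated same-day price: {math.exp(intercept):.2f}")
```

A negative slope would indicate that earlier bookings are cheaper on average. With real offers the estimate would be noisy and, as the abstract notes, would likely differ by season and city.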