Comparative analysis of clustering results for the distributions of EGE (Unified State Exam) mathematics scores across the regions of the Russian Federation in 2010-2011, and a model accounting for several objective factors in cluster formation
A brief overview is presented of the results of classifying the regions of the Russian Federation by the distributions of their EGE scores in 2011. These results are compared with those obtained for 2010. An attempt is made to identify the factors that explain the variation among the regional distributions, and a multiple linear regression of the average score on these factors is built.
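The abstract does not name the factors or the estimation details. As a minimal sketch under assumed synthetic data, a multiple linear regression of a regional average score on several factors can be fitted by ordinary least squares as below; the function `fit_ols` and the two factor columns are hypothetical placeholders, not the study's actual variables or data.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares fit of y on the columns of X plus an intercept.

    Returns (intercept, coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot  # share of score variance explained by the factors
    return beta[0], beta[1:], r2

# Synthetic illustration only: two hypothetical regional factors for 6 regions.
X = np.array([[0.55, 1.0], [0.70, 1.4], [0.62, 0.9],
              [0.80, 1.6], [0.45, 0.8], [0.75, 1.2]])
y = 30.0 + 10.0 * X[:, 0] + 5.0 * X[:, 1]  # noiseless, so the fit is exact
b0, b, r2 = fit_ols(X, y)
print(b0, b, r2)
```

With real data the response would be the observed regional average scores, and R^2 would show how much of their variation the chosen factors account for.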
The article discusses how to compile and design one's own corpora representing particular types of discourse. It also reviews the use of available corpus software to identify text-specific or genre-specific keywords, and examines corpus tools that identify collocations and measure their strength using large national corpora.
This book concentrates on an in-depth explanation of a few methods that address core issues, rather than on presenting the multitude of methods popular among scientists. An added value of this edition is that I try to address two features of the brave new world that has materialized since the first edition was written in 2010: the emergence of “Data science” and the changes in students’ cognitive skills brought about by global digitalization. The birth of Data science gives me more opportunities to delineate the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) synonymous. There are, however, at least two differences between the two. The first is a difference in perspective. ML aims to equip computers with methods and rules for seeing through the regularities of the environment and behaving accordingly, whereas DA aims to enhance conceptual understanding. These goals are indeed not inconsistent, which explains the huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent. Regarding current students’ cognitive habits, I have come to the conclusion that they prefer to get immediately into the “thick of it”. Therefore, I have streamlined the presentation of multidimensional methods. These methods are now organized in four Chapters, one of which presents correlation learning (Chapter 3). The other three Chapters present summarization methods, both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 deals with finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 deals with hierarchical and separative cluster structures.
Using the encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods used to address such practical issues as the analysis of mixed-scale data, data standardization, the number of clusters, cluster interpretation, etc. The obvious bias towards summarization over correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction and, second, by my personal preferences. Categorical summarization, that is, clustering, is considered not just a DA method but rather a model of classification as a concept in knowledge engineering. Also, in this edition I have somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go. Chapter 1 presents the author’s view of the DA mainstream, or core, as well as of a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and on classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not yet a well-formed body of knowledge but rather a piece of science in the making.
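The book is described as treating partitioning through K-means clustering and its extensions; its own data recovery formulation and initialization choices are not reproduced here. As a generic illustration only, the following minimal sketch implements standard batch (Lloyd-style) K-means with random initialization from the data; the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Batch (Lloyd-style) K-means: alternate assignment and centroid update
    until the partition stops changing. Returns (labels, centroids, W),
    where W is the within-cluster sum of squared distances."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct entities chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every entity to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # partition is stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    W = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, W

# Two well-separated groups of three entities each.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids, W = kmeans(X, k=2)
print(labels, centroids, W)
```

Neither the assignment step nor the centroid update can increase W, which is why the alternation terminates at a stable partition; the result may still depend on the random initial centroids.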
The contemporary marketing practices (CMP) methodology attracts the attention of a substantial number of researchers in the field of strategic marketing. Over the past two decades, more than fifty papers have been published in peer-reviewed outlets on the analysis of the usage of contemporary marketing practices across a variety of countries and industries. In this note we discuss the reliability of these studies with respect to the specific analytic tools they use. First, we demonstrate that standard cluster analysis is relatively sensitive to small changes in the datasets, with companies frequently being assigned to different clusters. Second, the national project teams use different, often incompatible, settings. Therefore, to make comparisons between countries and across industries possible, researchers must agree on a generic setup and procedures. We conclude the note by sketching the basics of this common ground.
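The note does not state which measure it uses to quantify how often companies change clusters. One standard choice for comparing two partitions of the same companies, for instance the clusterings obtained from an original and a slightly perturbed dataset, is the Rand index; the minimal sketch below assumes that measure, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of entity pairs on which two partitions agree: both put the
    pair in the same cluster, or both put it in different clusters.
    Invariant to how the cluster labels are named."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same entities")
    if len(labels_a) < 2:
        raise ValueError("need at least two entities to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Six companies; one of them (index 2) moves to the other cluster.
before = [0, 0, 0, 1, 1, 1]
after = [0, 0, 1, 1, 1, 1]
print(rand_index(before, after))
```

A single moved company changes its relation to all five others here, so 5 of the 15 pairs disagree; values well below 1 across repeated perturbations would indicate the kind of instability the note describes.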
The problem of community detection in a network with features at its nodes takes into account both the graph structure and the node features. The goal is to find relatively dense groups of interconnected entities that share some features. Existing approaches require the number of communities to be pre-specified. We apply the so-called data recovery approach, which allows a relaxation of the criterion so that communities can be found one by one. We show that the proposed method is effective on real-world data, as well as on synthetic data involving only quantitative features, only categorical attributes, or both. For the case of categorical attributes, state-of-the-art algorithms are available, and our algorithm appears competitive against them.
The authors offer recommendations on constructing a client-base segmentation for profit-making retail organizations with respect to clients' possible reactions to marketing campaigns. The recommendations are based on the results of research conducted in one of the largest Russian retail networks in the mobile devices segment.
The main goal of this paper is to study the interconnections between credit ratings and financial indicators of industrial companies from the BRICS countries. We use the method of patterns, one of the modern methods of nonlinear modeling, to identify groups of heterogeneous objects that differ in their influence on ratings. Additionally, we estimate a Tobit regression model for the selected groups and establish several credit rating patterns for BRICS industrial companies. The results of the Tobit model may have practical applications in short-term financial management.