Preface to the special issue on “Clustering and search techniques in large scale networks”
Clustering and search techniques are essential to a wide spectrum of applications. Network clustering techniques are becoming common in the analysis of massive data sets arising in various branches of science, engineering, government and industry. In particular, network clustering and search techniques emerge as an important tool in large-scale networks.
This special issue of Optimization Letters contains refereed selected papers presented at the workshop on clustering and search techniques in large scale networks that took place on November 3–8, 2014 at the Laboratory of Algorithms and Technologies for Networks Analysis (LATNA), Higher School of Economics, in Nizhny Novgorod, Russia. The workshop was supported by the Russian Science Foundation Grant RSF 14-41-00039.
This workshop provided a forum for leading as well as beginning researchers and students to discuss recent advances and identify current and future challenges arising in research concerning clustering and search problems in large networks. The papers of this special issue reflect some of the problems discussed at the workshop.
We would like to thank the authors and reviewers, whose valuable work made this issue possible.
This state-of-the-art survey is dedicated to the memory of Emmanuil Markovich Braverman (1931-1977), a pioneer in developing machine learning theory. The 12 revised full papers and 4 short papers included in this volume were presented at the conference "Braverman Readings in Machine Learning: Key Ideas from Inception to Current State" held in Boston, MA, USA, in April 2017, commemorating the 40th anniversary of Emmanuil Braverman's death. The papers present an overview of some of Braverman's ideas and approaches. The collection is divided into three parts. The first part bridges the past and the present. Its main contents relate to the concept of kernel function and its application to signal and image analysis as well as clustering. The second part presents a set of extensions of Braverman's work to issues of current interest both in theory and applications of machine learning. The third part includes short essays by a friend, a student, and a colleague.
Formal concepts and closed itemsets have proved to be of great importance for knowledge discovery, both as a tool for concise representation of association rules and as a tool for clustering and constructing domain taxonomies and ontologies. Exponential explosion makes it difficult to consider the whole concept lattice arising from data, so one needs to select the most useful and interesting concepts. In this paper, interestingness measures of concepts are considered and compared with respect to various aspects, such as efficiency of computation, applicability to noisy data, and ranking correlation.
This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Visualization, in this context, is a way of presenting results in a cognitively comfortable way. The term summarization is understood quite broadly here to embrace not only simple summaries like totals and means, but also more complex summaries such as the principal components of a set of features or cluster structures in a set of entities.
The material presented in this perspective makes a unique mix of subjects from the fields of statistical data analysis, data mining, and computational intelligence, which follow different systems of presentation.
The telecommunications services market is one of the most important and promising sectors of the Russian economy, and its development has an essential impact on the development strategies of all industries. Recently, operators’ business has tended to shift from providing communications services to supplying integrated ICT services. A positive market growth trend is predicted for the coming five years. However, keeping and even expanding the subscriber base remains an ongoing task for all telecom companies. One possible solution to this problem is developing a rational tariff policy, which may take into consideration not only the interests of the company and its investors, but also the subscribers’ preferences. One of the main components of the tariff policy is developing new tariff plans that meet the aforementioned requirements.
In the paper, a new concept of tariff plan development is proposed. It is based on identifying stable groups of existing tariff plans and subscribers’ preferences that are nonlinearly related to tariff plan characteristics. The proposed method is based on the concept of client lifetime value (CLV), which characterizes the discounted profit received from a customer over the entire period during which the customer uses the company’s services. This approach gives us an opportunity to build a CLV formation model, relying on the subscriber’s consumption of mobile services and the price characteristics of tariff plans. This seems quite important given the volatility of the high-tech market and intensive changes in patterns of subscribers’ consumption of services.
Within the proposed concept, an info-logical model for developing and evaluating a new tariff plan is developed. The model is based on the synthesis of neural networks and a genetic algorithm. The proposed model allows us to assess combinations of tariff plans’ price characteristics created by telecom company specialists, and to determine an optimal combination representing a local or global maximum of CLV in the given time interval. This may be done for each subscriber’s consumption profile and for the given period.
The approach gives us an opportunity to choose a tariff plan (from existing and newly created tariffs) for every subscriber cluster, one that satisfies subscribers’ and investors’ preferences while providing maximum company profit.
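The notion of CLV as discounted profit, mentioned in the abstract above, can be illustrated with a minimal sketch. It is not the authors' model: the function name, the per-period profit list, the constant discount rate, and the convention that the first period is left undiscounted are all assumptions made for illustration.

```python
def client_lifetime_value(period_profits, discount_rate):
    """Sum of per-period profits, each discounted back to period 0.

    Assumptions (not from the source): profits are given per period,
    the discount rate is constant, and period 0 is not discounted.
    """
    return sum(
        profit / (1.0 + discount_rate) ** t
        for t, profit in enumerate(period_profits)
    )


if __name__ == "__main__":
    # Hypothetical subscriber: three periods of profit, 10% discount rate.
    print(client_lifetime_value([120.0, 110.0, 90.0], 0.10))
```

In a tariff-design setting such as the one described, the per-period profits would depend on the subscriber's consumption profile and the plan's price characteristics; here they are simply passed in as numbers.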
This study is devoted to different types of students’ behavior before they drop an adaptive course. The Adaptive Python course at the Stepik educational platform was selected as the case for this study. Student behavior was measured by the following variables: the number of attempts for the last lesson, the solving rate over the last three lessons, the logarithm of normed solving time, the percentage of easy and difficult lessons, the number of passed lessons, and total solving time. We applied a standard clustering technique, K-means, to identify student behavior patterns. To determine the optimal number of clusters, the silhouette metric was used. As a result, three types of dropout were identified: “solved lessons”, “evaluated lessons as hard”, and “evaluated lessons as easy”.
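The procedure described above, K-means combined with the silhouette metric to choose the number of clusters, can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the function names, the deterministic farthest-point initialization, and the toy two-dimensional data are assumptions, and the real study used the behavioral variables listed in the abstract.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point.

    Assumption: farthest-point initialization, chosen here only to keep
    the sketch deterministic. The source does not state the initialization.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Pick the candidate k whose clustering has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


if __name__ == "__main__":
    # Toy data: three well-separated groups of hypothetical feature vectors.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
            (20, 0), (20, 1), (21, 0)]
    print(best_k(data, [2, 3, 4]))
```

Real feature vectors would normally be standardized before clustering, since K-means relies on Euclidean distance and variables such as total solving time and lesson counts live on different scales.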
This book constitutes the proceedings of the 23rd International Symposium on Foundations of Intelligent Systems, ISMIS 2017, held in Warsaw, Poland, in June 2017. The 56 regular and 15 short papers presented in this volume were carefully reviewed and selected from 118 submissions. The papers include both theoretical and practical aspects of machine learning, data mining methods, deep learning, bioinformatics and health informatics, intelligent information systems, knowledge-based systems, mining temporal, spatial and spatio-temporal data, text and Web mining. In addition, four special sessions were organized; namely, Special Session on Big Data Analytics and Stream Data Mining, Special Session on Granular and Soft Clustering for Data Science, Special Session on Knowledge Discovery with Formal Concept Analysis and Related Formalisms, and Special Session devoted to ISMIS 2017 Data Mining Competition on Trading Based on Recommendations, which was launched as a part of the conference.
Modern internet technologies open a wide range of opportunities for enterprises: keeping accounts online, connecting with customers from different locations, collecting and analyzing data about their target audience, and more. One of the actively explored factors related to potential success is the use of Internet tools for project presentation. The aim of this study is to identify the distinctive network patterns that form strategies for running and maintaining an online shop’s profile on the Russian social networking site vk.com. We collected data about 706 e-shop profiles on vk.com, including their descriptions, information about the communities’ followers, and posts on the profile wall. For each profile we built an ego graph of the follower network and calculated its centrality measures, which were then used to run the k-means clustering algorithm. As a result, we identified six distinct clusters, which we assume approximate different strategies of maintaining an e-shop. These clusters differed in terms of important profile features such as the community’s audience size, posting activity, follower network connectivity, the presence of “hubs”, and whether the e-shop operates mostly on vk.com or has an external main website. If network-structure patterns are treated as the result of an online shop’s chosen strategy, the shop’s potential success can be estimated. Taking the monthly number of visits to a website from vk.com as a success metric, it turns out that both the centrality indicators themselves and the generalized clusters are associated with site-visiting frequency.
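The step of turning a follower network into centrality-based features can be sketched as below. The abstract does not say which centrality measures were computed, so degree centrality is used here purely as an assumed example; the function names, feature names, and the toy edge list are likewise hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1) in an undirected simple graph.

    The graph is given as an edge list, so nodes with no edges are not
    represented. Self-loops and duplicate edges are ignored.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def profile_features(edges):
    """Summarize one profile's network as a small feature dictionary."""
    values = list(degree_centrality(edges).values())
    if not values:
        return {"mean_centrality": 0.0, "max_centrality": 0.0}
    return {
        "mean_centrality": sum(values) / len(values),
        "max_centrality": max(values),  # a high value suggests a hub
    }


if __name__ == "__main__":
    # Hypothetical star-shaped network: one hub connected to three followers.
    print(profile_features([("shop", "a"), ("shop", "b"), ("shop", "c")]))
```

Feature dictionaries of this kind, one per profile, could then be converted to vectors and passed to a clustering routine such as k-means.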
A Computer Science conference ranked A* in the CORE ranking
Clustering data with both continuous and discrete attributes is a challenging task. Existing methods lack a principled probabilistic formulation. In this paper, we propose a clustering method based on a tree-structured graphical model to describe the generation process of mixed-type data. Our tree-structured model factorizes into a product of pairwise interactions, and thus localizes the interaction between feature variables of different types. To provide a robust clustering method based on the tree model, we adopt a topographical view and compute peaks of the density function and their attractive basins for clustering. Furthermore, we leverage the theory of topological data analysis to adaptively merge trivial peaks into large ones in order to achieve meaningful clusterings. Our method outperforms state-of-the-art methods on mixed-type data.
This paper explores age-specific migration flows between regions of Russia. Using age-disaggregated data from the 2010 Russian Census, we cluster interregional migration flows based on the prevailing age groups of migrants, analyse diversity and similarity in the choice of age-specific migration destinations, and describe general socio-economic characteristics of these flows. This is the first time that the relationship between migration and migrants’ age and life-cycle events has been analysed in the Russian context. Similar to migrants in other countries, migrants in Russia choose their place of residence depending on their age. Migration flows that differ by the dominant age group of migrants quite often have opposite destinations, because the motivations for migration also differ. Migration follows various stages of the life cycle: people are born in one region, study in another region, go to work in a different region, and resettle in another place after retirement. Migration modeling turns out to be complicated if the impact of the age factor is ignored. Therefore, the age of migrants should be considered when analyzing, modeling and interpreting interregional migration in Russia.