Chapter
Detection of an unspecified number of communities in feature-rich networks
In book
This paper considers a way to improve the performance of recommender systems by first identifying groups of users with similar behavior. Users are partitioned into groups using a distributed version of the k-means algorithm, with the canopy algorithm used to determine the initial centroids.
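As a rough illustration of canopy seeding followed by k-means, here is a minimal single-machine sketch in Python. It is not the distributed implementation the abstract refers to. The function names (`canopy`, `mean`, `kmeans`), the thresholds `t1` and `t2`, and the Euclidean distance are all assumptions made for this example.

```python
import math

def canopy(points, t1, t2):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list); requires t1 > t2 >= 0.
    """
    if not (t1 > t2 >= 0):
        raise ValueError("expected t1 > t2 >= 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center joins this canopy.
        canopies.append([p for p in points if math.dist(center, p) <= t1])
        # Points within t2 can no longer become centers themselves.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return canopies

def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def canopy_kmeans(points, t1, t2, iterations=10):
    """Use canopy means as initial centroids; k is the number of canopies."""
    seeds = [mean(c) for c in canopy(points, t1, t2)]
    return kmeans(points, seeds, iterations)
```

In this sketch the number of clusters is not chosen in advance: it is simply the number of canopies produced by the chosen thresholds.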
This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Visualization, in this context, is a way of presenting results in a cognitively comfortable way. The term summarization is understood quite broadly here to embrace not only simple summaries like totals and means, but also more complex summaries such as the principal components of a set of features or cluster structures in a set of entities.
The material presented from this perspective forms a unique mix of subjects from the fields of statistical data analysis, data mining, and computational intelligence, each of which follows a different system of presentation.
Contributions in this volume focus on computationally efficient algorithms and rigorous mathematical theories for analyzing large-scale networks. Researchers and students in mathematics, economics, statistics, computer science and engineering will find this collection a valuable resource filled with the latest research in network analysis. Computational aspects and applications of large-scale networks in market models, neural networks, social networks, power transmission grids, maximum clique problem, telecommunication networks, and complexity graphs are included with new tools for efficient network analysis of large-scale networks.
These proceedings are the result of the 7th International Conference in Network Analysis, held at the Higher School of Economics, Nizhny Novgorod in June 2017. The conference brought together scientists, engineers, and researchers from academia, industry, and government.
The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization performs best among the models under study.
A vast number of documents on the Web have duplicates, which poses a challenge for developing efficient methods that compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.
Using a network approach, we propose a new method of identifying key food exporters based on the long-range (LRIC) and short-range (SRIC) interaction indices. These indices make it possible to detect several groups of economies with direct as well as indirect influence on the routes of different levels in the food network.
Trade is a vital part of human life, and any unstable situation changes the living conditions of individuals. We study the power of each country in terms of produce trade. Trade relations between countries are represented as a network, where vertices are territories and edges are export flows. As flows of products between participants are heterogeneous, we consider various groups of substitute goods (cereals, fish, vegetables). We detect key participants affecting food retail using classical centrality measures. We also perform a clustering procedure in order to find communities in the networks.
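To make the network representation concrete, the following minimal Python sketch computes one classical centrality measure, weighted degree (strength), on a directed trade network. The abstract does not say which centrality measures were used, so this choice is an assumption. The function name `strength_centrality` and the edge format `(exporter, importer, value)` are likewise hypothetical.

```python
from collections import defaultdict

def strength_centrality(flows):
    """Return (out_strength, in_strength) for a weighted directed network.

    flows is an iterable of (exporter, importer, value) triples; the
    out-strength of a vertex is its total exports, and the in-strength
    is its total imports.
    """
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    for exporter, importer, value in flows:
        out_strength[exporter] += value
        in_strength[importer] += value
    return dict(out_strength), dict(in_strength)
```

Ranking vertices by out-strength gives a simple first view of the largest exporters within one product group. Unlike the LRIC index, it does not capture indirect influence.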