Least-squares community extraction in feature-rich networks using similarity data
We explore a doubly greedy approach to community detection in feature-rich networks. Under this approach, both the network and the feature data are recovered in a straightforward way from underlying unknown non-overlapping communities, each supplied with a center in the feature space and intensity weight(s) over the network. Our least-squares additive criterion allows us to search for communities one by one and to find each community by adding entities one by one. A focus of this paper is that the feature-space part of the data is converted into a similarity matrix format. The similarity/link values can be used in either of two modes: (a) summability mode, in which the values are treated as measured on the same scale, so that one can meaningfully compare and sum similarity values across the entire similarity matrix, and (b) nonsummability mode, in which similarity values in one column should not be compared with the values in other columns. The two input matrices and two modes lead us to develop four different Iterative Community Extraction from Similarity data (ICESi) algorithms, which determine the number of communities automatically. Our experiments on real-world and synthetic datasets show that these algorithms are valid and competitive.
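The doubly greedy idea (communities one by one, entities one by one) can be illustrated with a minimal sketch. It is not the ICESi algorithms themselves: it assumes a single similarity matrix in summability mode, a constant intensity per community, and the standard least-squares contribution g(S) = (sum of similarities within S)^2 / |S|^2. The function names and the seeding rule (first remaining entity) are hypothetical choices made for illustration only.

```python
def score(total, size):
    # Least-squares contribution of a cluster with constant intensity
    # lambda = total / size**2; treated as 0 when lambda is not positive.
    return total * total / size ** 2 if total > 0 else 0.0


def grow_community(A, seed, allowed):
    """Grow one community from `seed`, adding one entity at a time
    while the contribution g(S) keeps increasing (illustrative only)."""
    members = [seed]
    total = float(A[seed][seed])          # sum of A over S x S
    best = score(total, 1)
    pool = set(allowed) - {seed}
    while pool:
        trial = {}
        for i in sorted(pool):
            # within-cluster sum if entity i joined the current members
            t = total + sum(A[i][j] + A[j][i] for j in members) + A[i][i]
            trial[i] = (score(t, len(members) + 1), t)
        i_best = max(trial, key=lambda i: trial[i][0])
        if trial[i_best][0] <= best:      # no further improvement: stop
            break
        best, total = trial[i_best]
        members.append(i_best)
        pool.remove(i_best)
    return sorted(members), best


def extract_all(A):
    """Extract non-overlapping communities one by one; the number of
    communities is whatever results when all entities are assigned."""
    remaining = list(range(len(A)))
    communities = []
    while remaining:
        members, _ = grow_community(A, remaining[0], remaining)
        communities.append(members)
        remaining = [i for i in remaining if i not in members]
    return communities


# Toy similarity matrix: two groups of three mutually similar entities.
A = [[1 if (i // 3 == j // 3 and i != j) else 0 for j in range(6)]
     for i in range(6)]
print(extract_all(A))  # [[0, 1, 2], [3, 4, 5]]
```

In this sketch the stopping rule of each extraction is what fixes the community size, and the outer loop ends when no entities remain, so no number of communities is supplied in advance.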
This state-of-the-art survey is dedicated to the memory of Emmanuil Markovich Braverman (1931-1977), a pioneer in developing machine learning theory. The 12 revised full papers and 4 short papers included in this volume were presented at the conference "Braverman Readings in Machine Learning: Key Ideas from Inception to Current State" held in Boston, MA, USA, in April 2017, commemorating the 40th anniversary of Emmanuil Braverman's death. The papers present an overview of some of Braverman's ideas and approaches. The collection is divided into three parts. The first part bridges the past and the present. Its main contents relate to the concept of kernel function and its application to signal and image analysis as well as clustering. The second part presents a set of extensions of Braverman's work to issues of current interest in both the theory and applications of machine learning. The third part includes short essays by a friend, a student, and a colleague.
The mass adoption of mobile cardiographs is already leading both to explosive growth in the number of patients whose ECGs are recorded daily outside the hospital and are thus available for study (Big Data in cardiology), and to qualitatively new opportunities for studying long-term oscillatory processes (over weeks, months, and years) in the dynamics of the individual state of any patient's cardiovascular system.
The article demonstrates that the new opportunities for long-term continuous monitoring of the cardiovascular system state in large numbers of patients make it possible to reveal regularities (data mining) in cardiovascular system dynamics. These regularities lead to the hypothesis that an adequate model of the cardiovascular system exists in the form of a distributed nonlinear self-oscillating system of the FPU recurrence model class. Having a meaningful mathematical model of the cardiovascular system within the FPU auto-recurrence framework, as a refinement of the traditional black-box model, further allows us to propose new computational methods for ECG analysis and for predicting cardiovascular system dynamics, supporting refined diagnosis and evaluation of treatment effectiveness.
The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction and that NMF with KL-divergence minimization performs best among the models studied.
Using a network approach, we propose a new method for identifying key food exporters based on the long-range (LRIC) and short-range (SRIC) interaction indices. These indices make it possible to detect several groups of economies with direct as well as indirect influence on the routes of different levels in the food network.
Trade is a vital part of human life, and any instability changes the living conditions of individuals. We study the power of each country in terms of trade in produce. Trade relations between countries are represented as a network in which vertices are territories and edges are export flows. Because product flows between participants are heterogeneous, we consider various groups of substitute goods (cereals, fish, vegetables). We detect key participants affecting food retail using classical centrality measures. We also apply a clustering procedure to find communities in the networks.
This article presents a new technique for collaborative filtering based on pre-clustering of website usage data. The key idea is to use clustering methods to define groups of different users.
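As a small illustration of the network representation described above, the sketch below computes one classical centrality measure, weighted degree (strength), on a directed export network. The abstract does not say which centrality measures were used, so this choice is an assumption; the participants "A", "B", "C" and the flow values are hypothetical toy data.

```python
from collections import defaultdict


def strength_centrality(flows):
    """Return {vertex: (export_share, import_share)} for a directed,
    weighted network given as {(exporter, importer): flow}."""
    out_s, in_s = defaultdict(float), defaultdict(float)
    for (src, dst), w in flows.items():
        out_s[src] += w   # weighted out-degree: total exports
        in_s[dst] += w    # weighted in-degree: total imports
    total = sum(flows.values())
    nodes = set(out_s) | set(in_s)
    return {v: (out_s[v] / total, in_s[v] / total) for v in nodes}


# Hypothetical export flows for one product group.
flows = {("A", "B"): 6.0, ("A", "C"): 2.0, ("B", "C"): 2.0}
for vertex, (exp_share, imp_share) in sorted(strength_centrality(flows).items()):
    print(vertex, exp_share, imp_share)
```

Running the same function separately for each group of substitute goods would give one ranking per product network.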
The contributions in this volume cover a broad range of topics including maximum cliques, graph coloring, data mining, brain networks, Steiner forests, and logistics and supply chain networks. Network algorithms and their applications to market graphs, manufacturing problems, internet networks and social networks are highlighted. The "Fourth International Conference in Network Analysis," held at the Higher School of Economics, Nizhny Novgorod in May 2014, initiated joint research between scientists, engineers and researchers from academia, industry and government; the major results of conference participants have been reviewed and collected in this work. Researchers and students in mathematics, economics, statistics, computer science and engineering will find this collection a valuable resource filled with the latest research in network analysis.
This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Visualization, in this context, is a way of presenting results in a cognitively comfortable way. The term summarization is understood quite broadly here to embrace not only simple summaries like totals and means, but also more complex summaries such as the principal components of a set of features or cluster structures in a set of entities.
The material presented in this perspective makes a unique mix of subjects from the fields of statistical data analysis, data mining, and computational intelligence, which follow different systems of presentation.
A vast number of documents on the Web have duplicates, which makes it a challenge to develop efficient methods for computing clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
This article discusses state management and cultural policy, their nature and content, in the context of a new tendency: the development of postindustrial society. It notes that cultural policy is currently the basis of regional political activity and that regions can gain a strong competitive advantage if they implement cultural policy successfully. Together, these trends can give rise to elements of new economic development.
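The notion of a frequent closed set of attributes can be made concrete with a naive sketch. It is not one of the algorithms compared in the paper: it simply enumerates all intersections of document attribute sets, which are exactly the closed sets with a non-empty extent, and keeps those whose extent is large enough. The function name, the document identifiers, and the attributes (standing in for, say, shingles) are hypothetical.

```python
def closed_itemsets(docs, min_support):
    """Map each non-empty closed attribute set to its extent (the set of
    documents containing all its attributes), keeping only sets whose
    extent has at least `min_support` documents."""
    docs = {d: frozenset(attrs) for d, attrs in docs.items()}
    intents = set()
    for attrs in docs.values():
        # Every closed set with a non-empty extent is an intersection
        # of the attribute sets of some documents.
        intents |= {attrs} | {attrs & c for c in intents}
    result = {}
    for c in intents:
        extent = frozenset(d for d, a in docs.items() if c <= a)
        if c and len(extent) >= min_support:
            result[c] = extent
    return result


# Hypothetical documents described by sets of attributes.
docs = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b"},
    "d3": {"a", "b", "c"},
    "d4": {"x"},
}
for intent, extent in closed_itemsets(docs, min_support=2).items():
    print(sorted(intent), sorted(extent))
```

Here the extent of each reported attribute set plays the role of a cluster of similar documents; this enumeration grows quickly with collection size, which is why dedicated algorithms are needed in practice.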