Summable and nonsummable data‐driven models for community detection in feature‐rich networks
A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes to approximate both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other models the case in which different nodes represent different measurement systems, so that the link data are neither comparable nor summable across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our data recovery least-squares criterion. We address the equivalent problem of maximizing its complementary part, the contribution of a found partition to the combined data scatter. We follow a doubly greedy strategy in maximizing that contribution. First, communities are found one-by-one, and second, entities are added one-by-one in the process of identifying a community. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own; it is also faster than the summability version. In our experiments, both versions appear to be competitive on generated synthetic data sets and six real-world data sets from the literature.
MARAMI 2020 Modèles & Analyse des Réseaux : Approches Mathématiques & Informatiques - Network Modeling and Analysis 2020
Proceedings of MARAMI 2020 - Modèles & Analyse des Réseaux : Approches Mathématiques & Informatiques - The 11th Conference on Network Modeling and Analysis
Virtual Conference, October 14-15, 2020.
CIRAD, UMR Tetis, Montpellier, France TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France
Multimodal clustering is an unsupervised technique for mining interesting patterns in n-ary relations or n-mode networks. Among the different types of such generalised patterns one can find biclusters and formal concepts (maximal bicliques) for the two-mode case, triclusters and triconcepts for the three-mode case, closed n-sets for the n-mode case, etc. Object-attribute biclustering (OA-biclustering) for mining large binary datatables (formal contexts or two-mode networks) arose by the end of the last decade due to the intractability of computational problems related to formal concepts; this type of pattern was proposed as a meaningful and scalable approximation of formal concepts. In this paper, our aim is to present recent advances in OA-biclustering and its extensions to mining multi-mode communities in an SNA setting. We also discuss the connection between clustering coefficients known in the SNA community for one-mode and two-mode networks and OA-bicluster density, the main quality measure of an OA-bicluster. Our experiments with two-, three-, and four-mode large real-world networks show that this type of pattern is suitable for community detection in multi-mode cases within reasonable time, even though the number of corresponding n-cliques is still unknown due to computational difficulties. An interpretation of OA-biclusters for one-mode networks is provided as well.
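To make the notion of OA-bicluster density concrete, here is a minimal sketch. It assumes the definition commonly used in the OA-biclustering literature: for a pair (g, m) of the binary relation I, the OA-bicluster is (m', g'), where m' is the set of objects having attribute m and g' is the set of attributes of object g, and its density is the share of cells of m' x g' that belong to I. The function names and the toy relation are hypothetical and are not taken from the paper.

```python
def oa_bicluster(pairs, g, m):
    """Return (m', g') for a pair (g, m) of the relation `pairs` (assumed definition)."""
    extent = {obj for obj, attr in pairs if attr == m}   # m': objects having attribute m
    intent = {attr for obj, attr in pairs if obj == g}   # g': attributes of object g
    return extent, intent


def density(pairs, extent, intent):
    """Share of cells in extent x intent that are present in the relation."""
    filled = sum(1 for obj in extent for attr in intent if (obj, attr) in pairs)
    return filled / (len(extent) * len(intent))


# Toy two-mode network (illustrative data only): objects a, b and attributes x, y.
relation = {("a", "x"), ("a", "y"), ("b", "x")}
extent, intent = oa_bicluster(relation, "a", "x")
print(sorted(extent), sorted(intent), density(relation, extent, intent))
```

A density of 1 would correspond to a full biclique; lower values indicate how far the bicluster is from a formal concept.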
The purpose of the study: analysis of the graph of interacting objects of social networks based on the selection of implicit communities, assessment of the subjectivity of the selected communities and comparison of the network characteristics of communities and various indicators of their subjectivity.
Method: community detection on the constructed weighted graph of a social network, psycholinguistic analysis of community content using a list of discourse markers of subjectivity, statistical methods for identifying the relationship between network characteristics and the frequency of discourse markers.
Results: algorithms to construct a graph and to import user attributes were developed, an algorithm for dividing a weighted graph into implicit user communities was implemented, the subjectivity of the content of the selected network communities in the social network Twitter was assessed, and the relationship and directional shift in the connectivity of the graph and various indicators of the subjectivity of the network community were identified.
In this article, our ultimate goal is to transform a graph’s adjacency matrix into a distance matrix. Because cluster density is not observable prior to the actual clustering, our goal is to find a distance whose pairwise minimisation will lead to densely connected clusters. Our thesis is centred on the widely accepted notion that strong clusters are sets of vertices with high induced subgraph density. We posit that vertices sharing more connections are closer to each other than vertices sharing fewer connections. This definition of distance differs from the usual shortest-path distance. At the cluster level, our thesis translates into low mean intra-cluster distances, which reflect high densities. We compare three distance measures from the literature. Our benchmark is the accuracy of each measure’s reflection of intra-cluster density, when aggregated (averaged) at the cluster level. We conduct our tests on synthetic graphs, where clusters and intra-cluster density are known in advance. In this article, we restrict our attention to unweighted graphs with no self-loops or multiple edges. We examine the relationship between mean intra-cluster distances and intra-cluster densities. Our numerical experiments show that Jaccard and Otsuka-Ochiai offer very accurate measures of density, when averaged over vertex pairs within clusters.
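The idea that vertices sharing more connections are closer can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that each distance is one minus the corresponding similarity (Jaccard or Otsuka-Ochiai) computed on open neighbourhoods, and that vertices without neighbours are at distance 1. The helper names and the toy graph are hypothetical.

```python
import math


def neighbor_sets(edges):
    """Build open neighbourhoods for an unweighted, undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_distance(adj, u, v):
    """1 - |N(u) & N(v)| / |N(u) | N(v)| (assumed convention: 1.0 if both are empty)."""
    union = adj[u] | adj[v]
    if not union:
        return 1.0
    return 1.0 - len(adj[u] & adj[v]) / len(union)


def otsuka_ochiai_distance(adj, u, v):
    """1 - |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|) (assumed convention: 1.0 if either is empty)."""
    if not adj[u] or not adj[v]:
        return 1.0
    return 1.0 - len(adj[u] & adj[v]) / math.sqrt(len(adj[u]) * len(adj[v]))


# Toy graph (illustrative only): a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = neighbor_sets([(0, 1), (1, 2), (0, 2), (2, 3)])
print(jaccard_distance(adj, 0, 1), otsuka_ochiai_distance(adj, 0, 1))
```

Averaging such pairwise distances over all vertex pairs inside a cluster gives the mean intra-cluster distance that the abstract compares against intra-cluster density.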
The paper is devoted to game-theoretic methods for community detection in networks. The traditional methods for detecting community structure are based on selecting dense subgraphs inside the network. Here we propose to use the methods of cooperative game theory that highlight not only the link density but also the mechanisms of cluster formation. Specifically, we suggest two approaches from cooperative game theory: the first approach is based on the Myerson value, whereas the second approach is based on hedonic games. Both approaches allow clusters to be detected at various resolutions. However, the tuning of the resolution parameter in the hedonic games approach is particularly intuitive. Furthermore, the modularity-based approach and its generalizations, as well as the ratio cut and normalized cut methods, can be viewed as particular cases of hedonic games. Finally, for approaches based on potential hedonic games we suggest a very efficient computational scheme using Gibbs sampling.
The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Existing approaches require the number of communities to be pre-specified. We apply the so-called data recovery approach to allow a relaxation of the criterion for finding communities one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In cases where the attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.
Changes in patterns of collaboration between Russian universities after the commencement of the Russian university excellence initiative (Project 5-100) are studied in this paper. While this project aimed to make leading Russian universities more globally competitive and improve their research productivity, it also happened to increase their cooperation. An analysis of affiliations and the co-authorship networks was conducted to explore scientific collaborations between and within the participating universities. Such analysis facilitates the investigation of the number of collaborations with other organizations, both domestic and international cooperation, and disciplinary differences. By analyzing the co-authorship networks, the position of universities in the academic network and the structure of collaborations among the participants were examined. A sample of 30 Russian universities, including participants in Project 5-100 and a control group of institutions with similar characteristics, was used. After joining the project, the participating universities increased their cooperation both with each other and with foreign universities and research institutions of the Russian Academy of Sciences, especially in the high-quality segment. At the same time, the collaboration patterns of non-participating universities did not change significantly. The centrality of Project 5-100 universities in the global academic network has increased, along with their visibility and coupling in the national network. The historical division between university and academic sectors has diminished, while the participating universities have started to play a more important role in knowledge production within the country.
A model for organizing cargo transportation between two node stations connected by a railway line that contains a certain number of intermediate stations is considered. Cargo moves in one direction. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw material for a manufacturing industry located in another region, where the other node station is situated. The organization of freight traffic is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, as well as the rule of distribution of cargo to the final node stations. The process of cargo transportation follows a set rule of control. For such a model, one must determine possible modes of cargo transportation and describe their properties. This model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This results in the need for the “correct” extension of solutions of a system of differential equations to a class of quasi-solutions having the distinctive feature of gaps at a countable number of points. Using the fourth-order Runge–Kutta method, it was possible to construct these quasi-solutions numerically and determine their rate of growth. Note that, technically, the main difficulty consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of the quasi-solutions and, in particular, the sizes of the gaps (jumps) of solutions on a number of model parameters characterizing the rule of control, the technologies for transportation of cargo, and the intensity of cargo supply at a node station.
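For readers unfamiliar with the numerical scheme named above, the following is a generic sketch of the classical fourth-order Runge–Kutta method for a system of ordinary differential equations. It does not reproduce the cargo model, its nonlocal linear restrictions, or the construction of quasi-solutions; the right-hand side used in the example (y' = y) is a stand-in chosen only because its exact solution is known, and all function names are hypothetical.

```python
import math


def rk4_step(f, t, y, h):
    """One classical RK4 step for y' = f(t, y), with y given as a list of floats."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f, t0, y0, t1, n_steps):
    """Integrate from t0 to t1 with n_steps equal RK4 steps; return the final state."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Stand-in test problem (not the cargo model): y' = y, y(0) = 1, exact y(1) = e.
approx = integrate(lambda t, y: [y[0]], 0.0, [1.0], 1.0, 100)
print(approx[0], math.e)
```

In a setting like the one described, such a stepper would be applied piecewise between the points where the solution jumps, with the jump conditions imposed separately.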
Event logs collected by modern information and technical systems usually contain enough data for automated discovery of process models. A variety of algorithms have been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc.; nevertheless, quick analysis of ad-hoc selected parts of a journal has still not received a full-fledged implementation. This paper describes a ROLAP-based method of multidimensional event log storage for process mining. The result of the analysis of the journal is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by ad-hoc selection of criteria and a value of occurrence probability.
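One simple way to picture a directed graph built from event sequences is a directly-follows graph. The sketch below is an assumption-laden illustration, not the ROLAP-based method of the paper: it takes the "occurrence probability" of an edge to be the relative frequency with which one event is directly followed by another, and it filters a sublog with a plain Python predicate. The function names and the toy log are hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows_graph(traces):
    """Map each edge (a, b) to the share of a's outgoing transitions that go to b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    out_totals = defaultdict(int)
    for (a, _), c in counts.items():
        out_totals[a] += c
    return {edge: c / out_totals[edge[0]] for edge, c in counts.items()}


def prune(graph, min_probability):
    """Keep only edges whose assumed occurrence probability reaches the threshold."""
    return {edge: p for edge, p in graph.items() if p >= min_probability}


# Toy log (illustrative only): each trace is the ordered list of events of one case.
log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
sublog = [trace for trace in log if "b" in trace]   # ad-hoc selection criterion
print(directly_follows_graph(log))
print(prune(directly_follows_graph(log), 0.5))
print(directly_follows_graph(sublog))
```

Raising the threshold in `prune` keeps only the most frequent transitions, which mirrors the idea of selecting a value of occurrence probability.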
Existing approaches suggest that IT strategy should be a reflection of business strategy. However, in practice organisations do not often follow business strategy even if it is formally declared. Under these conditions, IT strategy can be viewed not as a plan, but as an organisational shared view on the role of information systems. This approach generally reflects only a top-down perspective of IT strategy. It can therefore be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes that is formed as a result of previous experience) to implement a bottom-up approach. Two components that can help to establish an effective reaction to new initiatives in IT are proposed here: a model of IT-related decision making, and an efficiency measurement metric to estimate the maturity of business processes and the appropriate IT. The use of the proposed tools is demonstrated in practical cases.