Optimal Two-Sided Embeddings of Complete Binary Trees in Rectangular Grids
The article considers the construction of optimal-area homeomorphic embeddings of complete binary trees in rectangular grids such that the leaf images lie on opposite sides of the grid and the edge images intersect only at node images. The minimum grid area that admits such an embedding of a complete binary tree of depth n is shown to be asymptotically equal to (n/3)·2^n. An algorithm that constructs an asymptotically optimal embedding of such a tree is proposed; the complexity of the algorithm is O(n·2^n) bit operations.
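As a quick illustration of the stated bound, the short sketch below only tabulates the asymptotic area (n/3)·2^n against the number of leaves 2^n for a few depths; the helper name is an arbitrary choice and the paper's embedding construction itself is not reproduced here.

    # Tabulate the asymptotic minimum grid area (n/3)*2^n stated in the abstract.
    # Illustrative only: this evaluates the formula, not the embedding algorithm.
    def optimal_area(n: int) -> float:
        """Asymptotic minimum grid area for a complete binary tree of depth n."""
        return (n / 3) * 2 ** n

    for n in range(4, 21, 4):
        leaves = 2 ** n
        print(f"depth={n:2d}  leaves={leaves:8d}  area~{optimal_area(n):12.1f}")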
In recent work on learning representations for graph structures, methods have been proposed both for representing the nodes and edges of large graphs and for representing graphs as a whole. This paper considers the popular graph2vec approach, which shows quite good results for ordinary graphs. In natural language processing, however, a graph structure called a dependency tree is often used to express the connections between words in a sentence. We show that the graph2vec approach applied to dependency trees is unsatisfactory, and that this shortcoming stems from the underlying Weisfeiler–Lehman (WL) kernel. In this paper we propose an adaptation of this kernel for dependency trees, as well as three other kernels that take the specific features of dependency trees into account. This new vector representation can be used in NLP tasks where it is important to model syntax (e.g. authorship attribution, intention labeling, targeted sentiment analysis, etc.). Universal Dependencies treebanks were clustered to show the consistency and validity of the proposed tree representation methods.
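For context, the classic WL relabeling step that graph2vec builds on can be sketched as follows; the toy tree, the POS-like labels, and the hashing scheme are illustrative assumptions and are not the dependency-tree kernels proposed in the paper.

    # One Weisfeiler-Lehman relabeling iteration on a toy dependency-tree-shaped graph.
    # Each node's new label is derived from its own label plus the sorted multiset
    # of its neighbours' labels, which is the core of the WL subtree kernel.
    from hashlib import md5

    edges = [(0, 1), (0, 2), (2, 3)]                    # toy tree: node 0 is the root
    labels = {0: "VERB", 1: "NOUN", 2: "NOUN", 3: "ADJ"}  # illustrative labels

    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def wl_step(labels, adj):
        new_labels = {}
        for v in labels:
            signature = labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
            new_labels[v] = md5(signature.encode()).hexdigest()[:8]
        return new_labels

    print(wl_step(labels, adj))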
In this paper we address the problem of finding the most probable state of a discrete Markov random field (MRF) with associative pairwise terms. Although of practical importance, this problem is known to be NP-hard in general. We propose a new type of MRF decomposition, submodular decomposition (SMD). Unlike existing decomposition approaches, SMD decomposes the initial problem into subproblems corresponding to specific class labels while preserving the graph structure of each subproblem. Such a decomposition enables us to take several types of global constraints into account in an efficient manner. We study theoretical properties of the proposed approach and demonstrate its applicability on a number of problems.
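For reference, the MAP problem referred to here is the minimization of a pairwise energy over node labels; the formula below is the standard textbook form, not notation taken from the paper itself:

    E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j), \qquad x_i \in \{1, \dots, K\}

Associative pairwise terms favour equal labels on neighbouring nodes, the Potts model \theta_{ij}(x_i, x_j) = w_{ij}\,[x_i \neq x_j] with w_{ij} \ge 0 being the standard example; the most probable (MAP) state is \arg\min_x E(x).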
This book presents tools and methods for large-scale and distributed optimization. Since many methods in "Big Data" fields rely on solving large-scale optimization problems, often in a distributed fashion, this topic has become increasingly important over the last decade. Beyond its specific coverage of this active research field, the book serves as a valuable source of information for practitioners as well as theoreticians. Large-Scale and Distributed Optimization is a unique combination of contributions from leading experts in the field, who were speakers at the LCCC Focus Period on Large-Scale and Distributed Optimization, held in Lund, 14th–16th June 2017. A source of information and innovative ideas for current and future research, this book will appeal to researchers, academics, and students who are interested in large-scale optimization.
In 2001, Erwin introduced broadcast domination in graphs. It is a variant of classical domination where selected vertices may have different domination powers. The minimum cost of a dominating broadcast in a graph G is denoted γ_b(G). The dual of this problem is called multipacking: a multipacking is a set M of vertices such that for any vertex v and any positive integer r, the ball of radius r around v contains at most r vertices of M. The maximum size of a multipacking in a graph G is denoted mp(G). Naturally mp(G) ≤ γ_b(G). Earlier results by Farber and by Lubiw show that broadcast and multipacking numbers are equal for strongly chordal graphs. In this paper, we show that all large grids (height at least 4 and width at least 7), which are far from being chordal, have their broadcast and multipacking numbers equal.
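To make the multipacking condition concrete, a brute-force check on a small grid graph could look like the networkx-based sketch below; the candidate set M is an illustrative example, and the paper's proofs do not rely on such a computation.

    # Brute-force check of the multipacking condition on a 4 x 7 grid graph:
    # for every vertex v and every radius r >= 1, the ball B(v, r) must contain
    # at most r vertices of the candidate set M.
    import networkx as nx

    G = nx.grid_2d_graph(4, 7)                    # height 4, width 7
    M = {(0, 0), (0, 6), (3, 3)}                  # illustrative candidate multipacking

    def is_multipacking(G, M):
        diam = nx.diameter(G)
        for v in G:
            dist = nx.single_source_shortest_path_length(G, v)
            for r in range(1, diam + 1):
                if sum(1 for u in M if dist[u] <= r) > r:
                    return False
        return True

    print(is_multipacking(G, M))                  # True for this candidate set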
Dealing with relational data has always required significant computational resources, domain expertise, and task-dependent feature engineering in order to incorporate structural information into a predictive model. Nowadays, a family of automated graph feature engineering techniques has been proposed in different streams of literature. So-called graph embeddings provide a powerful tool for constructing vectorized feature spaces for graphs and their components, such as nodes, edges, and subgraphs, while preserving intrinsic graph properties. Using the constructed feature spaces, many machine learning problems on graphs can be solved via standard frameworks suitable for vectorized feature representations.
Our survey aims to describe the core concepts of graph embeddings and to provide several taxonomies for their description. First, we take a methodological view and distinguish three types of graph embedding models, based on matrix factorization, random walks, and deep learning. Next, we describe how different types of networks affect the ability of models to incorporate structural and attribute data into a unified embedding. Going further, we perform a thorough evaluation of graph embedding applications to machine learning problems on graphs, among which are node classification, link prediction, clustering, visualization, compression, and a family of whole-graph embedding algorithms suitable for graph classification, similarity, and alignment problems. Finally, we overview the existing applications of graph embeddings to computer science domains, formulate open problems, and provide experimental results, explaining how different embedding and graph properties are connected to four classic machine learning problems on graphs: node classification, link prediction, clustering, and graph visualization.
As a result, our survey covers a new, rapidly growing field of network feature engineering, presents an in-depth analysis of models based on network types, and overviews a wide range of applications to machine learning problems on graphs.
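As a minimal illustration of the random-walk family of embedding models mentioned above, the sketch below trains DeepWalk-style node embeddings with networkx and gensim; the graph, walk lengths, and embedding dimension are arbitrary toy choices and are not taken from the survey.

    # DeepWalk-style node embeddings: sample uniform random walks, then feed them
    # to word2vec as "sentences" whose "words" are node identifiers.
    import random
    import networkx as nx
    from gensim.models import Word2Vec

    G = nx.karate_club_graph()                    # toy graph

    def random_walks(G, num_walks=10, walk_length=20, seed=0):
        rng = random.Random(seed)
        walks = []
        for _ in range(num_walks):
            for start in G.nodes():
                walk = [start]
                while len(walk) < walk_length:
                    walk.append(rng.choice(list(G.neighbors(walk[-1]))))
                walks.append([str(v) for v in walk])
        return walks

    model = Word2Vec(random_walks(G), vector_size=32, window=5, min_count=0, sg=1, epochs=5)
    print(model.wv["0"][:5])                      # first components of node 0's embedding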
In this paper, we study network feature engineering for the problem of future co-author recommendation, also known as a collaborator recommender system. We present a system that uses authors' research interests and existing collaboration information to predict missing links and the links most likely to appear in the future in the co-authorship network. The recommender system is stated as a link prediction problem, both for the current network and for new edges that appear in the following year. From a machine learning point of view, both problems are treated as binary classification. We evaluate our approach on the co-authorship network of our University's researchers, while also reporting results on the sub-network of publications indexed in Scopus. Our approach achieves high accuracy and provides a scalable solution even for very large co-authorship networks.
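A minimal version of treating link prediction as binary classification, using classic neighbourhood-based similarity features, could look like the following sketch; the toy graph and the two features are illustrative assumptions rather than the paper's actual feature set.

    # Link prediction as binary classification: positives are existing edges,
    # negatives are sampled non-edges; features are classic similarity scores
    # between the two endpoints of a candidate pair.
    import random
    import networkx as nx
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    G = nx.karate_club_graph()                    # toy stand-in for a co-authorship network
    rng = random.Random(0)

    pos = list(G.edges())
    neg = rng.sample(list(nx.non_edges(G)), len(pos))

    def features(pairs):
        jac = {(u, v): p for u, v, p in nx.jaccard_coefficient(G, pairs)}
        aa = {(u, v): p for u, v, p in nx.adamic_adar_index(G, pairs)}
        return np.array([[jac[p], aa[p]] for p in pairs])

    X = np.vstack([features(pos), features(neg)])
    y = np.array([1] * len(pos) + [0] * len(neg))

    clf = LogisticRegression().fit(X, y)
    print("train accuracy:", clf.score(X, y))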
In this paper, we study the problem of learning graph embeddings for dynamic networks and the ability to generalize to unseen nodes, called inductive learning. First, we overview the state-of-the-art methods and techniques for constructing graph embeddings and the learning algorithms for both transductive and inductive approaches. Second, we propose an improved model based on the GraphSAGE algorithm and set up experiments on the CORA, Reddit, and HSEcite datasets, the latter collected from the Scopus citation database for authors affiliated with NRU HSE in 2011–2017. The results show that our three-layer model with an attention-based aggregation function, added normalization layers, and regularization (dropout) outperforms the GraphSAGE models with mean, LSTM, and pool aggregation functions suggested by the original authors, thus giving more insight into possible ways to improve inductive learning models based on GraphSAGE.
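For orientation, the core GraphSAGE operation with mean aggregation (the baseline the comparison above refers to) can be sketched in a few lines of numpy; the dimensions, adjacency, and random weights are toy assumptions, and the attention-based aggregator proposed here is not shown.

    # One GraphSAGE layer with mean aggregation (numpy sketch):
    # h_v' = normalize( ReLU( W @ concat(h_v, mean of neighbour embeddings) ) ).
    import numpy as np

    rng = np.random.default_rng(0)
    num_nodes, d_in, d_out = 5, 8, 4
    H = rng.normal(size=(num_nodes, d_in))        # current node embeddings (toy)
    W = rng.normal(size=(d_out, 2 * d_in))        # layer weights (would be learned)
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}

    def sage_mean_layer(H, W, adj):
        out = np.zeros((len(adj), W.shape[0]))
        for v, neigh in adj.items():
            agg = H[neigh].mean(axis=0)            # mean of neighbour embeddings
            out[v] = np.maximum(0.0, W @ np.concatenate([H[v], agg]))
        # l2-normalize each row, as in the original GraphSAGE formulation
        return out / np.maximum(np.linalg.norm(out, axis=1, keepdims=True), 1e-12)

    print(sage_mean_layer(H, W, adj).shape)        # (5, 4)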