Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers
This book constitutes the proceedings of the Fourth International Conference on Analysis of Images, Social Networks and Texts, AIST 2015, held in Yekaterinburg, Russia, in April 2015. The 24 full and 8 short papers were carefully reviewed and selected from 140 submissions. The papers are organized in topical sections on analysis of images and videos; pattern recognition and machine learning; social network analysis; text mining and natural language processing.
We propose a probabilistic model for learning continuous vector representations of nodes in directed networks. These representations can be used as high-quality features that describe nodes in a graph and implicitly encode the global network structure. The usefulness of the representations is demonstrated on link prediction and graph visualization tasks. On link prediction, representations learned by our method yield results comparable to state-of-the-art methods while requiring far fewer computational resources. We develop an efficient online learning algorithm that makes it possible to learn representations from large and non-stationary graphs. Learning high-quality vectors for the LiveJournal friendship graph, which consists of 4.8 million nodes and 68 million links, takes less than a day on a commodity computer, and representations of reasonable quality can be obtained much faster.
We present an improved implementation of the Annotated suffix tree method for text analysis (abbreviated as the AST-method). Annotated suffix trees extend the original suffix tree data structure by labeling nodes with the occurrence frequencies of the corresponding substrings in the input text collection. They have a range of interesting applications in text analysis, such as language-independent computation of a matching score between a keyphrase and a text collection. In our enhanced implementation, new algorithms and data structures (suffix arrays instead of the traditional but heavyweight suffix trees) yield an implementation superior to previous ones in both memory consumption (10 times less memory) and runtime. We describe an open-source statistical text analysis software package, called "EAST", that implements this enhanced annotated suffix tree method. In addition, the EAST package includes an adaptation of a distributional synonym extraction algorithm that supports the Russian language and allows us to achieve better results in keyphrase matching.
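The keyphrase matching score can be illustrated with a minimal sketch. It assumes the scoring scheme commonly described for the AST method: each suffix of the keyphrase is matched from the root, every matched node contributes its frequency divided by its parent's frequency, these ratios are averaged over the match length, and the suffix scores are averaged over the keyphrase length. The names build_ast, suffix_score and ast_score are hypothetical and are not the EAST API. The naive character trie used here is exactly the kind of memory-hungry structure that EAST replaces with suffix arrays, so it only illustrates the scoring, not the enhanced implementation.

```python
from typing import Dict, Iterable


class Node:
    """Trie node annotated with the number of occurrences of its substring."""

    __slots__ = ("children", "freq")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.freq = 0


def build_ast(strings: Iterable[str]) -> Node:
    """Naive annotated suffix trie: insert every suffix of every string."""
    root = Node()
    for s in strings:
        for i in range(len(s)):
            node = root
            node.freq += 1  # root frequency = total number of inserted suffixes
            for ch in s[i:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node()
                child.freq += 1  # one more occurrence of this substring
                node = child
    return root


def suffix_score(root: Node, suffix: str) -> float:
    """Average conditional frequency f(child) / f(parent) along the matched path."""
    node, total, matched = root, 0.0, 0
    for ch in suffix:
        child = node.children.get(ch)
        if child is None:
            break  # the match ends at the first character not in the trie
        total += child.freq / node.freq
        node = child
        matched += 1
    return total / matched if matched else 0.0


def ast_score(root: Node, phrase: str) -> float:
    """Average the suffix scores over all suffixes of the keyphrase."""
    if not phrase:
        return 0.0
    return sum(suffix_score(root, phrase[i:]) for i in range(len(phrase))) / len(phrase)
```

For example, with a single indexed string "ab", ast_score(build_ast(["ab"]), "ab") averages the suffix scores 0.75 and 0.5 to give 0.625.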
In this paper we show how several similarity measures can be combined to compute the similarity between a pair of users for Collaborative Filtering in Recommender Systems. By aggregating several measures, we identify super similar and super dissimilar user pairs and assign distinct similarity values to these types of pairs. We also introduce another type of similarity relationship, which we call medium similar user pairs, and use the traditional JMSD measure to assign similarity values to them. Experiments on real data show that our aggregation-based method performs better than each of the individual similarity metrics. Moreover, since we apply all the traditional metrics in the same setting, we can assess their relative performance.
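JMSD is usually defined as the product of the Jaccard index of two users' rated-item sets and one minus the mean squared difference of their normalized ratings on co-rated items. The sketch below implements that definition, assuming a 1–5 rating scale, and wraps it in a hypothetical aggregation rule. The choice of measures (only Jaccard and MSD here), the thresholds hi and lo, and the values 1.0 and 0.0 assigned to super similar and super dissimilar pairs are illustrative assumptions, not the paper's settings.

```python
from typing import Dict

Ratings = Dict[str, float]  # item id -> rating given by one user


def jaccard(ra: Ratings, rb: Ratings) -> float:
    """Jaccard index of the two users' sets of rated items."""
    a, b = set(ra), set(rb)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def msd_similarity(ra: Ratings, rb: Ratings, r_min: float = 1, r_max: float = 5) -> float:
    """1 - mean squared difference of ratings normalized to [0, 1] on co-rated items."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0  # assumption: no co-rated items means no evidence of similarity
    span = r_max - r_min
    msd = sum(((ra[i] - rb[i]) / span) ** 2 for i in common) / len(common)
    return 1.0 - msd


def jmsd(ra: Ratings, rb: Ratings, r_min: float = 1, r_max: float = 5) -> float:
    """JMSD = Jaccard * (1 - MSD)."""
    return jaccard(ra, rb) * msd_similarity(ra, rb, r_min, r_max)


def aggregated_similarity(ra: Ratings, rb: Ratings, hi: float = 0.8, lo: float = 0.2) -> float:
    """Hypothetical aggregation: the thresholds and assigned values are placeholders."""
    scores = [jaccard(ra, rb), msd_similarity(ra, rb)]
    if all(s >= hi for s in scores):
        return 1.0  # "super similar" pair
    if all(s <= lo for s in scores):
        return 0.0  # "super dissimilar" pair
    return jmsd(ra, rb)  # "medium similar" pair: fall back to plain JMSD
```

Adding further measures (cosine, Pearson correlation) only extends the scores list. The all-measures-agree condition is what makes a pair "super" similar or dissimilar in this sketch.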
In this paper we explore an application of pyramid HOG (Histograms of Oriented Gradients) features to image recognition with small samples. Sequential analysis is used to improve the performance of hierarchical methods: we propose to process the next, more detailed level of the pyramid only if the decision at the current level is unreliable. Chow's reject option, which compares the posterior probability with a fixed threshold, is used to verify recognition reliability. The posterior probability is estimated for the homogeneity-testing probabilistic neural network classifier on the basis of its relation to the Bayesian decision. Experimental results in face recognition are presented. It is shown that the proposed approach increases recognition performance by a factor of 2–4 compared with conventional classification of pyramid HOGs.
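The sequential decision rule can be sketched as follows. Each pyramid level is represented by a hypothetical callable that returns posterior class probabilities. The homogeneity-testing probabilistic neural network that the paper uses to estimate these posteriors is not reproduced, and the default threshold of 0.8 is an arbitrary placeholder. Following Chow's rule, a decision is accepted as soon as the maximum posterior reaches the threshold. Otherwise the next, more detailed level is processed.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: one callable per pyramid level (coarse to fine),
# each mapping an input image to a list of posterior class probabilities.
PosteriorFn = Callable[[object], List[float]]


def sequential_pyramid_classify(
    x: object, levels: Sequence[PosteriorFn], threshold: float = 0.8
) -> Tuple[int, int]:
    """Return (predicted class index, number of pyramid levels processed)."""
    if not levels:
        raise ValueError("at least one pyramid level is required")
    best = -1
    for depth, posterior_fn in enumerate(levels, start=1):
        posteriors = posterior_fn(x)
        best = max(range(len(posteriors)), key=posteriors.__getitem__)
        # Chow's reject option: accept only if the top posterior reaches the threshold.
        if posteriors[best] >= threshold:
            return best, depth
    # Every level was rejected: fall back to the decision at the finest level.
    return best, len(levels)
```

The saving comes from the early exit. Inputs that are confidently classified at a coarse level never pay for feature extraction and matching at the finer levels.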
In online social networks, high-level features of user behavior, such as character traits, can be predicted from user profile data and users' connections. Recent publications use data from online social networks to detect people with a propensity for depression or a depression diagnosis. In this study, we investigate the capabilities of previously published methods and metrics when applied to the Russian online social network VKontakte. We gathered user profile data from the most popular communities about suicide and depression on VK.com and performed a comparative analysis between these users and randomly sampled users. We used not only standard user attributes such as age, gender, or number of friends, but also structural properties of the users' egocentric networks, and obtained results similar to those of the study of suicide propensity in the Japanese social network Mixi.com. Our goal is to test the approach and models in this new setting and to propose enhancements to the research design and analysis. We examine the resulting classifiers to identify profile features that can indicate users' propensity for depression, in order to provide tools for early depression detection. Finally, we discuss further work that might improve our analysis and help transfer the results to practical applications.
Russian FrameBank is a bank of annotated samples from the Russian National Corpus that documents the use of lexical constructions (e.g., argument constructions of verbs and nouns). FrameBank is one of the FrameNet-oriented resources, but unlike Berkeley FrameNet it focuses more on the morphosyntactic and semantic features of individual lexemes than on generalized frames, following the theoretical approaches of Construction Grammar (Ch. Fillmore, A. Goldberg, etc.) and of the Moscow Semantic School (Ju. D. Apresjan, E. V. Paducheva, etc.).
The mechanisms of real-world social network formation and evolution are among the most important topics in network science. In this study we collect data on the development of the VKontakte (a popular Russian social networking site) network of first-year students at a Russian university. We analyze the network formation process from the moment the network was established until it stabilized. Using the Conditional Uniform Graph Test, we compare the graph-level indices of the observed network with those of random networks of the same size generated by random, preferential attachment, and small-world algorithms. We propose two explanatory mechanisms of online network growth: the connected component attachment mechanism and the brokerage mechanism.
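A Conditional Uniform Graph test compares an observed graph-level index with its distribution over random graphs drawn uniformly under some conditioning. The sketch below assumes an undirected graph given as a list of unique edges on nodes 0..n-1. It conditions on the number of nodes and edges (uniform G(n, m) draws) and uses global transitivity as the index. The preferential attachment and small-world baselines mentioned in the abstract would replace random_gnm_edges with the corresponding generators. All function names are hypothetical, and enumerating every node pair limits this version to small graphs.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]


def transitivity(n: int, edges: Sequence[Edge]) -> float:
    """Global transitivity: closed connected triples / all connected triples."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    closed = triples = 0
    for v in range(n):
        k = len(adj[v])
        triples += k * (k - 1) // 2  # paths of length 2 centered at v
        for a, b in combinations(adj[v], 2):
            if b in adj[a]:
                closed += 1  # each triangle is counted once at each of its 3 vertices
    return closed / triples if triples else 0.0


def random_gnm_edges(n: int, m: int, rng: random.Random) -> List[Edge]:
    """Uniform random graph with n nodes and exactly m edges."""
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def cug_test(
    n: int,
    edges: Sequence[Edge],
    stat: Callable[[int, Sequence[Edge]], float] = transitivity,
    draws: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed index, P(sim >= observed), P(sim <= observed))."""
    rng = random.Random(seed)
    observed = stat(n, edges)
    m = len(edges)
    sims = [stat(n, random_gnm_edges(n, m, rng)) for _ in range(draws)]
    p_greater = sum(s >= observed for s in sims) / draws
    p_less = sum(s <= observed for s in sims) / draws
    return observed, p_greater, p_less
```

Here p_greater is the fraction of random graphs whose index is at least the observed one. A small value suggests more clustering than chance at the same size and density.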
The paper describes a strategy that applies heuristics to combine sets of terminological words and word combinations pre-extracted from a scientific text by several term recognition procedures. Each procedure is based on a collection of lexico-syntactic patterns representing specific linguistic information about terms in scientific texts. Our strategy aims to improve the quality of automatic term extraction from a particular scientific text. The experiments show that the strategy increases F-measure by 11–17% compared with commonly used term extraction methods.
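The abstract does not specify the combination heuristics, so the sketch below substitutes a simple majority vote: a candidate term is kept if at least min_votes procedures extracted it. It also shows how the F-measure used in the evaluation is computed against a gold-standard term list. The voting rule, the function names, and the example terms are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def combine_by_vote(term_sets: Iterable[Iterable[str]], min_votes: int = 2) -> Set[str]:
    """Hypothetical heuristic: keep terms proposed by at least min_votes procedures."""
    counts = Counter(term for terms in term_sets for term in set(terms))
    return {term for term, votes in counts.items() if votes >= min_votes}


def f_measure(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Balanced F-measure (F1) of an extracted term set against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


procedures = [
    {"neural network", "training set", "model"},
    {"neural network", "training set", "data"},
    {"neural network", "loss"},
]
gold_terms = {"neural network", "training set", "loss function"}
combined = combine_by_vote(procedures, min_votes=2)
print(sorted(combined), f_measure(combined, gold_terms))
```

Raising min_votes trades recall for precision. In this toy example the vote keeps two correct terms, giving precision 1 and recall 2/3.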