Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science
Recent work on learning representations for graph structures has proposed methods both for representing the nodes and edges of large graphs and for representing graphs as a whole. This paper considers the popular graph2vec approach, which shows quite good results for ordinary graphs. In natural language processing, however, a graph structure called a dependency tree is often used to express the connections between words in a sentence. We show that graph2vec applied to dependency trees gives unsatisfactory results, which is due to the WL kernel. We propose an adaptation of this kernel for dependency trees, as well as three other types of kernels that take into account the specific features of dependency trees. The resulting vector representation can be used in NLP tasks where it is important to model syntax (e.g. authorship attribution, intention labeling, targeted sentiment analysis). Universal Dependencies treebanks were clustered to show the consistency and validity of the proposed tree representation methods.
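As a rough illustration of the standard WL (Weisfeiler-Lehman) relabeling that the abstract refers to, the sketch below applies it to two toy dependency trees. It is not the adapted kernel proposed in the paper; the function names, the use of part-of-speech tags as initial node labels, and the example sentences are assumptions made for illustration only.

```python
from collections import Counter


def wl_subtree_features(labels, edges, iterations=2):
    """Return a bag (Counter) of WL labels for one labeled graph.

    labels: dict node -> initial label (assumed here: a part-of-speech tag)
    edges:  list of (head, dependent) pairs, treated as undirected
    """
    neighbors = {node: [] for node in labels}
    for head, dep in edges:
        neighbors[head].append(dep)
        neighbors[dep].append(head)

    current = dict(labels)
    bag = Counter(current.values())  # iteration 0: the raw labels
    for _ in range(iterations):
        updated = {}
        for node in current:
            # New label = own label plus the sorted multiset of neighbor labels.
            around = sorted(current[n] for n in neighbors[node])
            updated[node] = current[node] + "(" + ",".join(around) + ")"
        current = updated
        bag.update(current.values())
    return bag


def wl_kernel(bag_a, bag_b):
    """Dot product of two WL label bags."""
    return sum(count * bag_b[label] for label, count in bag_a.items())


# Toy tree for "She reads books": node 2 (VERB) heads nodes 1 and 3.
tree_a = wl_subtree_features({1: "PRON", 2: "VERB", 3: "NOUN"},
                             [(2, 1), (2, 3)], iterations=1)
# Toy tree for "Dogs chase cats": same shape, different tags.
tree_b = wl_subtree_features({1: "NOUN", 2: "VERB", 3: "NOUN"},
                             [(2, 1), (2, 3)], iterations=1)

print(wl_kernel(tree_a, tree_b))
```

Note that the relabeling ignores edge direction and dependency relation types, which is one plausible reason why an unmodified WL kernel loses information on dependency trees.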
This paper is aimed at evaluating the performance of existing models of morphemic analysis for Russian based on convolutional neural networks. The models were trained on a relatively small amount of annotated training data (38,368 words). We tuned the hyperparameters to accommodate the harder task setting, which helped improve the accuracy of the model. In addition to testing 15 different configurations on the available test set, a new sample of 800 words containing roots that are missing in the training sample (e.g. neologisms and recent loan words) was manually created and annotated for morphemic structure (the new dataset is made available to the community). The effectiveness of the models was evaluated on this sample, and it turned out that the performance of the CNN models was much worse on this set (an almost 30% drop in word accuracy). We performed a classification of errors made by the best model both on the standard test set and the new one.
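The word accuracy figure mentioned above can be read as the share of words whose entire segmentation is predicted correctly. The sketch below computes it under that assumption; the function name and the three example words are illustrative and are not taken from the paper's dataset.

```python
def word_accuracy(gold, predicted):
    """Share of words whose whole predicted segmentation equals the gold one."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Each word is a list of morphemes; the third prediction splits the root wrongly.
gold = [["при", "город", "н", "ый"], ["уч", "и", "тель"], ["блог", "ер"]]
pred = [["при", "город", "н", "ый"], ["уч", "и", "тель"], ["бло", "гер"]]

print(word_accuracy(gold, pred))
```

A single misplaced boundary makes the whole word count as wrong, so this metric is stricter than per-morpheme or per-character scores.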
In this paper, we study deep learning methods for a new multiclass text classification problem: identifying user interests from text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler feedforward networks. Classification accuracy was evaluated for different architectures, text representations, and sets of miscellaneous parameters.
The goal of our research is to investigate how the communication structure of an organization affects its performance. In the paper, we study a simulation model of a self-organizing team conducting scientific research. The key parameter of the model is the social graph of the organization, which defines the team creation process. For this model, we formally define the average utilization rate of the group. Under some natural condition, the utilization rate is a function of the social graph. Lower and upper bounds of this characteristic are established. The obtained result has evident practical meaning and policy implications for organization management.
In this paper we provide the methodology for evaluating effectiveness of international sanctions using Data Envelopment Analysis (DEA), which we use for generating the network matrix for further analysis. DEA is a non-parametric technique used to compare performance of similar units, such as departments or organizations. DEA has wide applications in all industries, and has been successfully used to compare performance of hospitals, banks, universities, etc. The most important advantage of this technique is that it can handle multiple input and output variables, even those not generally comparable to each other. We use the "Threat and Imposition of Sanctions (TIES)" Data 4.0 for analysis. This database contains the largest number of cases of international sanctions (1412 from the years 1945-2005) imposed by some countries on others, takes into account simultaneous sanction imposition, and also estimates the cost of all sanctions - both for those who receive and those who impose them. As input variables for DEA model we use the impact of sender commitment, anticipated target and sender economic costs, and actual target and sender economic costs. As the output variable, we use the outcome of sanctions for senders. We describe how to use DEA cross-efficiency outputs to build the network of sanction episodes. Our proposed combination of DEA and network methodology allows us to cluster sanction episodes depending on their outcomes, and provides explanations of higher efficiency of one group of sanction episodes over the others.
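To make the notion of DEA cross-efficiency more concrete, the sketch below computes a cross-efficiency matrix in the usual sense: each unit is scored with every other unit's weights as weighted output divided by weighted input. It assumes the per-unit weights have already been obtained (normally by solving one linear program per unit); the toy data, the hand-picked weights, and the function name are hypothetical, and the step from this matrix to the paper's network of sanction episodes is not shown because the abstract does not specify it.

```python
def cross_efficiency(inputs, outputs, input_weights, output_weights):
    """Return matrix E where E[k][j] scores unit j using unit k's weights.

    E[k][j] = (weighted outputs of j) / (weighted inputs of j),
    with both weightings taken from unit k.
    """
    n = len(inputs)
    matrix = []
    for k in range(n):
        row = []
        for j in range(n):
            weighted_out = sum(u * y for u, y in zip(output_weights[k], outputs[j]))
            weighted_in = sum(v * x for v, x in zip(input_weights[k], inputs[j]))
            row.append(weighted_out / weighted_in)
        matrix.append(row)
    return matrix


# Toy data: three units, two inputs, one output (all values are made up).
inputs = [[2, 4], [4, 2], [4, 4]]
outputs = [[2], [2], [2]]
# Hand-picked weights that satisfy the CCR constraints for this toy data.
input_weights = [[0.5, 0.0], [0.0, 0.5], [0.125, 0.125]]
output_weights = [[0.5], [0.5], [0.375]]

matrix = cross_efficiency(inputs, outputs, input_weights, output_weights)
# Average score each unit receives from all raters (column means).
mean_scores = [sum(matrix[k][j] for k in range(3)) / 3 for j in range(3)]
print(matrix)
print(mean_scores)
```

The diagonal entries are the units' self-evaluated efficiencies, while the off-diagonal entries are peer evaluations, which is what makes the matrix usable as a weighted adjacency structure.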
Many methods have recently been developed in deep reinforcement learning, but the scope of their application is limited. The number of available environments does not always allow for a comprehensive assessment of a new agent training algorithm. The main purpose of this article is to present another environment, based on the Match-3 game, that can be extended and that has a connection to real business. Results for the most popular deep reinforcement learning algorithms are presented as a baseline.
In this paper, we address several aspects of applying classical machine learning algorithms to a regression problem. To validate our approach, we compare predictive power on data about the revenue of a large Russian restaurant chain. We pay special attention to two problems: data heterogeneity and a high number of correlated features. We describe two methods for handling heterogeneity: weighting observations and estimating models on subsamples. We define a weighting function via the Mahalanobis distance in the feature space and show its predictive properties for the following methods: ordinary least squares regression, elastic net, support vector regression, and random forest.
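The idea of weighting observations by Mahalanobis distance can be sketched as follows for two features. The abstract does not give the exact weighting function, so the Gaussian form exp(-d^2 / 2), the restriction to two features, the function names, and the toy points are all assumptions made for illustration.

```python
import math


def covariance_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return (mean_x, mean_y), (sxx, sxy, syy)


def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


def observation_weights(points, target):
    """Assumed weighting: observations close to the target get weights near 1."""
    _, cov = covariance_2d(points)
    return [math.exp(-0.5 * mahalanobis_2d(p, target, cov) ** 2) for p in points]


# Toy feature vectors; the target is the observation we want to predict for.
points = [(0, 0), (2, 0), (0, 2), (2, 2)]
weights = observation_weights(points, target=(0, 0))
print(weights)
```

Weights of this kind could then be passed as sample weights to a regression model, so that observations resembling the target influence the fit more than dissimilar ones.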
The paper discusses the problems of preventing the spread of harmful information in social networks. Social networks are widespread nowadays and are used not only by managers and marketers to distribute advertising and promote goods, but also by attackers to spread harmful information. Thus, there is a need to counter the attackers. This paper presents simulation tools and several features that contribute to their successful application for modeling social networks and examining different strategies for preventing the spread of rumors and harmful information. The authors give an example of a simulation model for identifying intruders in a social network, along with the software tools and the results of simulation experiments.