Метод кластеризации слов с использованием информации об их синтаксической связности
The argument constructions of adjectives has largely been out of the scope of research on semantic roles both in theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles it is important to determine whether the adjectival roles form a separate list or whether they can be seen as an extension of roles assigned to the patterns of verbs and nominalizations. We discuss the general principles of how the inventory of adjectival roles should be organized in comparison with the existing inventories of verbal roles. In order to verify our statements, we carry out an experimental survey aimed at measuring the similarity between adjectival and verbal roles. The results have shown that both semantic interpretation of roles and their typical morpho-syntactic expression are significant for the evaluation and should be taken into account in working out the inventory. Besides, the specificity of adjectives lies in their prototypical stative semantics, which favors some differences in assigning a semantic role as compared to verbs. The results of the survey also provide some evidence for verification and development the inventory of verbal semantic roles.
In this paper a social network is extracted from a literary text. The social network shows, how frequent the characters interact and how similar their social behavior is. Two types of similarity measures are used: the first applies co-occurrence statistics, while the second exploits cosine similarity on different types of word embedding vectors. The results are evaluated by a paid micro-task crowdsourcing survey. The experiments suggest that specific types of word embed- dings like word2vec are well-suited for the task at hand and the specific circumstances of literary fiction text.
In natural language processing, distributional semantic models are known as an efficient data driven approach to word and text representation, which allows computing meaning directly from large text corpora into word embeddings in a vector space. This paper addresses the role of linguistic preprocessing in enhancing performance of distributional models, and particularly studies pronominal anaphora resolution as a way to exploit more co-occurrence data without directly increasing the size of the training corpus. We replace three different types of anaphoric pronouns with their antecedents in the training corpus and evaluate the extent to which this affects the performance of the resulting models in lexical similarity tasks. CBOW and SkipGram distributed models trained on Russian National Corpus are in the focus of our research, although the results are potentially applicable to other distributional semantic frameworks and languages as well. The trained models are evaluated against RUSSE '15 and SimLex-999 gold standard data sets. As a result, we find that models trained on corpora with pronominal anaphora resolved perform significantly better than their counterparts trained on baseline corpora.
The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated in the study (mean age = 44.96 years, age range: 18–85 years, 77.9% — female). Data was collected via a purpose-built application, which served two purposes: administer the survey and download consenting user's public wall posts, gender and age from the Facebook profile. The survey included questions on engagement in harmful online behaviors and the Short Dark Triad scale; 15,281 wall posts from 1972 users were included in the dataset. These posts were subjected to morphological, lexical and semantic analyses. More than 25% of the sample reported engaging in harmful online behaviors. Males were more likely to send insulting or threatening messages and post aggressive comments; no gender differences were found for disseminating other people's private information. Psychopathy and male gender were the unique predictors of engagement in harmful online behaviors. A number of significant correlations were found between the dark traits and numeric, lexical, morphological and semantic characteristics of the participants' posts.
A model for organizing cargo transportation between two node stations connected by a railway line which contains a certain number of intermediate stations is considered. The movement of cargo is in one direction. Such a situation may occur, for example, if one of the node stations is located in a region which produce raw material for manufacturing industry located in another region, and there is another node station. The organization of freight traﬃc is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, as well as the rule of distribution of cargo to the ﬁnal node stations. The process of cargo transportation is followed by the set rule of control. For such a model, one must determine possible modes of cargo transportation and describe their properties. This model is described by a ﬁnite-dimensional system of diﬀerential equations with nonlocal linear restrictions. The class of the solution satisfying nonlocal linear restrictions is extremely narrow. It results in the need for the “correct” extension of solutions of a system of diﬀerential equations to a class of quasi-solutions having the distinctive feature of gaps in a countable number of points. It was possible numerically using the Runge–Kutta method of the fourth order to build these quasi-solutions and determine their rate of growth. Let us note that in the technical plan the main complexity consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of quasi-solutions and, in particular, sizes of gaps (jumps) of solutions on a number of parameters of the model characterizing a rule of control, technologies for transportation of cargo and intensity of giving of cargo on a node station.
Event logs collected by modern information and technical systems usually contain enough data for automated process models discovery. A variety of algorithms was developed for process models discovery, conformance checking, log to model alignment, comparison of process models, etc., nevertheless a quick analysis of ad-hoc selected parts of a journal still have not get a full-fledged implementation. This paper describes an ROLAP-based method of multidimensional event logs storage for process mining. The result of the analysis of the journal is visualized as directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by ad-hoc selection of criteria and value of occurrence probability
The geographic information system (GIS) is based on the first and only Russian Imperial Census of 1897 and the First All-Union Census of the Soviet Union of 1926. The GIS features vector data (shapefiles) of allprovinces of the two states. For the 1897 census, there is information about linguistic, religious, and social estate groups. The part based on the 1926 census features nationality. Both shapefiles include information on gender, rural and urban population. The GIS allows for producing any necessary maps for individual studies of the period which require the administrative boundaries and demographic information.
Existing approaches suggest that IT strategy should be a reflection of business strategy. However, actually organisations do not often follow business strategy even if it is formally declared. In these conditions, IT strategy can be viewed not as a plan, but as an organisational shared view on the role of information systems. This approach generally reflects only a top-down perspective of IT strategy. So, it can be supplemented by a strategic behaviour pattern (i.e., more or less standard response to a changes that is formed as result of previous experience) to implement bottom-up approach. Two components that can help to establish effective reaction regarding new initiatives in IT are proposed here: model of IT-related decision making, and efficiency measurement metric to estimate maturity of business processes and appropriate IT. Usage of proposed tools is demonstrated in practical cases.
I give the explicit formula for the (set-theoretical) system of Resultants of m+1 homogeneous polynomials in n+1 variables