Digital Transformation and Global Society. Fourth International Conference, DTGS 2019, St. Petersburg, Russia, June 19–21, 2019, Revised Selected Papers
This paper investigates how spectator communication is organized in chats during broadcasts on Twitch.tv, with the main focus on toxic communication. The main purpose of the paper is to understand how the socio-demographic characteristics of a broadcaster and the channel settings that the broadcaster can control affect communication in a chat. Chat logs from Twitch.tv channels were used to create a topic model of viewers' discussions. The results of the regression analysis indicate that the socio-demographic characteristics of a broadcaster have a statistically significant effect on the type of communication that appears in the chat.
Numerous cultural events take place around the world every year. Visitors leave a digital footprint after attending such events, which is a good source of data for the analysis of tourist behavior and for cultural studies. This research maps the festival themes associated with the annual cultural event “Museum Night” on VKontakte (VK), the most popular social networking site (SNS) in Russia. All posts containing the official event hashtag in Russian (#ночьмузеев) were collected from VK. The analysis uses more than 38k posts spanning 2012 to 2019. The results show the dynamics of the event's web activity and how it has changed over recent years.
In this paper, we present a project on the analysis of an extensive corpus of strategic planning documents devoted to various aspects of the development of Russian regions. The main purposes of the project are: 1) to extract different aspects of goal setting and planning, 2) to form an ontology of goals and criteria for achieving these goals, 3) to measure the similarity between goals declared by federal and municipal subjects.
Unsupervised Natural Language Processing (NLP) methods such as phrase chunking, word embeddings, and latent topic modeling are used for information extraction, ontology construction, and similarity computation. In the short term, the resulting ontology should serve as a helper tool for writing strategic planning documents. In the long term, it should replace the need to compose such documents from scratch: authors would instead navigate the ontology and select relevant goals and criteria. The resulting similarity measure between federal and municipal goals will serve as a navigation tool for further analysis.
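One common way to compute similarity between two goal statements from word embeddings is to average the word vectors of each statement and take the cosine of the angle between the averages. The sketch below illustrates that idea only; the abstract does not say which similarity measure the project uses. The 3-dimensional vectors in `TOY_EMBEDDINGS` and the helper names are invented for illustration, and real embeddings would come from a trained model.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "improve": [0.9, 0.1, 0.0],
    "develop": [0.8, 0.2, 0.1],
    "roads": [0.1, 0.9, 0.0],
    "transport": [0.2, 0.8, 0.1],
    "schools": [0.0, 0.1, 0.9],
}


def average_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("none of the tokens has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def goal_similarity(goal_a, goal_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two goal statements via averaged word vectors."""
    vec_a = average_vector(goal_a.lower().split(), embeddings)
    vec_b = average_vector(goal_b.lower().split(), embeddings)
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    print(goal_similarity("improve roads", "develop transport"))
    print(goal_similarity("improve roads", "develop schools"))
```

With these toy vectors, the two transport-related goals score higher than the roads and schools pair, which is the kind of ranking a navigation tool between federal and municipal goals would rely on.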
Digital technologies provide new possibilities for studying cultural heritage. Literary research on large text corpora makes it possible to pose and solve theoretical problems that previously had no prospect of solution. For example, it has become possible to model the literary system of a definite literary period (e.g., the Silver Age of Russian literature) and to classify prose writers according to their stylistic features. Beyond that, it makes more general theoretical problems tractable. The present research was conducted on Russian literary texts of the early 20th century. The sample included 100 short stories by 100 different writers. Measurements were carried out for 5 syntactic variables. For the distribution of each variable, the most commonly used statistics were calculated. Based on these data, we consider empirical verification of Lyapunov's central limit theorem (CLT). The article validates the effectiveness of the CLT and the conditions under which it applies. Besides the normal (Gaussian) function, we used another analytical model, the Hausstein function. It turned out that neither theoretical distribution contradicts the experimental data for any of the five variables. However, the alternative analytical model (the Hausstein function) showed even better agreement with the experimental data. The obtained results may be used in computational linguistic studies and in research on Russian literary heritage.
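Lyapunov's CLT states that a standardized sum of independent, not necessarily identically distributed, random variables tends to the normal distribution under a moment condition. The sketch below is a small simulation of that statement, not the authors' procedure and not based on their data: it sums uniform variables of different widths (an arbitrary choice made for illustration) and compares the share of standardized sums within one standard deviation of the mean with the normal value of about 0.6827.

```python
import math
import random

# Share of a normal distribution within one standard deviation (about 0.6827).
NORMAL_WITHIN_ONE_SIGMA = math.erf(1 / math.sqrt(2))


def standardized_sum(widths, rng):
    """Draw one sum of independent Uniform(0, w) variables and standardize it."""
    total = sum(rng.uniform(0.0, w) for w in widths)
    mean = sum(w / 2 for w in widths)           # mean of Uniform(0, w) is w/2
    variance = sum(w * w / 12 for w in widths)  # variance of Uniform(0, w) is w^2/12
    return (total - mean) / math.sqrt(variance)


def fraction_within_one_sigma(widths, trials, seed=0):
    """Share of simulated standardized sums whose absolute value is at most 1."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials) if abs(standardized_sum(widths, rng)) <= 1.0
    )
    return hits / trials


if __name__ == "__main__":
    widths = list(range(1, 31))  # 30 differently scaled uniform variables
    observed = fraction_within_one_sigma(widths, trials=5000)
    print(f"simulated: {observed:.4f}, normal: {NORMAL_WITHIN_ONE_SIGMA:.4f}")
```

A single uniform variable puts only about 58% of its mass within one standard deviation, so a simulated share close to the normal value indicates that the sum, rather than its components, behaves approximately normally.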
Poetic texts pose a challenge to full morphological tagging and lemmatization, since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF, and neural network algorithms, as well as a state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of different complexity. Firstly, we suggest a method for compiling gold standard datasets for Russian poetry. Secondly, we focus on the taggers' performance in identifying part-of-speech (POS) tags and lemmas. We show which POS classes, paradigm classes, and syntactic patterns most affect the quality of processing.