Generating Sport Summaries: A Case Study for Russian
P. 149-161.
We present a novel dataset of sports broadcasts covering 8,781 games. The dataset contains 700 thousand comments and 93 thousand related news documents in Russian. We run an extensive series of experiments with modern extractive and abstractive approaches. The results demonstrate that BERT-based models show modest performance, reaching up to 0.26 ROUGE-1 F-measure. In addition, human evaluation shows that neural approaches can generate feasible, although inaccurate, news based on broadcast text.
Association for Computational Linguistics, 2019
The 4th Workshop on Representation Learning for NLP (RepL4NLP) will be hosted by ACL 2019 and held on 2 August 2019. The workshop is being organised by Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Alexis Conneau, Johannes Welbl, Xian Ren and Marek Rei; and advised by Kyunghyun Cho, Edward Grefenstette, Karl Moritz ...
Added: November 1, 2019
Diskin M., Bukhtiyarov A., Ryabinin M. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 7879-7897.
Added: November 24, 2021
Мальтина Л. П., Malafeev A., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. Aachen: CEUR Workshop Proceedings, 2018. Ch. 9. P. 85-94.
The paper considers the task of the morphemic analysis of Russian words and compares the efficiency of several proposed models. These models can be divided into three groups: derivational and inflectional rule-based, probabilistic, and hybrid models. The latter achieved state-of-the-art results of 0.848 F-score on a test set of 500 Russian words. The models ...
Added: February 15, 2019
Magge A., Tutubalina E., Miftahutdinov Z. et al., Journal of the American Medical Informatics Association : JAMIA 2021 Vol. 28 No. 10 P. 2184-2192
Objective
Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to ...
Added: October 1, 2021
Artemova E., Batura T., Golenkovskaya A. et al., in: Digital Transformation and Global Society. DTGS 2020. Vol. 1242. Springer, 2020. P. 208-222.
In this paper we present a corpus of Russian strategic planning documents, RuREBus. This project is grounded in both language technology and e-government perspectives. Not only are new language sources and tools being developed, but so are their applications to e-government research.
We demonstrate the pipeline for creating a text corpus from scratch. First, the annotation schema ...
Added: May 10, 2021
Polyakov E. V., Polyakov S. V., Abramov P., in: Proceedings of 2019 XVI International Symposium "Problems of Redundancy in Information and Control Systems" (REDUNDANCY). IEEE, 2019. P. 159-164.
Determining the tonality of the text is a difficult task, the solution of which essentially depends on the context, the field of study and the amount of text data. The analysis shows that the authors in their works do not jointly use the full range of possible transformations on the data and their combinations. The ...
Added: September 20, 2020
Gozuacik N., Sakar C. O., Ozcan S., Expert Systems with Applications 2021 Vol. 183 No. 30 November 2021 P. 1-13
Social media platforms are considered one of the most effective intermediaries for companies to interact with consumers. Social media-based decision support systems for the marketing domain are highly developed, but product development and innovation-oriented studies remain limited. This study offers a novel approach which utilises opinion retrieval theme along with sentiment analysis to support the ...
Added: December 12, 2021
Cham : Springer, 2019
Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. This conference is a part of SAI conferences being held since 2013. The conference series has featured keynote talks, special sessions, poster presentation, tutorials, workshops, and contributed papers each year.
The conference focuses on areas of intelligent systems and artificial intelligence (AI) and ...
Added: August 29, 2018
Nikolaev K., Malafeev A., in: Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018. Springer, 2018. Ch. 12. P. 121-126.
This paper deals with automatic classification of questions in the Russian language. In contrast to previously used methods, we introduce a convolutional neural network for question classification. We took advantage of an existing corpus of 2008 questions, manually annotated in accordance with a pragmatic 14-class typology. We modified the data by reducing the typology to ...
Added: February 15, 2019
Рыбаков В. В., Malafeev A., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. Aachen: CEUR Workshop Proceedings, 2018. Ch. 8. P. 75-84.
The paper presents an attempt to solve the task of aspect-based sentiment analysis in the domain of Russian-language hotel reviews, using distributed representation of words. The authors follow an approach similar to [Blinov, Kotelnikov, 2014], but applied to a different domain and using different parameters. The authors also present a new dataset that is made ...
Added: February 15, 2019
Сучков Е. П., Алексеенко Г. О., Налчаджи К. В., Интеллектуальные системы. Теория и приложения 2022 Vol. 26 No. 1 P. 250-254
Currently, video surveillance systems are becoming more widespread. One of the main goals of such systems is to control and track a person’s movement. The solution of this problem allows us to solve such applied problems as tracking the occupancy of various premises (whether shopping facilities or educational and cultural institutions), creating a motion heatmap or organizing control of access to ...
Added: January 31, 2023
Grigoryev T., Verezemskaya P., Krinitskiy M. et al., Remote Sensing 2022 Vol. 14 No. 22 Article 5837
Global warming has made the Arctic increasingly available for marine operations and created a demand for reliable operational sea ice forecasts to increase safety. Because ocean-ice numerical models are highly computationally intensive, relatively lightweight ML-based methods may be more efficient for sea ice forecasting. Many studies have exploited different deep learning models alongside classical approaches ...
Added: June 19, 2023
Cham : Springer, 2022
This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during December 16-18, 2021. The world of Data Science changes every year. At AIST, we exchange our understanding of the Science state-of-the-art, as well as how it applies to life and business. AIST ...
Added: January 4, 2022
Ilia Karpov, Nick Kartashev, in: Analysis of Images, Social Networks and Texts. 10th International Conference, AIST 2021, Tbilisi, Georgia, December 16–18, 2021, Revised Selected Papers. Cham: Springer, 2022. P. 1-10.
The ubiquity of the contemporary language understanding tasks gives relevance to the development of generalized, yet highly efficient models that utilize all knowledge, provided by the data source. In this work, we present SocialBERT - the first model that uses knowledge about the author’s position in the network during text analysis. We investigate possible models ...
Added: October 31, 2021
Tutubalina E., Алимова И. С., Мифтахутдинов З. et al., Bioinformatics 2021 Vol. 37 No. 2 P. 243-249
Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods for analysis and comparison of patient’s health conditions and adverse drug reactions reported on the ...
Added: January 13, 2021
Association for Computational Linguistics, 2021
Natural Language Processing (NLP) has benefited from promising recent advances including the employment of latest deep learning technology amongst a host of other solutions. The current pandemic has prevented the in-person exchange of ideas and networking of NLP researchers and students, but virtual communication opportunities have enabled continued collaboration and provided alternative communication channels. While ...
Added: September 27, 2021
Klyshinskiy E., Логачёва В. К., Карпик О. В. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Vol. 18 No. 1 P. 5-21
The grammatical ambiguity (multiple sets of grammatical features for one word form or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...
Added: December 11, 2019
Rukhovich D., Koroleva P., Rukhovich D. et al., Remote Sensing 2022 Vol. 14 No. 9 Article 2224
The detection of degraded soil distribution areas is an urgent task. It is difficult and very time consuming to solve this problem using ground methods. The modeling of degradation processes based on digital elevation models makes it possible to construct maps of potential degradation, which may differ from the actual spatial distribution of degradation. The ...
Added: November 14, 2022
Golovanov S., Rauf Kurbanov, Sergey Nikolenko et al., in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2019. P. 6053-6058.
Large-scale pretrained language models define state of the art in natural language processing, achieving outstanding performance on a variety of tasks. We study how these architectures can be applied and adapted for natural language generation, comparing a number of architectural and training schemes. We focus in particular on open-domain dialog as a typical high entropy ...
Added: February 20, 2021
Switzerland : Springer, 2019
This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016.
The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...
Added: February 8, 2020
Golovanov S., Tselousov A., Rauf Kurbanov et al., in: The NeurIPS '18 Competition: From Machine Learning to Intelligent Conversations. Springer, 2020. P. 295-315.
Added: February 20, 2021
Artemova E., in: The Palgrave Handbook of Digital Russia Studies. Palgrave Macmillan, 2021. Ch. 26. P. 465-481.
Deep learning is a term used to describe artificial intelligence (AI) technologies. AI deals with how computers can be used to solve complex problems in the same way that humans do. Such technologies as computer vision (CV) and natural language processing (NLP) are distinguished as the largest AI areas. To imitate human vision and the ...
Added: December 20, 2020
Polyakov E. V., Voskov L., Abramov P. et al., Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 2020 No. 1 P. 2-14
Introduction: Sentiment analysis is a complex problem whose solution essentially depends on the context, field of study and amount of text data. Analysis of publications shows that the authors often do not use the full range of possible data transformations and their combinations. Only a part of the transformations is used, limiting the ways to ...
Added: February 20, 2020
Alimova I., Tutubalina E., Journal of Biomedical Informatics 2020 Vol. 103 P. 1-9
Relation extraction aims to discover relational facts about entity mentions from plain texts. In this work, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding ...
Added: October 28, 2020