So What’s the Plan? Mining Strategic Planning Documents
P. 208-222.
Artemova E., Batura T., Golenkovskaya A., Ivanin V., Ivanov V., Sarkisyan V., Smurov I., Tutubalina E.
In this paper we present RuREBus, a corpus of Russian strategic planning documents. The project is grounded in both language technology and e-government perspectives: we develop not only new language resources and tools but also their applications to e-government research.
We demonstrate the pipeline for creating a text corpus from scratch. First, the annotation schema is designed. Next, texts are annotated using a human-in-the-loop strategy: preliminary annotations are derived from a machine learning model and then manually corrected.
The volume of annotated text is large enough to showcase what insights can be gained from RuREBus.
Association for Computational Linguistics, 2021
Natural Language Processing (NLP) has benefited from promising recent advances, including the adoption of the latest deep learning technology among a host of other solutions. The current pandemic has prevented the in-person exchange of ideas and networking of NLP researchers and students, but virtual communication opportunities have enabled continued collaboration and provided alternative communication channels. While ...
Added: September 27, 2021
Malykh V., Porplenko D., Tutubalina E., in : Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers. Vol. 12602.: Springer, 2021. P. 149-161.
We present a novel dataset of sports broadcasts covering 8,781 games. The dataset contains 700 thousand comments and 93 thousand related news documents in Russian. We run an extensive series of experiments with modern extractive and abstractive approaches. The results show that BERT-based models achieve only modest performance, reaching up to 0.26 ROUGE-1 F-measure. In addition, human evaluation ...
Added: May 10, 2021
Association for Computational Linguistics, 2019
The 4th Workshop on Representation Learning for NLP (RepL4NLP) will be hosted by ACL 2019 and held on 2 August 2019. The workshop is being organised by Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Alexis Conneau, Johannes Welbl, Xiang Ren and Marek Rei; and advised by Kyunghyun Cho, Edward Grefenstette, Karl Moritz ...
Added: November 1, 2019
Мальтина Л. П., Malafeev A., in : Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. : Aachen : CEUR Workshop Proceedings, 2018. Ch. 9. P. 85-94.
The paper considers the task of morphemic analysis of Russian words and compares the efficiency of several proposed models. These models fall into three groups: derivational and inflectional rule-based, probabilistic, and hybrid models. The hybrid models achieved state-of-the-art results, with an F-score of 0.848 on a test set of 500 Russian words. The models ...
Added: February 15, 2019
Рыбаков В. В., Malafeev A., in : Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. : Aachen : CEUR Workshop Proceedings, 2018. Ch. 8. P. 75-84.
The paper presents an attempt to solve the task of aspect-based sentiment analysis in the domain of Russian-language hotel reviews using distributed word representations. The authors follow an approach similar to [Blinov, Kotelnikov, 2014] but apply it to a different domain with different parameters. The authors also present a new dataset that is made ...
Added: February 15, 2019
Magge A., Tutubalina E., Miftahutdinov Z. et al., Journal of the American Medical Informatics Association : JAMIA 2021 Vol. 28 No. 10 P. 2184-2192
Objective
Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to ...
Added: October 1, 2021
Klyshinskiy E., Логачёва В. К., Карпик О. В. et al., Vestnik NSU. Series: Linguistics and Intercultural Communication 2020 Vol. 18 No. 1 P. 5-21
Grammatical ambiguity (multiple sets of grammatical features for one word form, or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous words, words ambiguous by grammatical features, by part of speech, by lemma, or by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...
Added: December 11, 2019
Springer, 2021
This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021.
The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...
Added: October 28, 2021
Razzhigaev A., Nikolay Arefyev, Panchenko A., in : Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). : Association for Computational Linguistics, 2021. P. 157-162.
In this paper, we present a system for solving the cross-lingual and multilingual word-in-context disambiguation task. The task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To compensate for the lack of officially provided cross-lingual training data, we generated such data ourselves. We describe a simple ...
Added: September 23, 2021
Северина Е. М., Ларионова М. Ч., Litera 2023 No. 10 P. 211-222
The article considers a model for preparing machine-readable (semantic) markup of texts for the Chekhov Digital project. As an example, it uses the philological interpretation of individual significant elements of A. P. Chekhov's story "Death of an Official" and the explicit encoding of this information according to the Text Encoding Initiative (TEI/XML) digital publication standards. Based ...
Added: January 12, 2024
Luparov A., Panov A. I., Suvorov R. et al., in : Proceedings of ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods. Vol. 2.: SciTePress, 2015. P. 270-276.
Dendritic cell (DC) vaccination is a promising way to combat cancer metastases, especially in the case of immunogenic tumors. Unfortunately, a satisfactory clinical outcome can only rarely be achieved in the majority of patients treated with a particular DC vaccine. Apparently, DC vaccination can be successful with certain combinations of features of the ...
Added: November 20, 2015
Ivan P. Yamshchikov, Shibaev V., Nagaev A. et al., in : Proceedings of the 3rd Workshop on Neural Generation and Translation. : Association for Computational Linguistics, 2019. P. 128-137.
This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a style transfer framework for texts, we propose several empirical methods to assess the quality of information decomposition. We validate these methods with several state-of-the-art textual style transfer methods. Higher-quality information decomposition corresponds to higher performance in terms ...
Added: January 7, 2021
Switzerland : Springer, 2015
This book constitutes the refereed proceedings of the 6th Conference on Knowledge Engineering and the Semantic Web, KESW 2015, held in Moscow, Russia, in September/October 2015. The 17 revised full papers presented together with 6 short system descriptions were carefully reviewed and selected from 35 submissions. The papers address research issues related to semantic web, ...
Added: September 16, 2015
I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176-183
This paper discusses modern approaches to natural language processing and the application of machine learning models to classifying short scientific texts in Russian. The study is devoted to the analysis of methods for vectorizing textual information, the selection of a model for scientific paper classification, and the training of the BERT language model ...
Added: November 4, 2023
Kirina M., Vestnik NSU. Series: Linguistics and Intercultural Communication 2022 Vol. 20 No. 2 P. 93-109
The article describes the results of topic modeling of short fiction using three methods (latent Dirichlet allocation (LDA), structural topic modeling (STM), and non-negative matrix factorization (NMF)) combined with different text preprocessing options. The experimental design is tested on the Corpus of Russian Short Stories 1900–1930. The study made it possible to identify the characteristics of the algorithms under consideration and to evaluate the effectiveness ...
Added: December 10, 2022
Moscow : Russian State University for the Humanities Publishing Center, 2019
The book includes 64 papers submitted to the International Conference on Computational Linguistics and Intellectual Technologies "Dialogue 2019" and presents a broad spectrum of theoretical and applied research on natural language description, language modeling, and the creation of applied computer technologies. ...
Added: October 16, 2019
Baklanova V., Kurkin A., Teplova T., China Finance Review International 2023
Purpose – The primary objective of this research is to provide a precise interpretation of the constructed machine learning model and produce definitive summaries that can evaluate the influence of investor sentiment on the overall sales of non-fungible token (NFT) assets. To achieve this objective, the NFT hype index was constructed, as well as several approaches of ...
Added: December 10, 2023
Karpov N., in : Modern Problems of Informatization in the Analysis and Synthesis of Technological and Software-Telecommunication Systems: Collected Papers. Issue 17.: Voronezh : Nauchnaya Kniga, 2012. P. 264-266.
Added: November 7, 2012
Davletov A., Gordeev D., Nikolay Arefyev et al., in : Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). : Association for Computational Linguistics, 2021. P. 1249-1254.
This work describes our approach to the subtasks of SemEval-2021 Task 8: MeasEval: Counts and Measurements, which took first place in the official competition ranking. To solve all subtasks, we use multi-task learning in a question-answering-like manner. We also use learnable scalar weights to weight each subtask's contribution to the final loss in multi-task training. We fine-tune ...
Added: September 23, 2021
Фирсанова В. И., International Journal of Open Information Technologies 2021 Vol. 9 No. 12 P. 53-59
The paper presents a study on question answering systems evaluation. The purpose of the study is to determine if human evaluation is indeed necessary to qualitatively measure the performance of a sociomedical dialogue system. The study is based on the data from several natural language processing experiments conducted with a question answering dataset for inclusion of people with autism spectrum disorder and state-of-the-art ...
Added: September 25, 2023
Berlin : Springer, 2014
This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...
Added: November 13, 2014
Alimova I., Tutubalina E., Journal of Biomedical Informatics 2020 Vol. 103 P. 1-9
Relation extraction aims to discover relational facts about entity mentions from plain texts. In this work, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding ...
Added: October 28, 2020
Karpov I., Крылова Т. В., Timoshenko S., Scando-Slavica 2022 P. 1-20
In this paper we describe the differences between informal comments posted on social networks and internet journalistic texts, which tend to be written in Codified Literary Russian. We performed a quantitative analysis of more than [...] graphic, morphological, and syntactic features, and provided a linguistic interpretation of the statistically significant ones. The article concludes that the ...
Added: October 31, 2021
Zakhlebin I. V., in : E-Business. Internet Project Management. Innovations: Proceedings of the Student Research and Practice Conference, Moscow, March 12-14, 2013 : Moscow : HSE University, 2014. P. 88-91.
The report deals with a methodology for building a system that searches for specialists matching a defined set of competencies. The proposed search method is based on the analysis of natural language texts. ...
Added: July 11, 2015