Defining discourse formulae: computational approach

Gerasimenko Ekaterina; Puzhaeva Svetlana; Zakharova Elena; Rakhilina Ekaterina

doi:10.29007/k5q2

P. 61–69.

In this paper, we address the problem of automatic extraction of discourse formulae. By discourse formulae (DF) we mean a special type of construction at the discourse level that has a fixed form and serves as a typical response in dialogue. Unlike traditional constructions [4, 5, 6], they do not contain variables within the sequence; their slots can be found in the left-hand or right-hand statements of the speech act. We have developed a system that extracts DF from drama texts. We compared token-based and clause-based approaches and found that the latter performs better. The clause-based model involves a uniform-weight vote of four classifiers and currently shows a precision of 0.30 and a recall of 0.73 (F1-score 0.42). The resulting module was used to extract a list of DF from 420 drama texts of the 19th–21st centuries [1, 7]. The final list contains 3000 DF, 1800 of which are unique. Further development of the project includes enhancing the module by extracting left-context features and applying other models, as well as exploring what the concept of DF looks like in other languages.

Keywords: natural language processing, machine learning, Construction Grammar, discourse formulae, entity extraction
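The uniform-weight vote and the reported metrics in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the four classifiers or the decision rule, so the strict-majority threshold, the function names `uniform_vote` and `precision_recall_f1`, and the toy labels below are all assumptions made for illustration.

```python
def uniform_vote(predictions):
    """Combine equally weighted binary predictions (1 = DF, 0 = not DF).

    predictions: one list of labels per classifier, all of equal length.
    Assumption: a clause is labelled DF only when a strict majority of
    classifiers say so; the source does not state the tie-breaking rule.
    """
    n_classifiers = len(predictions)
    voted = []
    for labels in zip(*predictions):
        voted.append(1 if sum(labels) * 2 > n_classifiers else 0)
    return voted


def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for the positive (DF) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy data (hypothetical): six clauses, four classifier outputs.
gold = [1, 0, 1, 0, 0, 1]
clf_outputs = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 1, 0],
]

voted = uniform_vote(clf_outputs)
p, r, f1 = precision_recall_f1(gold, voted)
print(voted, round(p, 2), round(r, 2), round(f1, 2))
```

With four voters a 2:2 tie is possible; this sketch resolves it as "not DF", and a different threshold would shift the balance between precision and recall.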

In book

Proceedings of Third Workshop "Computational linguistics and language science"

Wohlgenannt G., von Waldenfels R., Toldova S., Rakhilina E. V., Lyashevskaya O., Loukachevitch N. V., Artemova E. Issue 4. Manchester: EasyChair, 2019.

Method of Automated Dataset Collection for Microwave Filters Synthesis

Arinin O. V., Bakhmach D. M., Katsnelson A. et al., in: 2025 Systems of Signals Generating and Processing in the Field of on Board Communications. IEEE, 2025. P. 1–5.

This research discusses a method for automating dataset collection for microwave filter synthesis by integrating machine learning techniques, thus reducing development time. Using a 3D electromagnetic analysis software package, the study simulates and collects geometric parameters and amplitude-frequency characteristics from three variants of highly selective bandpass microstrip two-resonator combined filters with stepped-impedance resonators. ...

Added: December 6, 2025

Tracking Fracture Development by Clustering Thermally Stimulated Acoustic Emission Pulses in the Absence of Source Location

Indakov G. S., Kaznacheev P. A., Maibuk Z. Ya. et al., Geofizicheskie Issledovaniya 2025 Vol. 26 No. 2 P. 99–124

The paper studies the clusterability of acoustic emission pulses during high-temperature heating of sandstone samples previously subjected to mechanical loading. Mechanical loading was applied in uniaxial mode up to a near-failure load, with signs of large cracks appearing on the surface. After that, the samples were subjected to thermal treatment up to 650 °C ...

Added: September 19, 2025

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.

Modern LLMs such as BERT, ChatGPT, and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and the analysis and summarization of documents. In this paper, we show that these models come close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...

Added: September 4, 2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Vienna: Association for Computational Linguistics, 2025.

Added: August 26, 2025

Analysis of a Company Model in Conditions of Unstable Demand Using Reinforcement Learning Methods

Delev A., Semakov S., in: 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2025. P. 318–322.

Profit is one of the most important economic indicators of a company’s performance, and for every company it is necessary to allocate resources in such a way as to obtain the maximum possible profit. The profit maximization problem is usually a dynamic optimization problem. This article discusses an approach to solving the production expansion problem ...

Added: August 25, 2025

Wherever You Look, There Is an Idiom: On the Semantic Evolution of One Set Expression

Kharlamova D. S., Reznikova T., Russkaya Rech' 2024 No. 6 P. 34–51

The paper concerns the semantic evolution of the Russian expression "kuda ni kin’", based on material from the Russian National Corpus. On the synchronic level, this expression has two distinct meanings: visual (1) ‘wherever you look’ and mental (2) ‘whatever you think about’. Corpus data show that the more abstract semantics (2) came into use first. This might ...

Added: August 6, 2025

Economic and Social Aspects of Nuclear Energy amid the Development of Artificial Intelligence Technologies

Podchufarov A., Galkina A. N., Vanina S. S. et al., Ekonomika i upravlenie: problemy, resheniya 2025 Vol. 5 No. 4 P. 61–74

Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...

Added: June 5, 2025

Periods of high uncertainty: How fertility intentions in Russia changed during 2022–2023

Vakulenko E., Gorskiy D., Kondrateva V. et al., Demographic Research 2025 Vol. 52 P. 939–970

BACKGROUND We study the change in fertility intentions in Russia during the period of socio-economic shocks in 2022–2023, in response to the Russia-Ukraine armed conflict. OBJECTIVE Our objective is to identify factors that influence decision-making in a low-fertility context during the crisis, including both objective characteristics and subjective assessment of the current situation. METHODS This paper is based on unique survey ...

Added: May 6, 2025

Prospects for Big Text Data Application in Technology Maturity Assessment (Publications Review)

Loginova I., Grozovskiy F., Aksenova A., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 3 P. 145–153

The paper analyzes the limitations of conventional methods for assessing the maturity of technology, such as the S-curve, technology readiness level (TRL), Gartner’s hype cycle and their dependence on experts’ opinions. Current approaches to this task based on big text data analysis and machine learning algorithms are reviewed, and their advantages are demonstrated. As a ...

Added: April 28, 2025

Application of Physics-Informed Neural Networks for Solving the Inverse Advection-Diffusion Problem to Localize Pollution Sources

Derkach D., Efremenko D., Chuprov I. A. et al., Series Computer Science "arxiv.org". 2025. No. 2503.18849.

Added: March 25, 2025

Generative models and seq2seq techniques for the flash-simulation of the LHCb experiment

Derkach D., Anderlini L., Capelli S. et al., Proceedings of Science 2025 Vol. 476 P. 1032

Simulating detector and reconstruction effects on physics quantities is crucial for data analysis, but it is becoming unsustainably costly for the upcoming HEP experiments. The most radical approach to speed up detector simulation is Flash Simulation, as proposed by the LHCb collaboration in Lamarr, a software package implementing a novel simulation paradigm relying on Deep Generative ...

Added: March 13, 2025

Real-bogus scores for active anomaly detection

Semenikhin T. A., Kornilov M., Pruzhinskaya M. et al., Astronomy and Computing 2025 Vol. 51 Article 100919

In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts — such as plane or satellite tracks, bad columns on CCDs, and ghosts — often constitute significant contaminants in results from anomaly detection analysis. ...

Added: March 3, 2025

Exploring the Universe with SNAD: Anomaly Detection in Astronomy

Volnova A., Aleo P., Lavrukhina A. et al., Communications in Computer and Information Science 2024 Vol. 2086 P. 195–208

SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the ...

Added: March 3, 2025

SNAD catalogue of M-dwarf flares from the Zwicky Transient Facility

Voloshina A., Lavrukhina A., Pruzhinskaya M. et al., Monthly Notices of the Royal Astronomical Society 2024 Vol. 533 No. 4 P. 4309–4323

Most of the stars in the Universe are M spectral class dwarfs, which are known to be the source of bright and frequent stellar flares. In this paper, we propose new approaches to discover M-dwarf flares in ground-based photometric surveys. We employ two approaches: a modification of a traditional method of parametric fit search and ...

Added: March 3, 2025

Comparative Analysis of Regional Inflation Forecasting Models

Gabov M. A., Bukina T. V., Kashin D., Journal of the New Economic Association 2025 No. 4(69) P. 87–117

The study aims to compare approaches to forecasting the monthly level of consumer price index (CPI y/y) in the regions of the Volga Federal District using time series models and machine learning methods. This study attempts to select the most appropriate and efficient models for predicting the regional general price level index. The paper also ...

Added: February 22, 2025

Performance Modeling of Data Storage Systems using Generative Models

Al-Maeeni A. R., Temirkhanov A., Ryzhikov A. et al., IEEE Access 2025 Vol. 13 P. 49643–49658

High-precision systems modeling is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. In this study, we developed several models of a storage system using machine learning-based generative models to predict performance metrics such as IOPS and latency. The models ...

Added: February 20, 2025

Kak že kak že! Russian discourse formula of confirmation as a marker of recognition

Rakhilina E., Bychkova P., in: Constructions with lexical repetitions in East Slavic. De Gruyter Mouton, 2024. P. 197–222.

The chapter presents a case study of the repetition mechanism within the development of discourse formulae, i.e., multi-word formulaic replies similar to yes and no. It closely examines the process of pragmaticalization in the Russian formula Kak že! (‘how part’) and its duplicated counterpart. The diachronic corpus data shows that the formula Kak že! emerged ...

Added: February 13, 2025

Classification of Long Gamma-Ray Transients from INTEGRAL Data Using Machine Learning Approach

Mozgunov G., Pozanenko A. S., Minaev P. Y. et al., in: Data Analytics and Management in Data Intensive Domains: 25th International Conference, DAMDID/RCDL 2023, Moscow, Russia, October 24–27, 2023, Revised Selected Papers. Vol. 2086: Communications in Computer and Information Science. Springer, 2024. P. 215–224.

Added: January 31, 2025

Model for Assessing the Liquidity of a Stock Market Trading Instrument

Sizykh D., Tregub K., Belyakov B. et al., in: 2024 17th International Conference on Management of Large-Scale System Development (MLSD). IEEE, 2024. P. 1–5.

Currently, a large number of studies are being conducted to improve the accuracy of the developed forecasting methods for the stock market. At the same time, multivariate models based on machine learning methods are increasingly used. Since liquidity indicators have a significant impact on asset pricing, taking them into account can improve the accuracy of ...

Added: January 15, 2025

Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?

Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84

Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...

Added: January 7, 2025

Artificial Neural Networks as a Natural Tool in Solution of Variational Problems in Hydrodynamics

Litvinenko N., IEEE Access 2024

Added: December 9, 2024

Can Artificial Intelligence Predict Court Decisions? A Systematic Review of International Studies

Kazun A., Monitoring of Public Opinion: Economic and Social Changes 2024 No. 5 P. 100–122

Advancements in artificial intelligence technologies and the emergence of open databases containing judicial decisions have led to rapid improvements in algorithms capable of classifying legal documents and forecasting decisions made by judges. This article examines a body of international research dedicated to the question of how accurately AI can predict judges’ decisions, and consequently, whether ...

Added: November 29, 2024