A System for Automatic Text Summarization Using a Stochastic Model

T. V. Voznesenskaya; D. A. Lednov

doi:10.21469/22233792.4.4.04

Machine Learning and Data Analysis. 2018. Vol. 4. No. 4. P. 266–279.

Voznesenskaya T., Lednov D. A.

This paper describes a system for automatic text summarization developed by the «DC – Systems» company in cooperation with the Faculty of Computer Science at HSE. A summary is a concise description of a text in terms of its content and meaning, that is, its semantics. The purpose of summarization is to shorten the text as much as possible while preserving its main content. In this article, a summary is built from syntactically correlated word combinations, and any additional meanings carried by separate fragments of the text are neglected. The quality of a summary is evaluated by how well it matches the source text semantically.

The main problem is split into two parts: evaluating the semantics of the whole text, without subdividing it into parts, and transforming the text to derive an annotation.

The architecture of the developed system and its main algorithm are described. An example of a summary produced by the system is provided, together with an evaluation of its quality. The current version of the system has the following restriction: it does not handle formulas or special symbols.
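The two-part split described in the abstract can be illustrated with a small sketch. It is not the authors' stochastic model: as a stand-in, the whole-text semantics are approximated by a word-frequency profile, the summary is built by keeping the sentences closest to that profile, and quality is scored by cosine similarity between the summary and the source. All function names (`profile`, `summarize`, `quality`) and the sentence-selection strategy are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def profile(text):
    """Part 1 (stand-in): represent the whole text as a word-frequency profile."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(text, max_sentences=1):
    """Part 2 (stand-in): keep the sentences closest to the whole-text profile."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    whole = profile(text)
    ranked = sorted(sentences, key=lambda s: cosine(profile(s), whole), reverse=True)
    chosen = set(ranked[:max_sentences])
    # Preserve the original sentence order in the output.
    return " ".join(s for s in sentences if s in chosen)


def quality(summary, source):
    """Score a summary by how closely its profile matches the source profile."""
    return cosine(profile(summary), profile(source))


if __name__ == "__main__":
    source = "The cat sat on the mat. The cat chased the mouse. Stock prices fell."
    summary = summarize(source, max_sentences=1)
    print(summary)
    print(round(quality(summary, source), 3))
```

A frequency profile ignores word order and syntax, so this sketch only shows the shape of the pipeline (estimate the semantics of the whole text, transform the text, compare the result with the source); the syntactically correlated word combinations used by the described system would replace the sentence-selection step.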

Research target: Computer Science

Priority areas: IT and mathematics

Keywords: corpus linguistics; automatic annotation; text summarization; automatic text processing
