Combining multiple features for single-word term extraction

Nokel M.A.; Bolshakova E.I.; Loukachevitch N.V.

The paper describes experiments on automatic single-word term extraction that combine various word features, mainly linguistic and statistical, by means of machine learning methods. Since single-word terms are much more difficult to recognize than multi-word terms, a broad range of word features was taken into account, including widely known measures (such as TF-IDF), some novel features, and proposed modifications of features usually applied to multi-word term extraction.
A large target collection of Russian texts in the banking domain was used for the experiments. Average Precision was chosen to evaluate the results of term extraction, and a manually created thesaurus of banking terminology was used to validate the extracted terms.
The experiments showed that the use of multiple features significantly improves the results of automatic extraction of domain-specific terms. Logistic regression proved to be the best-performing machine learning method for single-word term extraction in these experiments; the subset of word features significant for term extraction was also identified.
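
As an illustration of the setup the abstract describes, the following minimal Python sketch ranks candidate words by a logistic-style combination of two features and scores the ranking with Average Precision against a gold term set. The candidate words, feature values, weights, and bias are hypothetical and hand-set, not taken from the paper. Normalizing Average Precision by the number of gold terms found in the ranked list is also an assumption, since the exact variant is not stated here.

```python
import math


def logistic_score(features, weights, bias=0.0):
    """Combine several word features into a single score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def average_precision(ranked_words, gold_terms):
    """Mean of precision@k over the ranks k that hold a gold term."""
    hits = 0
    precision_sum = 0.0
    for k, word in enumerate(ranked_words, start=1):
        if word in gold_terms:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


# Hypothetical candidates, each with two made-up feature values.
candidates = {
    "credit": (0.9, 0.8),
    "bank": (0.8, 0.9),
    "year": (0.7, 0.1),
    "deposit": (0.5, 0.7),
    "thing": (0.2, 0.1),
}
weights = (1.0, 3.0)  # hand-set for illustration, not learned
bias = -2.0
gold = {"credit", "bank", "deposit"}  # stands in for the thesaurus

# Ranking by the combined score.
ranked = sorted(
    candidates,
    key=lambda w: logistic_score(candidates[w], weights, bias),
    reverse=True,
)
ap_combined = average_precision(ranked, gold)

# Baseline: ranking by the first feature alone.
ranked_single = sorted(candidates, key=lambda w: candidates[w][0], reverse=True)
ap_single = average_precision(ranked_single, gold)

print(ranked, ap_combined)
print(ranked_single, ap_single)
```

In this toy data, ranking by the first feature alone places the non-term "year" above "deposit", while the combined score moves all three gold terms to the top of the list.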

Language: English

Keywords: machine learning, single-word term extraction, term extraction, combining features, single-word terms

In book

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Bekasovo, May 30 to June 3, 2012). In 2 volumes

Vol. 2: Papers of special sessions. Issue 11. Moscow: Russian State University for the Humanities, 2012.

From the Unknown to Transparency: A Review of Explainable AI (XAI) Technologies

Avdoshin S. M., Pesotskaya E. Y., Информационные технологии 2026 Vol. 32 No. 4 P. 185–194

With the rapid advancement of artificial intelligence, and deep learning in particular, models have emerged that are capable of delivering highly accurate predictions. However, the internal logic of such models remains difficult to interpret—an issue of critical importance, especially in domains where the correctness of an algorithm directly affects high-stakes decision-making. One promising avenue for ...

Added: May 8, 2026

Modern Time Series Analysis Methods for Monitoring and Forecasting the Condition of Artificial Lift Equipment

Neznanov A., Glushko A., Овчинников С. et al., in: Data Mining in the Oil and Gas Industry. Moscow: ООО «Геомодель Развитие», 2024. P. 140–143.

With the development of monitoring systems, now we have the opportunity to collect key performance indicators of devices in the process of artificial lift. Every day a huge amount of telemetry is generated by our devices, which can be used to forecast the working mode and health state of the equipment after the process of ...

Added: April 29, 2026

Machine Learning Approach to Anticancer Activity Prediction of Transition-Metal Complexes Based on a Large-Scale Experimental Database

Krasnov L., Malikov D., Kiseleva M. et al., Journal of Medicinal Chemistry 2026 Vol. 69 No. 8 P. 8838–8851

In this work, we developed a straightforward data-driven approach to predict the cytotoxicity of metal complexes based entirely on their (metal + ligands) composition. To this end, we have manually curated MetalCytoToxDB, a comprehensive experimental database comprising 26,500 IC50 values for 7050 metal complexes against 754 cell lines from 1921 articles. Based on these, machine learning ...

Added: April 23, 2026

An LSTM Model of Heat Energy Consumption in a Multi-Storey Residential Building

Ершов И. А., Системная инженерия и инфокоммуникации 2025 No. 4 P. 11–14

The heat consumption of residential buildings is a stochastic series. Designing thermal energy regulators requires a neural network model. In the paper, the model is built on Long Short-Term Memory (LSTM). The high accuracy of reproducing the series was achieved by training the model on ...

Added: April 22, 2026

An Algorithm for Analyzing News Information for Economic Decision-Making

Чудинова О. С., Первицкая Л. А., Ramenskaya A., Индустриальная экономика 2026 No. 1 P. 65–78

This article is devoted to the development of an algorithm for analyzing news information using machine learning methods implemented in Python libraries. The choice of tools used at each stage of the algorithm is justified by calculating metrics for the quality of the solution to the corresponding machine learning problems. The algorithm’s results are presented ...

Added: April 20, 2026

Modeling cosolvent effects on solubility in supercritical CO2 using data-driven approaches

Makarov D. M., Kalikin N., Gurikov P. et al., Journal of Supercritical Fluids 2026 Vol. 235 Article 106979

Supercritical CO2 (scCO2) is an environmentally friendly solvent, but its low polarity limits the solubility of polar compounds. Cosolvents are commonly used to enhance solvation capability, yet comprehensive data-driven studies are scarce. We compiled the largest dataset to date: 4401 experimental solubility records with 22 cosolvents for 93 nonionic solutes, plus 4855 records ...

Added: April 19, 2026

Effectiveness of Volatility Forecasts in Active Trading Strategies of Institutional Investors in the Russian Equity Market

Lysenok N., Фундаментальная и прикладная математика 2026 Vol. 26 No. 3 P. 33–42

This study examines the impact of realized volatility forecasts on the performance of active trading strategies in the Russian equity market. Using a sample of 17 liquid stocks over the period 2014–2026, a hybrid forecasting model is developed that combines HAR-J with gradient boosting; its superiority over the baseline HAR-J specification is confirmed by the ...

Added: April 17, 2026

Special Economic Zones of the Russian Federation: Modeling the Decisions of Potential Residents and the Process of Their Generation

Plesovskikh A., Journal of Applied Economic Research 2023 Vol. 22 No. 2 P. 323–354

Modern studies widely discuss the role of special economic zones in stimulating the economic growth and development of Russia, generating the necessary investment flows and increasing the country's innovative potential by expanding production in high-tech sectors of the economy with high added value. The purpose of the study is to model the process of generating ...

Added: April 13, 2026

An Attempt at Generating Emotional Valence and Arousal Ratings for Words Using a Character-Level CNN

Lyusin D., Валуева Е. А., Sysoeva T., in: Psychology of Cognition: Proceedings of the All-Russian Scientific Conference, Yaroslavl State University, Institute of Psychology RAS, December 5–6, 2025. Institute of Psychology RAS, 2026. P. 310–314.

The emotional coloring of words is widely used in various academic and applied studies, from text analysis to understanding cognitive processes. A pressing task is the creation of large datasets with ratings of words on a number of emotional parameters. Modern machine learning methods based on the semantic similarity of words extracted from text corpora show high correlations with human ratings, although substantial discrepancies are sometimes observed. ...

Added: April 10, 2026

Neural Network Tools in the University Instructor's Toolkit

Fedorov A., Вакку Г. В., Лебедева С. Э., Галактика медиа: журнал медиа исследований 2026 Vol. 8 No. 2 P. 163–182

With the increasing volume of data, university faculty may spend years processing and organizing information. Personalized assistance, content recommendations, data collection for literature reviews, and bibliographic citation formatting reinforce the role of artificial intelligence and neural network tools for scholarly communication. This paper discusses practical examples of using tools such as Elicit, SciSpace, Consensus, Undermind, ...

Added: April 7, 2026

Applying ML to Improve Signal Noise Immunity

Efremov A., Portnoy S., Волошин А. Д., Первая миля 2025 No. 8 P. 20–28

A comprehensive review is presented of machine learning (ML) methods used to improve signal robustness to interference in communication channels. The rapid evolution of wireless communication generations and the active development of the 6G concept place high demands on latency, data rate, and transmission reliability. Traditional approaches to interference protection, based on strict analytical models, often cannot cope with the chaotic nature of dense heterogeneous ...

Added: April 4, 2026

Replacing Criterion of Creativity with Criterion of Investment for Results Created by Artificial Intelligence

Pakshin P., Legal Issues in the Digital Age 2026 Vol. 7 No. 1 P. 32–48

Artificial intelligence plays a significant role in automation, minimizing human intervention in fields such as medicine, art, and law. Despite the historically close relationship between art and technology, generative AI has expanded the potential for creative activity. A significant catalyst for this process has been the proliferation of pre-trained AI systems, which have accelerated the ...

Added: March 31, 2026

A Tool for Mass Generation of Random Step Environment Models with User-Defined Landscape Features

Gabdrahmanov R., Tsoy T., Martinez-Garcia E. et al., in: Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (Volume 1), ICINCO 2024. SciTePress, 2024. P. 511–518.

Computer simulations are growing in popularity in robotics research due to their near-zero cost of error and lower labor intensity. One of the necessary components of a simulation, in addition to a robot model, is a model of the world in which the robot operates. While it is always possible to construct a world model manually, ...

Added: March 17, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modelling and User-Adapted Interaction 2026 Vol. 36 Article 2

Graph-based recommender systems have emerged as a powerful paradigm for personalized recommendations. However, their reliance on full model retraining to incorporate new users or new interactions creates scalability barriers. The task becomes infeasible in real-life recommender systems due to excessive time and resource costs involved. To address this limitation, we propose a fast and efficient ...

Added: March 15, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modeling and User-Adapted Interaction 2025 P. 1–24

Added: March 14, 2026

Real-Bogus Classification for ZTF Data Releases: Two Approaches

Semenikhin T., Kornilov M., Pruzhinskaya M. et al., in: 26th International Conference, DAMDID/RCDL 2024, Nizhny Novgorod, Russia, October 23–25, 2024, Revised Selected Papers. Data Analytics and Management in Data Intensive Domains (CCIS, volume 2641). Springer, 2026. P. 211–219.

We considered two fundamentally different approaches to real-bogus classification within the Zwicky Transient Facility survey data. The first approach is based on neural networks that take sequences of object images as input. The second approach uses features extracted from light curves and classical machine learning methods. Several models for both approaches were tested. Quality metrics ...

Added: March 11, 2026

Clustering Smart Home Electricity Consumption Patterns Using Ensemble Machine Learning Methods

Maltseva S. V., Бериков В. Б., Кладов Д. Е. et al., in: Informatics and Applied Mathematics: Proceedings of the X International Scientific and Practical Conference (October 8–11, 2025). Vol. 1: Collected Papers, Part 1. Almaty: Институт информационных и вычислительных технологий КН МНВО РК, 2025. P. 227–232.

This paper examines the problem of clustering consumption patterns for a private household. An ensemble algorithm based on the Wasserstein metric was developed and applied to cluster daily load profiles. The proposed approach allows for identifying typical energy consumption scenarios and interpreting consumer behavior. Results from computational experiments using real data are presented. ...

Added: March 3, 2026