Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features

Deeb B.; Andrey V. Savchenko; I. Makarov

doi:10.1109/ACCESS.2025.3554454

IEEE Access. 2025. Vol. 13. P. 56283–56295.

Speech emotion recognition (SER) has gained considerable attention in speech processing and machine learning because of its potential applications in human-computer interaction, mental health monitoring, and customer service. However, state-of-the-art SER models use many parameters, which makes them computationally expensive. In this paper, we introduce a novel deep-learning model that improves the accuracy of emotion detection in speech signals while remaining lightweight compared with state-of-the-art models. The proposed model combines a feature encoder that significantly improves the emotional representation of acoustic features with a cross-attention mechanism that fuses acoustic features, such as spectrograms, with semantic features extracted from a pre-trained self-supervised learning framework, enriching the emotional representation of speech. An extensive experimental study demonstrates that the proposed model achieves a weighted accuracy of 74.6% on the IEMOCAP dataset, competitive with state-of-the-art baselines. In addition, the model achieves a latency of 24 milliseconds on moderate devices while containing up to three times fewer parameters than these baselines.
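The fusion step named in the abstract can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention. This is not the authors' implementation: the abstract does not say which stream supplies the queries, so the sketch assumes acoustic frames act as queries and semantic frames as keys and values, and it concatenates the attended semantic context onto each acoustic frame. The function names, the dimensions, and the random projection matrices (`w_q`, `w_k`, `w_v`) are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention_fuse(acoustic, semantic, w_q, w_k, w_v):
    """Fuse two feature sequences with single-head cross-attention.

    Assumption: acoustic frames are the queries; semantic frames are
    the keys and values. The abstract does not specify the direction.
    """
    q = matmul(acoustic, w_q)              # (T_acoustic, d)
    k = matmul(semantic, w_k)              # (T_semantic, d)
    v = matmul(semantic, w_v)              # (T_semantic, d)
    k_t = [list(col) for col in zip(*k)]   # (d, T_semantic)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)         # each row sums to 1
    attended = matmul(weights, v)          # semantic context per acoustic frame
    # Assumed fusion rule: concatenate each acoustic frame with its context.
    fused = [a_row + c_row for a_row, c_row in zip(acoustic, attended)]
    return fused, weights


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random stand-in for real features or learned projection weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


acoustic = rand_matrix(6, 8)   # 6 spectrogram frames, 8 dims (hypothetical)
semantic = rand_matrix(4, 5)   # 4 SSL-model frames, 5 dims (hypothetical)
w_q, w_k, w_v = rand_matrix(8, 3), rand_matrix(5, 3), rand_matrix(5, 3)

fused, weights = cross_attention_fuse(acoustic, semantic, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # 6 11
```

Because the two streams are aligned through attention weights, they do not need the same number of frames or the same feature size. A practical model would learn the projections and would likely use multiple heads.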

Research target: Computer Science

Keywords: emotion recognition; speech emotion recognition; cross-attention mechanism; attention mechanism; feature fusion

Journal Телекоммуникации, No. 1, 2026

Moscow: Наука и технологии, 2026.

«Телекоммуникации» is a monthly peer-reviewed industrial, information-analytical, and educational-methodological journal published since July 2000. It is intended for managers and staff in industry, research and design institutes, and higher education institutions, for postgraduate and undergraduate students, and for specialists who develop, manufacture, and operate telecommunications equipment. It covers development and production news, development forecasts, information security, and regulatory, reference, analytical, and educational-methodological materials. The transition to a global information ...

Added: July 4, 2026

"Труды МФТИ" Vol. 17, No. 4 (68) (2025)

МФТИ, 2025.

The work of the editorial office of the scientific journal «Труды Московского физико-технического института» (abbreviated «Труды МФТИ»), its editorial board, and its editorial council is carried out in accordance with the Regulations approved by the institute's rector. The editorial board includes the heads of the institute, its faculties, and the institute and faculty departments. The journal's editor-in-chief is the president of МФТИ, Corresponding Member of the Russian Academy of Sciences Кудрявцев Н.Н. The journal «Труды МФТИ» is included in the RSCI (Russian Science Citation Index) database and is available in electronic ...

Added: July 4, 2026

Modulation Recognition for Industrial Internet of Things Communication Signals Under Few-Shot Conditions Based on Attention Mechanism and Relation Network

Hualin M., Jie Z., Jerome Y. et al., Journal of Internet Technology 2026 Vol. 27 No. 3 P. 367–382

In open, interference-prone scenarios, the scarcity of precisely annotated signal samples limits the application of deep learning–based modulation identification, which generally relies on extensive labeled data for stability. Relation Networks, as an emerging class of deep learning models, exhibit rapid convergence in few-shot learning tasks. Motivated by the fast convergence property of relation-based learning and ...

Added: July 3, 2026

Code constructions based on generalized concatenated codes for communication systems using order-statistics-based reception

Osipov D., Информационно-управляющие системы 2026 No. 3 P. 49–62

Introduction: In many communication systems under construction or yet to be created, the power control and channel estimation techniques developed for previous-generation communication systems fail to provide the desired precision. One way to solve this problem is to use order-statistics-based reception techniques that do not need channel estimation or power control. To ensure the desired ...

Added: July 3, 2026

Graph Games and Logic Design. Recent Developments and Further Directions. (TREN, volume 66)

Springer, 2026.

This book presents established and new research on the close connections between graph games and systems of logic, particularly existing and newly designed modal logics. The volume utilizes two graph games – the sabotage game and the hide-and-seek game – to demonstrate the natural interplay between designing new graph games and exploring new kinds of ...

Added: June 30, 2026

The 12th International Conference on Information Technology and Quantitative Management (ITQM 2025)

Netherlands: ScienceDirect, 2025.

No ...

Added: June 28, 2026

Object-centric process management: A research manifesto

Seidel A., Weske M., Montali M. et al., Information Systems 2026 Vol. 141 Article 102728

Business process management employs process models and event logs to represent the behavior of the information systems under study. Traditional case-centric notions consider the order of activities and events in isolated process instances. The emerging field of object-centric processes challenges this assumption by putting objects in the center. Object-centric process mining and modeling approaches identify ...

Added: June 27, 2026

2024 26th International Conference on Digital Signal Processing and its Applications (DSPA)

IEEE, 2024.

A.S. Popov Russian Science and Technical Society with support from V. A. Trapeznikov Institute of Control Sciences, V.A. Kotelnikov Institute of Radio Engineering and Electronics, Autex Ltd. is leading the XXVIII International Conference «Digital Signal Processing and its Applications — DSPA-2024» ...

Added: June 27, 2026

Developing methods for assessing the quality of experience (QoE) of streaming video

Ivchenko A., Дворкович А. В., Телекоммуникации 2020 Vol. 12 P. 2–11

Dynamic Adaptive Streaming over HTTP (DASH) technology powers most multimedia services. Its specific features (re-buffering, quality switching, etc.) necessitate the development of specialized methods for assessing user subjective quality of experience (QoE) based on objective parameters. This article examines the impact of various metrics on QoE and presents assessment models with Spearman correlation coefficients up ...

Added: June 27, 2026

An event-driven platform for integrating machine vision components with an operations center

Gadzhimirzaev S., Хельвас А. В., 2023 3rd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET) Mohammedia, Morocco 2023 P. 1–6

The article proposes an architecture for an event-driven Emergency Operation Center with a Machine Vision Component. Sources of information are analyzed, and approaches to using machine vision events for detecting and estimating tactical situations are discussed. Messages from Machine Vision Components are converted to the Common Alerting Protocol and processed by the Operation Center environment for tactical situation recognition. ...

Added: June 26, 2026

Discrete modeling of the restoration repair of a road section

Gadzhimirzaev S., Хельвас А. В., Компьютерные исследования и моделирование 2022 Vol. 14 No. 6 P. 1255–1268

This work describes the results of modeling the process of maintaining the readiness of a section of the road network under strikes with specified parameters. A one-dimensional section of road up to 40 km long with a total number of strikes up to 100 during the work of the brigade is ...

Added: June 26, 2026

An approach to assessing the dynamics of the industry consolidation level

Gadzhimirzaev S., Хельвас А. В., Лукьянченко П. П., Computer Research and Modeling 2023 Vol. 15 No. 1 P. 129–140

In this article we propose a new approach to the analysis of econometric industry parameters for the industry consolidation level. The research is based on the simple industry automatic control model. The state of the industry is measured by quarterly obtained econometric parameters from each industry’s company provided by the tax control regulator. An approach ...

Added: June 26, 2026

A digital twin of a fully automated warehouse with deep racks

Gadzhimirzaev S., Хельвас А. В., International Frequency Sensor Association (IFSA) Publishing, 19-21 February 2025 Granada, Spain 2025 P. 172–176

The paper presents models for an innovative fully robotic warehouse for storing boxed goods. A discrete multiagent simulation of the movement of shuttles in a warehouse for a given sequence of pallet shipments has been implemented. Different strategies for placement of boxes in various areas of a warehouse are evaluated, as well as optimal routing ...

Added: June 26, 2026

Incorporating Scientific Knowledge into Neural Network Density Functionals

Medvedev M., Journal of Chemical Theory and Computation 2026 Vol. 22 No. 9

Density functional theory (DFT) is the workhorse of modern reactions and materials modeling. While the exact functional remains unknown, many approximations to it have been constructed either by hand-crafting functional forms to satisfy exact constraints or by machine learning. In this work, we show how both of these approaches can be fused to build both ...

Added: June 26, 2026

Modeling a fully robotic warehouse with deep-storage racks

A. V. Khelvas, Pankratov K. K., Afanasenko T. S. et al., Computer Research and Modeling 2026 Vol. 18 No. 2 P. 423–438

This article presents a model of a fully automated warehouse with deep storage racks designed for boxed goods storage. The study focuses on optimizing warehouse operations through discrete multiagent simulation of shuttle movements for pallet loading and unloading tasks. The authors investigate various product placement strategies, including the Nearest Channel Positioning Algorithm (NCPA), Most Empty ChannelGroup Placement (MECGP), and ...

Added: June 24, 2026

A machine learning dataset on winter roads of Krasnoyarsk Krai, Russia for the forestry and infrastructural projects

Ekaterina S. Podolskaia, Sinitsina A., European Journal of Forest Engineering 2026 Vol. 12 No. 1 P. 7–22

Machine learning in transport modeling has become a trend in science and industry. In this paper, we observe its main directions and focus on a dataset of seasonal road creation. Seasonality as a parameter in transport modeling has a significant impact on transport scenarios but is underestimated worldwide and in Russia, despite modern data challenges. ...

Added: June 24, 2026

The state and prospects of using virtual reality technologies in sports: a brief review

Atlasov B., Selskiy A., Russian Journal of Information Technology in Sports 2025 Vol. 2 No. 1 P. 13–21

The article examines the current state of the global virtual and augmented reality (VR/AR) technology market in sports, noting its growth, although slower than previously expected. Special attention is paid to the Russian market, where the development of VR technologies in sports lags behind world leaders such as the United States, EU countries and China, ...

Added: June 23, 2026

AI & PDE: ICLR 2026 Workshop on AI and Partial Differential Equations

[s.n.], 2026.

Added: June 23, 2026

Alibaba and Open Source. The history and scale of cooperation between the Chinese corporation and the open-source world.

Silakov D., Системный администратор 2026 No. 4 P. 38–43

Alibaba Group, the Chinese e-commerce giant, owns the AliExpress, Taobao, and Tmall marketplaces, the AliPay payment system, and Alibaba Cloud, the largest cloud computing service in China. In recent years the company has drawn attention for its achievements in artificial intelligence: the Tongyi Qianwen technology and the Qwen line of open models, available to everyone. But ...

Added: June 23, 2026

2025 9th International Conference on Information, Control, and Communication Technologies (ICCT-2025)

IEEE, 2026.

The 9th International Scientific Conference on Information, Control, and Communication Technologies (ICCT-2025) was held October 7-11, 2025 in Gomel, Belarus. The main technical areas and applications covered by the proceedings are optoelectronics, acousto-optic, microwave technology, antenna systems, measuring technology, metamaterials, nanostructures, nanofilms, photonic crystals, biology and medicine, biophotonics, bioengineering, neural networks in communication technologies; ...

Added: June 23, 2026

Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026)

Buzaev F., Mullakhmetov R., Bogachev R. et al., Association for Computational Linguistics, 2026.

Playlist generation based on textual queries using large language models (LLMs) is becoming an important interaction paradigm for music streaming platforms. User queries span a wide spectrum from highly personalized intent to essentially catalog-style requests. Existing systems typically rely on non-personalized retrieval/ranking or apply a fixed level of preference conditioning to every query, which can ...

Added: June 22, 2026

A method for recognizing sentiment and emotions in transcriptions of Russian-language speech using machine translation

Dvoynikova A., Кагиров И. А., Карпов А. А., Информатика и автоматизация (Труды СПИИРАН) 2024

The article addresses the problem of recognizing users' sentiment and emotions in Russian-language text transcriptions of speech using lexicon-based methods and machine translation. The number of available information resources for sentiment analysis of Russian-language text messages is very limited, which significantly complicates the use of basic sentiment analysis methods, namely text preprocessing, vectorization with sentiment lexicons, and traditional classifiers. For ...

Added: April 25, 2026

An analytical review of multimodal data corpora for emotion recognition

Dvoynikova A., In: Альманах научных работ молодых ученых Университета ИТМО. Университет ИТМО, 2023.

The article discusses the advantages and disadvantages of categorical and dimensional models for describing emotions. Dimensional models cover a wider range of human emotions, which makes it possible to develop the most effective emotion recognition system. The paper gives an analytical review of existing multimodal data corpora annotated for valence and emotion intensity. The conclusion identifies the most representative data corpus for automatic ...

Added: April 25, 2026

An approach to automatic emotion recognition in speech transcriptions

Dvoynikova A., Кондратенко К. О., Известия высших учебных заведений. Приборостроение 2023 Vol. 66 No. 10 P. 818–827

Abstract. The paper studies emotion recognition in speech transcriptions, a problem relevant to many fields. It analyzes the effect of preprocessing methods (stop-word removal, lemmatization, stemming) on the accuracy of emotion recognition in Russian and English text data. The experiments used orthographic transcriptions of dialogues from the multimodal corpora RAMAS and CMU-MOSEI, in Russian and English respectively. The annotation of these corpora ...

Added: April 25, 2026