Speaker-Aware Training of Speech Emotion Classifier with Speaker Recognition

Speech emotion recognition has gained considerable attention in speech processing and machine learning due to its potential applications in human-computer interaction, mental health monitoring, and customer service. However, state-of-the-art models for speech emotion recognition have many parameters, which leads to high computational complexity. In this paper, we introduce a novel deep-learning model to enhance the accuracy ...

Added: June 16, 2026

A Bimodal Approach for Speech Emotion Recognition using Audio and Text

Verkholyak O., Dvoynikova A., Karpov A., Journal of Internet Services and Information Security 2021 No. 1 P. 80–96

This paper presents a novel bimodal speech emotion recognition system based on the analysis of acoustic and linguistic information. We propose a novel decision-level fusion strategy that leverages both emotions and sentiments extracted from audio and text transcriptions of extemporaneous speech utterances. We perform an experimental study to demonstrate the effectiveness of the proposed methods using emotional ...

Added: April 24, 2026

CA-SER: Cross-Attention Feature Fusion for Speech Emotion Recognition

Deeb B., Savchenko A., Makarov I., in: ECAI 2024. 27th European Conference on Artificial Intelligence, 19–24 October 2024, Santiago de Compostela, Spain, including the 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024). IOS Press, 2024. P. 4479–4482.

In this paper, we introduce CA-SER, a novel tool for speech emotion recognition that leverages self-supervised learning to extract semantic speech representations from a pre-trained wav2vec 2.0 model and combines them with spectral audio features to improve speech emotion recognition. Our approach applies a self-attention encoder to MFCC features to capture meaningful patterns in audio ...

Added: February 15, 2025

Facial Expression Recognition Based on Adapting the Classifier to the User's Video Data

Churaev E., Savchenko A., Computer Optics 2023 Vol. 47 No. 5 P. 806–815

This paper considers an approach that can significantly increase the accuracy of facial emotion recognition by adapting the model to the emotions of a particular user (e.g., a smartphone owner). In the first stage, a neural network model, which was previously trained to recognize facial expressions in static photos, is used to ...

Added: May 18, 2023

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Sokolov A., Series Computer Science "arxiv.org". 2021.

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have long shown promise in speech emotion recognition (SER). Yet it is challenging to explain the effect of each information stream on SER systems. Further, more clarification is required when analysing the impact of ASR's word error rate (WER) on linguistic emotion ...

Added: November 17, 2020