Diagnosis of the Severity of Depression Using Speech Recording Analysis

K. Sherman; D. I. Ignatov; Tatiana I. Shishkovskaya; Maria V. Khudyakova; O. Dragoy

doi:10.1007/978-3-031-97019-1_8

P. 94–108.

More than 3% of people worldwide experience depression. The diagnosis is established through interviews and clinical observation, a process that is both time-consuming and costly. In addition, depression is associated with a variety of symptoms that are difficult for a human observer to capture. Many studies propose automatic mental disorder recognition (MDR) methods that apply machine learning to extracted acoustic or linguistic features, followed by a complex process of selecting the most suitable characteristics. However, collecting data is difficult, so an MDR solution must be able to handle limited data and avoid complicated, uninterpretable feature engineering. We propose four methods based on a fine-tuned Wav2Vec-2.0 model. These approaches overcome the limitations above because this transformer model captures information from both the acoustic and linguistic modalities and does not require a large collection of labelled data. Moreover, three of the proposed methods are novel approaches to long audio classification and allow us to evaluate how well acoustic transformer models handle long speech recordings.
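The abstract does not describe the four methods, so the following is only a generic sketch of one common way to classify a long recording with a model that accepts short inputs: cut the waveform into fixed-length windows, score each window, and average the scores. It is not the authors' method. The names `split_into_windows`, `classify_recording`, and `toy_scorer` are hypothetical; in practice `score_window` would wrap a fine-tuned speech classifier such as Wav2Vec-2.0.

```python
from typing import Callable, List, Sequence


def split_into_windows(samples: Sequence[float], window: int, hop: int) -> List[Sequence[float]]:
    """Cut a waveform into windows of `window` samples, advancing by `hop`.

    The final window may be shorter than `window` so that no audio is dropped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if len(samples) == 0:
        return []
    windows = []
    start = 0
    while True:
        windows.append(samples[start:start + window])
        if start + window >= len(samples):
            break
        start += hop
    return windows


def classify_recording(
    samples: Sequence[float],
    window: int,
    hop: int,
    score_window: Callable[[Sequence[float]], List[float]],
) -> int:
    """Return the class index with the highest mean score across all windows."""
    windows = split_into_windows(samples, window, hop)
    if not windows:
        raise ValueError("recording is empty")
    scores = [score_window(w) for w in windows]
    n_classes = len(scores[0])
    mean_scores = [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]
    return max(range(n_classes), key=mean_scores.__getitem__)


def toy_scorer(window: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a real model: scores two classes by mean amplitude."""
    mean_amplitude = sum(abs(x) for x in window) / len(window)
    return [1.0 - mean_amplitude, mean_amplitude]
```

Averaging window scores is the simplest aggregation choice; alternatives such as majority voting or a learned pooling layer fit the same interface by replacing the mean in `classify_recording`.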

Keywords: speech classification; transformers; mental disorder recognition

Publication based on the results of:

Complex language and semantic models in artificial intelligence (2025)

In book

Analysis of Images, Social Networks and Texts, 12th International Conference, AIST 2024, Bishkek, Kyrgyzstan, October 17–19, 2024, Revised Selected Papers

Vol. 15419. Springer, 2024.

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modeling and User-Adapted Interaction 2026 Vol. 36 Article 2

Graph-based recommender systems have emerged as a powerful paradigm for personalized recommendations. However, their reliance on full model retraining to incorporate new users or new interactions creates scalability barriers. The task becomes infeasible in real-life recommender systems due to excessive time and resource costs involved. To address this limitation, we propose a fast and efficient ...

Added: March 15, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modeling and User-Adapted Interaction 2025 P. 1–24

Added: March 14, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., in: User Modeling and User-Adapted Interaction. Springer, 2026. Ch. 36.2 P. 1–24.

Added: January 29, 2026

Autoregressive generation strategies for Top-K sequential recommendations

Anna Volodkevich, Danil Gusak, Klenitskiy A. et al., User Modeling and User-Adapted Interaction 2025 No. 35 Article 13

The goal of modern sequential recommender systems is often formulated in terms of next-item prediction. In this paper, we explore the applicability of transformer-based generative models for the Top-K sequential recommendation task, where the goal is to predict items that a user is likely to interact with in the “near future.” This goal aligns with ...

Added: January 26, 2026

Analysis of the Impact of Input Data Obfuscation on the Effectiveness of Language Models in Detecting Prompt Injections

Krokhin A., Gusev M. M., Программные системы и вычислительные методы 2025 No. 2

The article addresses the issue of prompt obfuscation as a means of circumventing protective mechanisms in large language models (LLMs) designed to detect prompt injections. Prompt injections represent a method of attack in which malicious actors manipulate input data to alter the model's behavior and cause it to perform undesirable or harmful actions. Obfuscation involves ...

Added: October 4, 2025

Automatic Summarization of Parent Chats in WhatsApp

Dmitrieva K., Zholus M. R., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2025 Vol. 23 No. 1 P. 80–92

Automatic text summarization is one of the main tasks of natural language processing (NLP), which consists in creating a shorter version of the source text. In today’s world the amount of information consumed by people is constantly increasing, therefore more and more emphasis is being placed on the task of summarization. There are two main approaches ...

Added: July 8, 2025

OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities

Razzhigaev A., Kurkin M., Goncharova E. et al., in: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Association for Computational Linguistics, 2024. P. 183–195.

We introduce OmniDialog — the first trimodal comprehensive benchmark grounded in a knowledge graph (Wikidata) to evaluate the generalization of Large Multimodal Models (LMMs) across three modalities. Our benchmark consists of more than 4,000 dialogues, each averaging 10 turns, all annotated and cross-validated by human experts. The dialogues in our dataset are designed to prevent ...

Added: February 21, 2025

Your Transformer is Secretly Linear

Razzhigaev A., Mikhalchuk M., Goncharova E. et al., in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok: Association for Computational Linguistics, 2024. P. 5376–5384.

This paper reveals a novel linear characteristic exclusive to transformer decoders, including models like GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering an almost perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed, due to a consistently low transformer layer output ...

Added: February 17, 2025

The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Razzhigaev A., Mikhalchuk M., Goncharova E. et al., in: Findings of the Association for Computational Linguistics: EACL 2024. Association for Computational Linguistics, 2024. P. 868–874.

Added: February 17, 2025

Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques

Gorshkov S., Ignatov D. I., Chernysheva A. et al., IEEE Access 2025 Vol. 13 P. 962–979

Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte. The ...

Added: January 3, 2025

Transformer-Based Classification of User Queries for Medical Consultancy

Lyutkin D. A., D. V. Pozdnyakov, Soloviev A. A. et al., Automation and Remote Control, USA 2024 Vol. 85 No. 3 P. 297–308

The need for skilled medical support is growing in the era of digital healthcare. This research presents an innovative strategy, utilizing the RuBERT model, for categorizing user inquiries in the field of medical consultation with a focus on expert specialization. By harnessing the capabilities of transformers, we fine-tuned the pretrained RuBERT model on a varied ...

Added: September 26, 2024

GroundHog: Dialogue Generation using Multi-Grained Linguistic Input

Chernyavskiy A., Ostyakova L., Ilvovsky D., in: Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024). Association for Computational Linguistics, 2024. P. 149–160.

Recent language models have significantly boosted conversational AI by enabling fast and cost-effective response generation in dialogue systems. However, dialogue systems based on neural generative approaches often lack truthfulness, reliability, and the ability to analyze the dialogue flow needed for smooth and consistent conversations with users. To address these issues, we introduce GroundHog, a modified ...

Added: May 9, 2024

Unleashing the Power of Discourse-Enhanced Transformers for Propaganda Detection

Chernyavskiy A., Ilvovsky D., Nakov P., in: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2024. P. 1452–1462.

Added: May 9, 2024

Grammar in Language Models: BERT Study

Chistyakova K., Kazakova Tatiana / NRU HSE. Series WP BRP "Linguistics". 2023. No. 115.

The problem of language models’ interpretation is extensively inspected, but no universal answers have been found. Our study offers to combine widely accepted probing methods with a novel approach to a neural network under investigation. We propose to break grammatical forms on the pre-training step in order to get two "sibling" models, as it casts ...

Added: November 29, 2023

Transformer-based classification of user queries for medical consultancy with respect to expert specialization

Lyutkin D., Soloviev A., Zhukov D. et al., Working papers by Cornell University. Series math "arxiv.org" 2023 P. 1–16

Added: November 27, 2023

PaperPersiChat: Scientific Paper Discussion Chatbot using Transformers and Discourse Flow Management

Chernyavskiy A., Bregeda M., Nikiforova M., in: Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 2023. P. 584–587.

The rate of scientific publications is increasing exponentially, necessitating a significant investment of time in order to read and comprehend the most important articles. While ancillary services exist to facilitate this process, they are typically closed-model and paid services or have limited capabilities. In this paper, we present PaperPersiChat, an open chatbot-system designed for the discussion ...

Added: October 6, 2023

Transformer-based Multi-Party Conversation Generation using Dialogue Discourse Acts Planning

Alexander Chernyavskiy, Ilvovsky D., in: Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 2023. P. 519–529.

Recent transformer-based approaches to multi-party conversation generation may produce syntactically coherent but discursively inconsistent dialogues in some cases. To address this issue, we propose an approach to integrate a dialogue act planning stage into the end-to-end transformer-based generation pipeline. This approach consists of a transformer fine-tuning procedure based on linearized dialogue representations that include special ...

Added: October 6, 2023

Automated defect identification for cell phones using language context, linguistic and smoke-word models

Muhammad Z. Y., Malik M. S., Ignatov D. I., Expert Systems with Applications 2023 Vol. 227 Article 120236

Product defects are a widespread concern for manufacturers when conducting quality and customer relationship management. Prior approaches addressed many electronic products; however, cell phones are still unexplored. Moreover, prior work mainly focused on lexicon, probabilistic graphical, and failure mode and effect analysis models, while the utilization of word embeddings and language models has not been explored. State-of-the-art contextual word embeddings and language models generate automated features and ...

Added: June 13, 2023

Big Transformers for Code Generation

Arutyunov G.A., Avdoshin S. M., Proceedings of the Institute for System Programming of the RAS 2022 Vol. 34 No. 4 P. 79–88

The IT industry has been thriving over the past decades. Numerous new programming languages, architectural patterns, and software development techniques have emerged. Tools involved in the process ought to evolve as well. One of the key principles of the new generation of instruments for software development would be the ability of the tools to learn using ...

Added: December 26, 2022