
 


Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language

PeerJ Computer Science, USA. 2024. Vol. 10. Article e2395.
Sergei Koltcov, Surkov A., Koltsova O., Ignatenko V.

 

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, necessitating the adaptation of LLMs for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with the seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot technology slightly improves classification, LLM fine-tuning makes a far larger contribution to its quality. Both standard and natural language inference (NLI) modes of fine-tuning increase classification accuracy by more than three times compared to the non-fine-tuned setting with preliminarily filtered data. Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering.
Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.
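
To make the NLI formulation of zero-shot classification mentioned in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It pairs a text (the premise) with one hypothesis per disorder label and keeps the label whose hypothesis receives the highest entailment score, treating low-scoring texts as noise. The hypothesis template, the 0.5 threshold, and the function names are all assumptions made for illustration, and `toy_entail_score` is a keyword stand-in for a real NLI model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# The seven disorder labels named in the abstract.
LABELS: List[str] = [
    "depression",
    "neurosis",
    "paranoia",
    "anxiety disorder",
    "bipolar disorder",
    "obsessive-compulsive disorder",
    "borderline personality disorder",
]

# Hypothetical hypothesis template; the wording used in the paper is not given here.
TEMPLATE = "This text is about {}."


def build_nli_pairs(text: str) -> List[Tuple[str, str]]:
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(text, TEMPLATE.format(label)) for label in LABELS]


def classify(
    text: str,
    entail_score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off below which a text counts as noise
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the label whose hypothesis is most strongly entailed by the text."""
    scores = {
        label: entail_score(premise, hypothesis)
        for label, (premise, hypothesis) in zip(LABELS, build_nli_pairs(text))
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores  # no label is entailed strongly enough
    return best, scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: 1.0 if the hypothesis label occurs in the premise."""
    premise = premise.lower()
    for label in LABELS:
        if label in hypothesis and label in premise:
            return 1.0
    return 0.0


if __name__ == "__main__":
    label, _ = classify("I was diagnosed with bipolar disorder last year.", toy_entail_score)
    print(label)
```

In practice `toy_entail_score` would be replaced by the entailment probability of a multilingual NLI model. Note that this formulation scores one premise-hypothesis pair per candidate label, so its cost grows with the number of labels.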

Research target: Computer Science, Psychology
Language: English
Keywords: natural language inference; large language model (LLM); zero-shot classification; psychological text data
Publication based on the results of:
Modelling information and communication behaviour in computer-mediated environments and improving algorithms for behavioural data analysis (2024)
Similar publications
Features of short-term group therapy with adults suffering from panic disorders in the person-centered approach
Zirko A., Ребенок Д. С., Ежегодник по клиентоцентрированной психотерапии и человекоцентрированному подходу, Russia 2026 Vol. 6 P. 113–120
The article considers the problem of panic disorders in adulthood. As the number of people with panic disorders grows, a short-term person-centered encounter group form of psychotherapy is proposed. Its peculiarities and the facilitator’s role are described. Preliminary recommendations for facilitators of such groups are given. The probable group and individual dynamics of the participants is ...
Added: June 19, 2026
Types of Vocalizations in Self-expression and Self-inquiry
Zirko A., Psychology. Journal of the Higher School of Economics 2021 Vol. 18 No. 1 P. 224–239
The author discusses vocalizations as the use of non-verbal voice sounds in self-expression and self-inquiry. The purpose of the study was to investigate the experience of self-expression and self-inquiry through vocalizations in situations of valuing and evaluating. The researcher hypothesized that placing an individual in a safe place for self-expression on the conditions of valuing creates ...
Added: June 18, 2026
Graph patterns in inconsistent declarative process models
Анненков А. Н., Nesterov R., Моделирование и анализ информационных систем 2026 Vol. 33 No. 2 P. 176–205
Declarative process models are widely used in process mining to describe flexible process behavior through sets of constraints. However, models discovered automatically from event logs may contain inconsistent constraints, which can make them difficult to interpret and unusable for execution, conformance checking, or further analysis. Existing methods for consistency analysis either rely on automata-based constructions ...
Added: June 18, 2026
Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
Severin N., Kartushov D., Urzhumov V. et al., in: Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II. Cham: Springer Publishing Company, 2026. P. 508–517.
Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches create prohibitive inference costs in real time. To address these limitations, we present a ...
Added: June 18, 2026
Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II
Cham: Springer Publishing Company, 2026.
The four-volume set LNCS 16483-16486 constitutes the refereed conference proceedings of the 48th European Conference on Information Retrieval, ECIR 2026, held in Delft, The Netherlands, during March 29–April 2, 2026. The 46 full papers and 37 short papers presented together with 10 findings papers, 9 reproducibility papers, 17 resource papers, 11 workshop papers, 7 tutorial papers, ...
Added: June 18, 2026
Artificial intelligence as a rose of scientific activity: a study by Timothy Gowers
Poddiakov A., Троицкий вариант. Наука 2026 No. 12 P. 24–25
This popular-science note reviews the content of a post by Fields Medalist Timothy Gowers on the capabilities of AI in mathematics, along with the comments under the post. The review was produced mainly by the DeepSeek chatbot. In conclusion, the note discusses the possibility of AI not only solving problems but also posing them. ...
Added: June 18, 2026
Contribution of attentional mechanisms to verbal and nonverbal communication in boys with ASD
Minnigulova A., Dragoy O., Arutiunian V., European Child and Adolescent Psychiatry 2026
Communication deficits in Autism Spectrum Disorder (ASD) involve impairments in both verbal and nonverbal domains, potentially associated with altered brain network connectivity related to language, attention, and social cognition systems. This study investigated functional connectivity patterns among the Default Mode Network (DMN), Salience Network (SN), Dorsal Attention Network (DAN), and Language Network (LN) in male children ...
Added: June 18, 2026
Natural Science Perspectives in the Psychology of Religion: From Early Biological Interpretations to Contemporary Cognitive, Evolutionary, and Neuroscientific Approaches
Dvoinin A., Natural Systems of Mind 2025 Vol. 5 No. 4 P. 4–21
Background and Problem. The psychology of religion has traversed a long and non-linear historical path, experiencing an initial flourishing (1880–1920), a subsequent decline, and a significant resurgence since the 1990s. Despite this renewed growth, driven by advances in genetics, neuroscience, and cognitive science, a comprehensive overview of the development of natural science perspectives within the field—from their origins ...
Added: June 17, 2026
Exploring New Frontiers in Vertical Federated Learning: the Role of Saddle Point Reformulation
Beznosikov A., Kormakov G., Grigorievskiy A. et al., Journal of Optimization Theory and Applications 2026 Vol. 209 Article 18
The objective of Vertical Federated Learning (VFL) is to collectively train a model using features available on different devices while sharing the same users. This paper focuses on the saddle point reformulation of the VFL problem via the classical Lagrangian function. We first demonstrate how this formulation can be solved using deterministic methods. More importantly, we explore various stochastic modifications to ...
Added: June 17, 2026
Supervised Learning in Critical Phenomena—Statistical and Systematic Accuracy
Chertenkov V. I., Shchur L., Lobachevskii Journal of Mathematics 2026 Vol. 47 No. 2 P. 720–727
Supervised machine learning is successfully applied to the study of critical phenomena and allows us to obtain a numerical estimate of the phase transition temperature and the correlation length exponent. We discuss the influence of possible systematic errors, as well as statistical errors, on the accuracy of such numerical estimates. Errors in the training and ...
Added: June 16, 2026
Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features
Deeb B., Andrey V. Savchenko, Makarov I., IEEE Access 2026 Vol. 13 P. 56283–56295
Speech Emotion Recognition has gained considerable attention in speech processing and machine learning due to its potential applications in human-computer interaction, mental health monitoring, and customer service. However, state-of-the-art models for speech emotion recognition use many parameters, which leads to computational complexity. In this paper, we introduce a novel deep-learning model to enhance the accuracy ...
Added: June 16, 2026
Automated detection of wolf howls using audio spectrogram transformers
Makarov N., Savchenko A., Zemtsova I. et al., Scientific Reports 2025 Vol. 15 Article 26641
The grey wolf (Canis lupus) is a pivotal species for ecological studies. As a key participant in ecosystem processes, it also serves as a model for investigating social structure formation and ecological adaptation. However, the species’ complex social behavior, spatial dynamics, and expansive habitats make monitoring and population assessments across large areas particularly challenging. In recent years, audio traps ...
Added: June 16, 2026
Artificial intelligence framework for multi-pathology risk assessment from retinal fundus images: deep learning approach to 15-disease screening
Vasilev R., Savchenko A., Blinov P. et al., Frontiers in Medicine 2026 Vol. 13
Automated disease screening systems face challenges when applied to multi-class medical image analysis, particularly under severe class imbalance inherent in clinical datasets. Retinal fundus imaging enables non-invasive screening for multiple ocular and systemic diseases simultaneously, yet current automated systems typically assess risk for only a single pathology or a limited disease range. We developed a ...
Added: June 16, 2026
From Data to Signs: A Foundation Model for Multilingual Sign Language Recognition
Novopoltsev M., Tulenkov A., Murtazin R. et al., IEEE Access 2025 Vol. 13 P. 188170–188181
Video-based Isolated Sign Language Recognition (ISLR) problem presents significant challenges in scaling across diverse languages due to data scarcity and the computational costs associated with training of language-specific models. In this paper, we introduce a novel training pipeline that leverages self-supervised learning on a large-scale sign language dataset. To obtain the foundation model, we utilize ...
Added: June 16, 2026
B3Emo: Quantifying Affect as a Double-Edged Sword in Strategic LLM Interactions
Stepin A., Mozikov M., Kabanov A. et al., IEEE Access 2026 Vol. 14 P. 48127–48144
The deployment of large language models (LLMs) in interactive roles such as automated negotiators, customer service agents, and strategic partners requires them to handle not only logical tasks but also the socio-emotional dimensions of interaction. In these situations, success often relies on understanding social cues, building trust, and using persuasion effectively. These skills are closely ...
Added: June 16, 2026
ESQA: Event Sequences Question Answering
Abdullaeva I., Karpukhin I., Filatov A. et al., IEEE Access 2026 Vol. 14 P. 59390–59408
Event sequences, a specialized type of tabular data annotated with timestamps, are prevalent across practical domains such as finance, retail, social networks, and healthcare. Despite the importance of event sequence modeling and analysis, there has been little effort to adapt Large Language Models (LLMs) to this domain. In this paper, we propose a novel solution ...
Added: June 16, 2026
Features of cognitive biases in online and offline environments: a systematic literature review
Соколовская В. В., Кроколева С. С., Samoilov O. et al., Психология человека в образовании 2026 Vol. 8 No. 1 P. 39–56
Introduction. The perception of information is influenced by active digitalization, making it increasingly important to study thinking and factors affecting it. This systematic review examines the psychological study of cognitive biases (cognitive distortions) and aims to systematize accumulated knowledge about cognitive biases in offline and online environments. Materials and Methods. The systematic literature review was conducted using electronic ...
Added: June 14, 2026
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Association for Computational Linguistics, 2026.
Added: June 14, 2026
Bridging the Semantic Gap in Metadata Management using Large Language Models
Сулейкин А. С., Сорокина В., Пятецкий В. Е., in: 2025 7th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency. [s.n.], 2025. P. 748–753.
Effective metadata management is fundamental to data governance, ensuring that data assets are discoverable, understandable, and usable across the enterprise. However, traditional metadata systems often remain purely technical, describing structures without conveying business meaning. This disconnect — known as the semantic gap — limits the interpretability and value of metadata for business users. To address ...
Added: April 17, 2026
XXII National Conference on Artificial Intelligence with International Participation (КИИ-2025)
St. Petersburg: St. Petersburg Federal Research Center of the Russian Academy of Sciences, 2025.
The Twenty-Second National Conference on Artificial Intelligence with International Participation, КИИ-2025, continues the tradition of Soviet (Russian) conferences organized by the Russian Association for Artificial Intelligence. The first volume of the proceedings contains the plenary talks and the papers of conference participants presented in the following sections: Section 1 "Knowledge Engineering", Section 2 "Data Mining", Section 3 "Reasoning Modeling", Section 4 "Text Mining, Large ...
Added: February 15, 2026
Generating and Debugging Java Code using LLMs based on Associative Recurrent Memory
Василевский В. И., Alexandrov D., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 5 P. 173–182
Automatic code generation by large language models (LLMs) has achieved significant success, yet it still faces challenges when dealing with complex and large codebases, especially in languages like Java. The limitations of LLM context windows and the complexity of debugging generated code are key obstacles. This paper presents an approach aimed at improving Java code generation and debugging. ...
Added: December 26, 2025
Development and integration of an AI assistant into a learning management system
Караваева Е. А., Василевский В. И., Ланин Г. М. et al., Труды Института системного программирования РАН 2025 Vol. 37 No. 4 P. 175–190
The ongoing digitalization of education requires new ways of presenting information and attention retention mechanisms. The aim of the presented work is to propose a solution for implementing a large language model, which will interactively generate prompts of different types, within an e-learning course on programming. The main approaches are the analysis of existing relatively ...
Added: December 25, 2025
Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs
David Arteaga, Poptsova M., Computational and Structural Biotechnology Journal 2026 Vol. 31 P. 82–93
Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...
Added: December 22, 2025
Artificial intelligence as a simulacrum of meaning
Малинов С. А., Галактика медиа: журнал медиа исследований 2025 Vol. 7 No. 4 P. 154–173
In recent years, artificial intelligence (AI) has been actively integrated into everyday human life. Its popularity continues to grow steadily, and companies increasingly employ AI to optimize and accelerate workflows. Ordinary users leverage large language models (LLMs) and multimodal AI systems to perform a wide range of tasks, including generating texts, images, and videos; planning ...
Added: December 7, 2025