Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language

PeerJ Computer Science, USA. 2024. Vol. 10. Article e2395.
Sergei Koltcov, Surkov A., Koltsova O., Ignatenko V.

 

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that draw on the extensive public-domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, so LLMs must be adapted for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with the seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot technology slightly improves classification, LLM fine-tuning makes a far larger contribution to its quality. Both standard and natural language inference (NLI) modes of fine-tuning more than triple classification accuracy compared with non-fine-tuned models applied to preliminarily filtered data. Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering.
Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.
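To make the NLI mode of zero-shot classification mentioned in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It pairs each text (the premise) with one hypothesis per disorder label and keeps the label whose hypothesis is scored as most entailed. The hypothesis template, the 0.5 threshold, and the `toy_entail_score` function are assumptions introduced purely for illustration; in practice the scorer would be an NLI model's entailment probability.

```python
from typing import Callable, List, Optional, Tuple

# The seven disorder labels listed in the abstract.
LABELS: List[str] = [
    "depression", "neurosis", "paranoia", "anxiety disorder",
    "bipolar disorder", "obsessive-compulsive disorder",
    "borderline personality disorder",
]

# Hypothetical hypothesis template; the wording used in the paper is not given here.
TEMPLATE = "This text is about {}."


def build_pairs(text: str, labels: List[str] = LABELS,
                template: str = TEMPLATE) -> List[Tuple[str, str]]:
    """Pair the premise (the post) with one hypothesis per candidate label."""
    return [(text, template.format(label)) for label in labels]


def classify(text: str,
             entail_score: Callable[[str, str], float],
             labels: List[str] = LABELS,
             threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the best-entailed label and its score.

    If no hypothesis reaches the (assumed) threshold, return None as the
    label, which a pipeline could treat as noise to be filtered out.
    """
    scores = [entail_score(p, h) for p, h in build_pairs(text, labels)]
    best = max(range(len(labels)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None, scores[best]
    return labels[best], scores[best]


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: 1.0 if the label phrase occurs in the premise."""
    prefix, suffix = TEMPLATE.split("{}")
    label = hypothesis[len(prefix):len(hypothesis) - len(suffix)]
    return 1.0 if label in premise.lower() else 0.0


if __name__ == "__main__":
    print(classify("I think I have bipolar disorder", toy_entail_score))
    print(classify("Selling a used bike", toy_entail_score))
```

In this sketch each text is scored once per label, so the cost of the NLI mode grows with the number of candidate labels, and the choice of hypothesis template is the natural place to experiment.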

Research target: Computer Science, Psychology
Language: English
Keywords: natural language inference; large language model (LLM); zero-shot classification; psychological text data
Publication based on the results of:
Modelling information and communication behaviour in computer-mediated environments and improving algorithms for behavioural data analysis (2024)