Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language

PeerJ Computer Science, USA. 2024. Vol. 10. Article e2395.
Sergei Koltcov, Surkov A., Koltsova O., Ignatenko V.

 

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public-domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, which makes it necessary to adapt LLMs for fully automatic cleaning and pre-classification in order to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with the seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot technology slightly improves classification, LLM fine-tuning makes a far larger contribution to its quality. Both standard and natural language inference (NLI) modes of fine-tuning more than triple classification accuracy compared to non-fine-tuned training with preliminarily filtered data. Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering.
Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.
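The abstract frames zero-shot classification in natural language inference (NLI) terms: each candidate disorder is rewritten as a hypothesis, a post is assigned the label whose hypothesis it most strongly entails, and posts that entail no hypothesis strongly enough can be discarded as noise. The Python sketch below illustrates that general technique only; it is not the authors' code. The hypothesis template, the `threshold` value, and `toy_scorer` (a keyword stand-in for a real NLI model) are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# The seven disorder labels listed in the abstract.
LABELS: Tuple[str, ...] = (
    "depression",
    "neurosis",
    "paranoia",
    "anxiety disorder",
    "bipolar disorder",
    "obsessive-compulsive disorder",
    "borderline personality disorder",
)

# Hypothetical hypothesis template (not taken from the paper).
TEMPLATE = "This text is about {}."

# An entailment scorer maps (premise, hypothesis) to a score in [0, 1].
# In a real system this would be an NLI model; here it is a plain callable.
EntailmentScorer = Callable[[str, str], float]


def make_nli_pairs(text: str, labels: Sequence[str] = LABELS) -> List[Tuple[str, str]]:
    """Turn one text into one (premise, hypothesis) pair per candidate label."""
    return [(text, TEMPLATE.format(label)) for label in labels]


def classify(
    text: str,
    score: EntailmentScorer,
    labels: Sequence[str] = LABELS,
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the label whose hypothesis is most entailed by the text.

    If a threshold is given, a text whose best score falls below it is
    treated as off-topic noise and filtered out (None is returned).
    """
    scores = [score(premise, hypothesis) for premise, hypothesis in make_nli_pairs(text, labels)]
    best = max(range(len(labels)), key=lambda i: scores[i])
    if threshold is not None and scores[best] < threshold:
        return None
    return labels[best]


def training_pairs(
    text: str, gold: str, labels: Sequence[str] = LABELS
) -> List[Tuple[str, str, int]]:
    """Build NLI fine-tuning examples: 1 (entailment) for the gold label, 0 otherwise."""
    return [
        (premise, hypothesis, int(label == gold))
        for (premise, hypothesis), label in zip(make_nli_pairs(text, labels), labels)
    ]


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Keyword stand-in for an NLI model: share of label words found in the premise."""
    prefix, suffix = TEMPLATE.split("{}")
    label = hypothesis[len(prefix):len(hypothesis) - len(suffix)]
    words = label.split()
    lowered = premise.lower()
    return sum(word in lowered for word in words) / len(words)


if __name__ == "__main__":
    print(classify("My anxiety keeps me awake every night", toy_scorer))  # anxiety disorder
    print(classify("Selling a used bicycle, DM me", toy_scorer, threshold=0.3))  # None
```

Because NLI mode scores one (premise, hypothesis) pair per label, each text here costs seven model passes instead of the single pass of a standard classification head; this is one plausible contributor to the slowdown reported for NLI fine-tuning, although the abstract does not state the cause.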

Research target: Computer Science, Psychology
Language: English
Keywords: natural language inference, large language model (LLM), zero-shot classification, psychological text data
Publication based on the results of:
Modelling information and communication behaviour in computer-mediated environments and improving algorithms for behavioural data analysis (2024)
Similar publications
On the minimum number of maximal distance-k independent sets in trees
Taletskii D., Series arXiv "math". 2026.
A vertex subset of a graph is called a distance-$k$ independent set if the distance between any two of its distinct vertices is at least $k + 1$. For all $n,k \geq 1$, we determine the minimum possible number of inclusion-wise maximal distance-$k$ independent sets among all $n$-vertex trees. It equals $n$ if $n \leq k ...
Added: May 1, 2026
Proceedings of the 2026 8th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE)
Dayoub A., Suleiman E., IEEE, 2026.
2026 8th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE) 1-3 April 2026 ...
Added: April 30, 2026
The Relationship between Positive Creativity and Subjective Well-Being: The Role of Basic Needs
Ляхова А. Г., Grigoryan E., Bultseva M. A. et al., National Psychological Journal 2025 Vol. 21 No. 2 P. 54–65
Background. Subjective well-being (SWB) is a key factor in mental health and sustainable societal development. Positive creativity (creative activity for the benefit of others) is considered a way to enhance SWB, though the mechanisms linking them remain understudied. Objectives. The aim of this study was to examine the relationship between engagement in positive creativity and ...
Added: April 30, 2026
Data Mining in the Oil and Gas Industry
Moscow: Geomodel Development LLC, 2024.
Data Mining in the Oil and Gas Industry, Kaliningrad, Russia, 2024, Geomodel Development LLC ...
Added: April 29, 2026
Bioinspired Method of Agent Redistribution between Groups
Karpova Irina Petrovna, Pattern Recognition and Image Analysis 2025 Vol. 35 No. 4 P. 1138–1144
A solution to the problem of redistributing agents between groups based on simulating a form of social parasitism in ants known as slave-making is considered. To provide a comprehensive solution, the problem is integrated with a method of orientation based on visual landmarks and a compass, including route memorization and return. The models and mechanisms ...
Added: April 29, 2026
Natural hazard database from Internet publications: text mining with a large language model
Derkacheva A., Sakirkina M., Kraev G. et al., 2026.
Comprehensive data on natural hazards and their consequences are crucial for effective risk assessment, adaptation planning, and emergency response. However, many countries face challenges with fragmented, inconsistent, and inaccessible data, particularly regarding local-scale events. To address this data gap in Russia, we developed an end-to-end processing pipeline that scrapes news from various online sources, ...
Added: April 28, 2026
Allocation of Attention during Category Learning in Children with Autism Spectrum Disorders
Luzhnova K., Kotov A. A., Котова Т. Н., Clinical Psychology and Special Education 2026 Vol. 15 No. 1 P. 51–63
Context and relevance. Category learning plays a key role in cognitive development from infancy. In the process of category learning, adults tend to use selective attention, focusing on the main features of objects, while children use distributed attention, analyzing multiple features simultaneously. The use of selective attention can lead to learning difficulties in new conditions, where ...
Added: April 27, 2026
Influence of the Normal Magnetic Component to Magnetotail Current Sheet Forma
Domrin V. I., Malova H. V., V. Yu. Popov et al., Cosmic Research 2026 Vol. 64 No. 2 P. 238–252
During magnetospheric perturbations, a relatively thin current sheet with a thickness of about several proton gyroradii forms in the Earth’s magnetotail. Within the framework of the kinetic model describing current sheet thinning in the magnetotail, the processes of its formation are investigated depending on the normal magnetic field magnitude, which affects both the current sheet structure and particle dynamics within ...
Added: April 27, 2026
Asymmetric Equilibrium Structures of Superthin Current Sheets: The Asymmetry of Plasma Sources
Tsareva O. O., Malova H. V., V. Yu. Popov et al., Plasma Physics Reports 2026 Vol. 52 No. 2 P. 179–185
The influence of asymmetry of plasma sources on the structure and spatial localization of a superthin current sheet (STCS) supported by demagnetized electrons is studied using a self-consistent model. The simulation takes into account the presence of a single plasma source in the northern hemisphere, which makes the plasma flow asymmetric. It is demonstrated that the asymmetry of ...
Added: April 27, 2026
Horizons of Experiencing Loss: Similarities and Differences between Grief and Regret
Fam A. K., Lebedeva A., Belchenkova A., Counseling Psychology and Psychotherapy 2026 Vol. 34 No. 1 P. 50–67
Context and relevance. In the field of practical psychology, there is growing interest in, and demand for, new competencies in dealing with problems associated with experiencing loss. There are a number of approaches to the psychological support of crisis states in clients experiencing various forms of loss, but these approaches are separated from each ...
Added: April 26, 2026
Identity Characteristics and Coping with Traumatic Experience in Adults Who Learned of Their Adoption
Чинарёва Ю. Ф., Alexandrova L., Clinical Psychology and Special Education 2026 Vol. 15 No. 1 P. 64–78
Context and relevance. Many Russian adoptive parents choose to hide the fact of adoption from their adopted children, guided by the law on secrecy of adoption in the Family Code of the Russian Federation. Moreover, according to the Criminal Code of the Russian Federation, even after reaching the age of majority and the death of ...
Added: April 25, 2026
Stigmatization of People Living with HIV and Awareness of HIV Infection among Russians
Коренева Е. В., Zolotareva A., Clinical Psychology and Special Education 2026 Vol. 15 No. 1 P. 149–165
Context and relevance. Russian scientists are actively studying the stigmatization of people living with HIV from the perspective of HIV-positive people themselves, but there is a lack of research on the attitudes of people living with HIV in society, which could fill the gaps in understanding the nature of HIV stigma. Objective. The aim of this ...
Added: April 24, 2026
Bridging the Semantic Gap in Metadata Management using Large Language Models
Сулейкин А. С., Сорокина В., Пятецкий В. Е., in: 2025 7th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency. [s.n.], 2025. P. 748–753.
Effective metadata management is fundamental to data governance, ensuring that data assets are discoverable, understandable, and usable across the enterprise. However, traditional metadata systems often remain purely technical, describing structures without conveying business meaning. This disconnect — known as the semantic gap — limits the interpretability and value of metadata for business users. To address ...
Added: April 17, 2026
XXII National Conference on Artificial Intelligence with International Participation (KII-2025)
St. Petersburg: St. Petersburg Federal Research Center of the Russian Academy of Sciences, 2025.
The Twenty-Second National Conference on Artificial Intelligence with International Participation, KII-2025, continues the tradition of Soviet (Russian) conferences organized by the Russian Association for Artificial Intelligence. The first volume of the proceedings contains plenary talks and participants' papers presented in the following sections: Section 1 "Knowledge Engineering", Section 2 "Data Mining", Section 3 "Reasoning Modeling", Section 4 "Text Mining, Large ...
Added: February 15, 2026
Generating and Debugging Java Code using LLMs based on Associative Recurrent Memory
Василевский В. И., Alexandrov D., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 5 P. 173–182
Automatic code generation by large language models (LLMs) has achieved significant success, yet it still faces challenges when dealing with complex and large codebases, especially in languages like Java. The limitations of LLM context windows and the complexity of debugging generated code are key obstacles. This paper presents an approach aimed at improving Java code generation and debugging. ...
Added: December 26, 2025
Development and Integration of an AI Assistant into a Learning Management System
Караваева Е. А., Василевский В. И., Ланин Г. М. et al., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 4 P. 175–190
The ongoing digitalization of education requires new ways of presenting information and attention retention mechanisms. The aim of the presented work is to propose a solution for implementing a large language model, which will interactively generate prompts of different types, within an e-learning course on programming. The main approaches are the analysis of existing relatively ...
Added: December 25, 2025
Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs
David Arteaga, Poptsova M., Computational and Structural Biotechnology Journal 2026 Vol. 31 P. 82–93
Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...
Added: December 22, 2025
Artificial Intelligence as a Simulacrum of Meaning
Малинов С. А., Galactica Media: Journal of Media Studies 2025 Vol. 7 No. 4 P. 154–173
In recent years, artificial intelligence (AI) has been actively integrated into everyday human life. Its popularity continues to grow steadily, and companies increasingly employ AI to optimize and accelerate workflows. Ordinary users leverage large language models (LLMs) and multimodal AI systems to perform a wide range of tasks, including generating texts, images, and videos; planning ...
Added: December 7, 2025
SIGNAL: Dataset for Semantic and Inferred Grammar Neurological Analysis of Language
Komissarenko A., Voloshina E., Чевелева А. Н. et al., Scientific Data 2025 Vol. 12 No. 1 Article 1687
Recently, the idea of brain-model alignment has been the topic of several influential works. However, most previous studies were based on datasets collected during regular reading tasks where the subjects were not exposed to processing linguistic incongruencies, and stimuli were not controlled for key linguistic properties. Meanwhile, interpretability studies of Large Language Models pay ...
Added: November 18, 2025
MADD: Multi-Agent Drug Discovery Orchestra
Solovev G. V., Zhidkovskaya A. B., Orlova A. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2025. Association for Computational Linguistics, 2025. Ch. 367 P. 6956–6998.
Hit identification is a central challenge in early drug discovery, traditionally requiring substantial experimental resources. Recent advances in artificial intelligence, particularly large language models (LLMs), have enabled virtual screening methods that reduce costs and improve efficiency. However, the growing complexity of these tools has limited their accessibility to wet-lab researchers. Multi-agent systems offer a promising ...
Added: November 16, 2025
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Sviridov I., Miftakhova A., Tereshchenko A. et al., in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2025. Ch. 1353 P. 26625–26665.
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality ...
Added: November 16, 2025
Transformers and State-Space Models: Fine-Tuning Techniques for Solving Differential Equations
Ignatenko V., Surkov A., Zakharov V. et al., Sci 2025 Vol. 7 No. 3 Article 130
Large language models (LLMs) have recently demonstrated remarkable capabilities in natural language processing, mathematical reasoning, and code generation. However, their potential for solving differential equations—fundamental to applied mathematics, physics, and engineering—remains insufficiently explored. For the first time, we applied LLMs as translators from the textual form of an equation into the textual representation of its ...
Added: October 10, 2025
Application of Large Language Models to Solving Differential Equations: Constructing Baseline Models with LSTM and GRU
Surkov A., Zakharov V., Sergei Koltcov et al., in: Smart Technologies, Systems and Applications: 4th International Conference, SmartTech-IC 2024, Quito, Ecuador, December 2–4, 2024, Revised Selected Papers, Part II. Springer, 2025. P. 239–252.
Currently, large language models are actively developing and beginning to be used to solve some mathematical problems. With the emergence of the xLSTM model, which demonstrates results comparable to those of transformer-based models, there has been a surge of interest in recurrent neural networks. This paper considers the application of baseline recurrent models such as LSTM and ...
Added: September 11, 2025
On Developing an Approach to Automated Data Collection and Intelligent Processing Using Web Scraping and Large Language Models (the Case of Extracting Technology Readiness Level Assessments)
Grozovskiy F., Loginova I., Scientific and Technical Information. Series 2: Information Processes and Systems 2025 No. 8 P. 27–36
An approach is proposed for the automated extraction and structuring of information from text that combines web scraping, to collect data from online sources, with a large language model for the subsequent intelligent processing of that data. News articles on technology readiness levels from the CNews website were chosen as the object of study in order to test the developed methodology within a specific subject domain. The accuracy of the model in extracting technological ...
Added: August 11, 2025