MedSyn: LLM-based synthetic medical text generation framework

P. 215–230.
Kumichev G., Blinov P., Kuzkina Y., Goncharov V., Zubkova G., Zenovkin N., Goncharov A., Savchenko A.

Generating synthetic text addresses the challenge of data availability in privacy-sensitive domains such as healthcare. This study explores the applicability of synthetic data in real-world medical settings. We introduce MedSyn, a novel medical text generation framework that integrates large language models with a Medical Knowledge Graph (MKG). We use the MKG to sample prior medical information for the prompt and generate synthetic clinical notes with GPT-4 and fine-tuned LLaMA models. We assess the benefit of synthetic data by applying it to the ICD code prediction task. Our research indicates that synthetic data can increase the classification accuracy of vital and challenging codes by up to 17.8% compared to settings without synthetic data. Furthermore, to provide new data for further research in the healthcare domain, we present the largest open-source synthetic dataset of clinical notes for the Russian language, comprising over 41k samples covering 219 ICD-10 codes.
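The abstract describes sampling prior medical information from the MKG and placing it in the generation prompt. A minimal sketch of that step follows, under stated assumptions: the graph schema (`DiseaseNode` with symptom, drug, and test relations), the single toy entry, the sample size `k`, and the prompt template are all hypothetical and are not taken from the paper. The LLM call itself is omitted.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class DiseaseNode:
    """Hypothetical MKG entry: one disease keyed by its ICD-10 code."""
    icd10: str
    name: str
    symptoms: list[str] = field(default_factory=list)
    drugs: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)


# Toy graph for illustration only; the actual MKG schema is an assumption here.
TOY_MKG = {
    "J18.9": DiseaseNode(
        icd10="J18.9",
        name="Pneumonia, unspecified organism",
        symptoms=["fever", "productive cough", "dyspnea", "chest pain"],
        drugs=["amoxicillin", "azithromycin"],
        tests=["chest X-ray", "complete blood count"],
    ),
}


def sample_prior(node: DiseaseNode, k: int, rng: random.Random) -> dict[str, list[str]]:
    """Draw up to k distinct facts per relation, so prompts for one code can differ."""
    return {
        relation: rng.sample(values, min(k, len(values)))
        for relation, values in (
            ("symptoms", node.symptoms),
            ("drugs", node.drugs),
            ("tests", node.tests),
        )
    }


def build_prompt(icd10: str, k: int = 2, seed: int | None = None) -> str:
    """Compose a generation prompt from sampled MKG facts (hypothetical template)."""
    rng = random.Random(seed)
    node = TOY_MKG[icd10]
    prior = sample_prior(node, k, rng)
    lines = [
        f"Write a realistic outpatient clinical note for a patient diagnosed with "
        f"{node.name} (ICD-10 {node.icd10}).",
        "Use the following medical context:",
    ]
    for relation, facts in prior.items():
        lines.append(f"- {relation}: {', '.join(facts)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The resulting string would then be passed to an LLM (e.g. GPT-4 or a fine-tuned LLaMA model).
    print(build_prompt("J18.9", k=2, seed=0))
```

Varying the seed draws a different subset of facts for the same ICD-10 code, which is one plausible way to diversify synthetic notes; the paper's actual sampling strategy may differ.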

Language: English
Keywords: Synthetic data, Clinical note generation, ICD code prediction

In book

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950
Cham: Springer, 2024.
Similar publications
Assessing the Big Data Value: Approaches and Methods
Maltseva S. V., in: Informatics and Applied Mathematics: Proceedings of the 10th International Scientific and Practical Conference (October 8–11, 2025). Vol. 1: Collection of Materials, Part 1. Almaty: Institute of Information and Computational Technologies, Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan, 2025.
Modern technological capabilities for obtaining data make data an important resource. Data analytics, the development of products and services that actively use big data, and the implementation of the data-driven organization concept necessitate further development of methods for assessing the value, usefulness, and cost of big data. Existing and promising methods, including the influence of ...
Added: March 3, 2026
A foundation model for time series and how (not) to train it on synthetic data
Temirkhanov A., Kostromina A. M., Tsymboi O. A. et al., Reports of the Russian Academy of Sciences. Mathematics, Informatics, Control Processes (formerly Reports of the Academy of Sciences. Mathematics) 2025 Vol. 527 No. S P. 485–494
Industry abounds with cases in which we need to forecast a large number of time series at once. However, we may not be able to afford training a separate model for each of them. This issue in time series modeling has not received due attention. The remedy for ...
Added: February 24, 2026
Enhancing bankruptcy prediction efficiency using synthetic data
Elizaveta V. Lashkevich, Business Informatics 2025 Vol. 19 No. 3 P. 22–47
Predicting firm financial insolvency is crucial for investors, creditors, and regulators. However, access to high-quality, balanced data for model training is often limited due to privacy concerns, information scarcity, or the characteristics of financial reporting. This paper explores the potential of synthetic data generation techniques to increase minority class instances in unbalanced datasets and thereby potentially improve ...
Added: September 15, 2025
AGDES: a Python package and an approach to generating synthetic data for differential equation solving with LLMs
Vladimir Zakharov, Anton Surkov, Sergei Koltcov, Procedia Computer Science 2025 Vol. 258 P. 1169–1178
The rapid development of large language models (LLMs), including their successful application to solving mathematical problems requiring complex reasoning, presents a potential avenue for using LLMs in solving differential equations. While these equations are currently being solved successfully both numerically and via the symbolic approach, it is possible that fine-tuned LLMs, if they treat solving ...
Added: August 21, 2025
Sim4Rec: Flexible and Extensible Simulator for Recommender Systems for Large-Scale Data
Anna Volodkevich, Ivanova V., Vasilev A. et al., in: Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part IV. Springer, 2025. P. 425–430.
Simulators are widely used to evaluate the performance of recommender systems and to analyze feedback loop effects. Existing simulators often offer inflexible pipelines, focus on narrow research tasks, or are not adapted to industrial-scale data volumes. To address these challenges, we developed the Sim4Rec simulation framework. Sim4Rec models key aspects ...
Added: April 10, 2025
User response modeling in recommender systems: a survey
M. Shirokikh, Shenbin I., Alekseev A. et al., Journal of Mathematical Sciences 2024 Vol. 285 No. 2 P. 255–284
Over the last several decades, recommender systems have become an integral part of both our daily lives and the research frontier in machine learning. In this survey, we explore various approaches to developing simulators for recommender systems, especially for modeling the user response function. We consider simple probabilistic models, approaches based on generative adversarial networks, ...
Added: November 24, 2024
The Role of Synthetic Data in Improving Neural Network Algorithms
Rabchevskiy A., Leonid N. Yasnitsky, in: 2022 4th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). IEEE, 2022. P. 316–312.
This review article describes synthetic data, its applications, and examples of improving neural network algorithms with synthetic data. Using these examples, we show the important role of synthetic data in improving neural network algorithms and developing artificial intelligence ...
Added: February 15, 2024
Creating and Using Synthetic Data for Neural Network Training, Using the Creation of a Neural Network Classifier of Online Social Network User Roles as an Example
Rabchevskiy A., Yasnitsky L., in: Digital Science: DSIC 2021. Vol. 381. Switzerland: Birkhäuser/Springer, 2022. P. 412–421.
Added: February 14, 2024
Synthesis of Datasets for Neural Networks Based on Expert Knowledge
Rabchevskiy A., Ashikhmin E., Yasnitsky L., in: Cyber-Physical Systems and Control II. Springer, 2023. P. 535–544.
The problem of creating datasets for training and testing neural networks is described using the example of a social network management task. A method for synthesizing expert datasets based on experts’ knowledge of the subject area is proposed. The essence of the method is that sets are generated randomly within the ...
Added: November 20, 2023