Enhancing bankruptcy prediction efficiency using synthetic data

Business Informatics. 2025. Vol. 19. No. 3. P. 22–47.
Elizaveta V. Lashkevich

Predicting firms' financial insolvency is crucial for investors, creditors, and regulators. However, access to high-quality, balanced data for model training is often limited by privacy concerns, information scarcity, or the characteristics of financial reporting. This paper explores whether synthetic data generation techniques can increase the number of minority class instances in imbalanced datasets and thereby improve insolvency prediction models. It compares established imbalance reduction methods, such as the Synthetic Minority Oversampling Technique (SMOTE), with newer synthetic data generation approaches based on Bayesian networks, marginal distributions, random forests, and generative adversarial networks. The methods are assessed by their effect on classification performance, measured by the Gini coefficient, the geometric mean, and the false positive and false negative rates. The experimental sample consists of real financial data on industrial SMEs in Finland for 2021. The results add to the growing body of knowledge on synthetic data generation for imbalanced datasets and predictive modelling in the financial industry. They also provide insights into how effective different synthetic data generation methods are at rebalancing datasets and at improving the accuracy and reliability of firm insolvency prediction models.
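As an illustration of the ideas named in the abstract, the sketch below shows the interpolation step at the core of SMOTE and the four evaluation measures the abstract lists. It is not the paper's implementation: the function names, the toy data, and the 0.5 decision threshold are hypothetical, and the Gini coefficient is assumed to follow the common credit-scoring convention Gini = 2·AUC − 1, which the abstract does not spell out.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Gini, geometric mean, false positive rate, and false negative rate.
    Class 1 is the minority (insolvent) class; threshold is an assumption."""
    pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return {
        "gini": 2 * auc(y_true, scores) - 1,           # assumed convention
        "g_mean": math.sqrt((1 - fnr) * (1 - fpr)),   # sqrt(TPR * TNR)
        "fpr": fpr,
        "fnr": fnr,
    }


# Toy data only; not taken from the paper.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_synthetic = smote_like(toy_minority, n_new=5)

toy_labels = [1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
toy_metrics = evaluate(toy_labels, toy_scores)
print(toy_metrics)
```

Because each synthetic point lies on a segment between two existing minority points, this kind of oversampling never leaves the region the minority class already occupies. The geometric mean is reported alongside the error rates because it stays low whenever either class is classified poorly, which plain accuracy can hide on imbalanced data.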

Language: English
Keywords: financial insolvency, synthetic data, class imbalance