Enhancing bankruptcy prediction efficiency using synthetic data
Predicting firm financial insolvency is crucial for investors, creditors, and regulators. However, access to high-quality, balanced data for model training is often limited by privacy concerns, information scarcity, or the characteristics of financial reporting. This paper explores the potential of synthetic data generation techniques to increase the number of minority class instances in imbalanced datasets and thereby potentially improve insolvency prediction models. It compares established imbalance reduction methods, such as the Synthetic Minority Oversampling Technique (SMOTE), with newer synthetic data generation approaches based on Bayesian networks, marginal distributions, random forests, and generative adversarial networks. The methods are evaluated by their effect on classification performance, measured by the Gini coefficient, the geometric mean, and the false positive and false negative rates. The experimental sample consists of real financial performance data for industrial SMEs in Finland for 2021. The results contribute to the growing body of knowledge on synthetic data generation for imbalanced datasets in the financial industry, and they provide insights into how effectively different generation methods rebalance such datasets and improve the accuracy and reliability of firm insolvency prediction models.
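The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact definitions used, so the code assumes the conventions common in credit scoring and imbalanced learning: the Gini coefficient as 2 * AUC - 1, and the geometric mean as the square root of sensitivity times specificity. The function names, the 0.5 threshold, and the toy labels and scores (1 = insolvent, 0 = solvent) are hypothetical and serve illustration only.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; class 1 is the minority (insolvent) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def error_rates_and_gmean(y_true, y_pred):
    """Return (false positive rate, false negative rate, geometric mean)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn)  # share of solvent firms flagged as insolvent
    fnr = fn / (fn + tp)  # share of insolvent firms that were missed
    # Assumed definition: sqrt(sensitivity * specificity)
    gmean = sqrt((1 - fnr) * (1 - fpr))
    return fpr, fnr, gmean

def gini_coefficient(y_true, scores):
    """Assumed definition: Gini = 2 * AUC - 1, with AUC from pairwise comparisons."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # AUC is the probability that a random positive outranks a random negative
    # (ties count as one half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1

# Toy data (hypothetical): two insolvent firms among six.
y_true = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

fpr, fnr, gmean = error_rates_and_gmean(y_true, y_pred)
gini = gini_coefficient(y_true, scores)
print(f"FPR={fpr:.3f} FNR={fnr:.3f} G-mean={gmean:.3f} Gini={gini:.3f}")
```

Unlike plain accuracy, the geometric mean drops to zero whenever a classifier ignores one class entirely, which is one reason it is commonly reported for imbalanced problems such as insolvency prediction.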