A Simple Baseline for Bayesian Uncertainty in Deep Learning

Maddox W., Izmailov P., Garipov T., Vetrov D., Gordon Wilson A.

P. 13153–13164.

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.
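The procedure described in the abstract can be illustrated with a minimal NumPy sketch on a toy least-squares problem. This is not the authors' implementation: the function names, the toy model, the burn-in length, and the rank are hypothetical, the deviation matrix is built around the final mean rather than a running mean, and the scaling constants 1/2 and 1/(2(K-1)) in the sampler are an assumption taken from the commonly cited SWAG formulation, since the abstract does not state them.

```python
import numpy as np


def sgd_iterates(X, y, steps=400, lr=0.05, batch=16, seed=0):
    """Run constant-learning-rate SGD on least squares; return every iterate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    iterates = []
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
        iterates.append(w.copy())
    return np.array(iterates)


def fit_swag(iterates, burn_in=100, rank=10):
    """Fit a Gaussian with low-rank plus diagonal covariance to SGD iterates."""
    kept = iterates[burn_in:]                # drop the initial transient (assumed)
    mean = kept.mean(axis=0)                 # SWA solution: first moment
    second = (kept ** 2).mean(axis=0)        # uncentered second moment
    diag = np.clip(second - mean ** 2, 1e-12, None)  # diagonal variance
    dev = (kept[-rank:] - mean).T            # deviations, shape (d, rank)
    return mean, diag, dev


def sample_swag(mean, diag, dev, rng):
    """Draw one weight vector from the fitted Gaussian (rank must be >= 2)."""
    rank = dev.shape[1]
    z1 = rng.standard_normal(mean.shape[0])
    z2 = rng.standard_normal(rank)
    return (mean
            + np.sqrt(diag / 2.0) * z1
            + dev @ z2 / np.sqrt(2.0 * (rank - 1)))


def bma_predict(X, mean, diag, dev, n_samples=30, seed=1):
    """Bayesian model averaging: average predictions over sampled weights."""
    rng = np.random.default_rng(seed)
    preds = np.array([X @ sample_swag(mean, diag, dev, rng)
                      for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```

A typical call sequence would be `its = sgd_iterates(X, y)`, then `mean, diag, dev = fit_swag(its)`, then `bma_predict(X_new, mean, diag, dev)`, which returns a predictive mean and a per-point spread across the sampled models. For a deep network the same steps would apply to the flattened weight vector, with the forward pass replacing the linear prediction `X @ w`.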

Language: English

Keywords: deep learning, Bayesian framework

In book

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

[s.n.], 2019.

Method of Critical Set construction for Successive Cancellation List Decoder of Polar Codes Based on Deep Learning of Neural Networks

Kotov F. I., Timokhin I., Ivanov F., in: 2023 XVIII International Symposium Problems of Redundancy in Information and Control Systems (REDUNDANCY). IEEE, 2023.

The Successive Cancellation List (SCL) algorithm is a widely used decoding technique in communication systems. However, constructing the critical set for SCL decoding is a challenging task, as it requires a large number of computations and can lead to significant decoding delays. In this paper, a new approach to critical set construction for SCL decoding ...

Added: January 26, 2026

Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V

Cham: Springer, 2025.

This book constitutes the refereed proceedings of 34th International Workshops which were held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, held in Kaunas, Lithuania, September 9–12, 2025. The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...

Added: September 29, 2025

Deep learning deciphers the related role of master regulators and G-quadruplexes in tissue specification

Artem B., Andreasyan A., Konovalov D. et al., Scientific Reports 2025 Vol. 15 Article 23119

G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for the genome-wide G-flipon predictions across 14 human tissue types. The model was trained using high-confidence experimental maps of GQ-forming sequences ...

Added: August 8, 2025

AI in drug development: advances in response, combination therapy, repositioning, and molecular design

Shaitan A., Science China Information Sciences 2025 Vol. 68 No. 7 Article 170102

Artificial intelligence (AI) is revolutionizing the field of drug development, particularly in addressing key challenges such as drug response prediction, drug combination design, drug repositioning, and drug molecule generation. Traditional drug discovery is hindered by long timelines, high costs, and low success rates, necessitating innovative technologies to accelerate the process. AI technologies, such as deep ...

Added: June 25, 2025

An Approach to Finding a Robust Deep Learning Model

Boldyrev A., Ratnikov F., Shevelev A., IEEE Access 2025 Vol. 13 P. 102390–102406

The rapid development of machine learning (ML) and artificial intelligence (AI) applications requires the training of a large number of models. This growing demand highlights the importance of training models without human supervision, while ensuring that their predictions are reliable. In response to this need, we propose a novel approach for determining model robustness. This approach, supplemented with a ...

Added: June 15, 2025

Economic and social aspects of nuclear energy amid the development of artificial intelligence technologies

Podchufarov A., Galkina A. N., Vanina S. S. et al., Экономика и управление: проблемы, решения 2025 Vol. 5 No. 4 P. 61–74

Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...

Added: June 5, 2025

Deep learning for customs classification of goods based on their textual descriptions analysis

Ryzhova A., Sochenkov I., in: Proceeding 2019 Ivannikov Ispras Open Conference (ISPRAS). IEEE Computer Society, 2019. P. 60–67.

Added: May 1, 2025

Distilling Normalizing Flows

Walton S., Klyukin V., Artemev M. et al., in: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2025. P. 3328–3337.

Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and having exact latent-variable inference. This has many advantages, including: being able to simply interpolate, calculate sample likelihood, and ...

Added: April 1, 2025

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Derkach D., Artemev M., IEEE, 2025.

Added: April 1, 2025

Deep learning captures the effect of epistasis in multifactorial diseases

Perelygin V., Kamelin A., Syzrantsev N. et al., Frontiers in Medicine 2025 Vol. 11 Article 1479717

Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by the linear regression model that assumes independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer’s disease, diabetes, cardiovascular ...

Added: March 4, 2025

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Ivan Rubachev, Nikolay Kartashev, Gorishniy Y. et al., in: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). ICLR, 2025. P. 53831–53867.

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical deployment. In this work, we analyze existing tabular deep learning benchmarks and find two common characteristics of tabular data ...

Added: March 1, 2025

Weight Perturbations for Simulating Virtual Lesions in a Convolutional Neural Network

W. Joseph MacInnes, Zhozhikashvili N., Feurra M., in: First International Conference, AIiH 2024, Swansea, UK, September 4–6, 2024, Proceedings, Part II. Artificial Intelligence in Healthcare. LNCS, Vol. 14976. Springer, 2024. P. 221–234.

Convolutional Neural Networks (CNNs) match human performance in many visual tasks, such as image classification; however, they may not simulate the underlying biological processes. We implemented a CNN to try to replicate results from an object inversion experiment with Transcranial Magnetic Stimulation (TMS). After training on upright faces, the CNN model went through three stages ...

Added: January 28, 2025

TabR: Tabular Deep Learning Meets Nearest Neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev et al., in: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). ICLR, 2024.

Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves ...

Added: January 22, 2025

Deep Learning Approaches for LHCb ECAL Reconstruction

Boldyrev A., Derkach D., Ratnikov F. et al., EPJ Web of Conferences 2024 Vol. 295 Article 09008

Calorimeters are a crucial component for most detectors mounted on modern colliders. Their tasks include identifying and measuring the energy of photons and neutral hadrons, recording energetic hadronic jets, and contributing to the identification of electrons, muons, and charged hadrons. To fulfill these many tasks while keeping costs reasonable, the calorimeter construction requires good and ...

Added: January 8, 2025

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Gorishniy Y., Kotelnikov A., Babenko A., in: The Thirteenth International Conference on Learning Representations: ICLR 2025. ICLR, 2025.

Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for substantially improving tabular MLPs: namely, parameter-efficient ensembling -- a paradigm for implementing an ensemble of models as one model producing multiple predictions. We ...

Added: December 24, 2024

Can artificial intelligence predict court decisions? A systematic review of international research

Kazun A., Мониторинг общественного мнения: Экономические и социальные перемены 2024 No. 5 P. 100–122

Advancements in artificial intelligence technologies and the emergence of open databases containing judicial decisions have led to rapid improvements in algorithms capable of classifying legal documents and forecasting decisions made by judges. This article examines a body of international research dedicated to the question of how accurately AI can predict judges’ decisions, and consequently, whether ...

Added: November 29, 2024

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950

Cham: Springer, 2024.

This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...

Added: November 22, 2024

Unet-boosted classifier: a multitask architecture for small samples, illustrated by the classification of brain MRI images

Sobyanin K., Kulikova S., Информатика и автоматизация (Труды СПИИРАН) 2024 Vol. 23 No. 4 P. 1022–1046

The problem of training deep neural networks on small samples is especially relevant for medical problems. The paper examines the impact of pixel-wise marking of significant objects in the image, over the true class label, on the quality of the classification. To achieve better classification results on small samples, we propose a multitasking architecture -- ...

Added: June 29, 2024

Generative Flow Networks as Entropy-Regularized RL

Tiapkin D., Morozov N., Naumov A. et al., in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2-4 May 2024, Palau de Congressos, Valencia, Spain. PMLR, Vol. 238. Valencia: PMLR, 2024. P. 4213–4221.

The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to ...

Added: June 22, 2024

Controlling Quality for a Physics-Driven Generative Models and Auxiliary Regression Approach

Rogachev A., Ratnikov F., EPJ Web of Conferences 2024 Vol. 295 Article 09007

High energy physics experiments rely heavily on MC simulation of the data used to extract physics results. However, detailed simulation often requires a tremendous amount of computational resources. Generative Adversarial Networks and other deep learning generative techniques can drastically speed up computationally heavy simulations, such as the simulation of the calorimeter response. To ...

Added: May 20, 2024