On Power Laws in Deep Ensembles

E. Lobacheva; N. Chirkova; M. Kodryan; D. Vetrov

P. 2375–2385.

Language: English

Keywords: neural network ensembles, deep neural networks

In book

Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Curran Associates, Inc., 2020.

An Ensemble of Modern Computer Vision Models for Deepfake Detection

Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127

This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...

Added: December 12, 2025

Prediction of Industrial Cyber Attacks Using Normalizing Flows

V.P. Stepashkina, M.I. Hushchyn, Doklady Mathematics 2024 Vol. 110 No. 1 P. S95–S102

This paper presents the development and evaluation of methods for detecting cyberattacks on industrial systems using neural network approaches. The focus is on the task of detecting anomalies in multivariate time series, where the diversity and complexity of potential attack scenarios require the use of advanced models. To address these challenges, a transformer-based autoencoder architecture ...

Added: March 25, 2025

Mechanistic Permutability: Match Features Across Layers

Balagansky N., Ian Maksimov, Daniil Gavrilov / Series Computer Science "arxiv.org". 2024.

Understanding how features evolve across layers in deep neural networks is a fundamental challenge in mechanistic interpretability, particularly due to polysemanticity and feature superposition. While Sparse Autoencoders (SAEs) have been used to extract interpretable features from individual layers, aligning these features across layers has remained an open problem. In this paper, we introduce SAE Match, ...

Added: February 20, 2025

The Appliance of Deep Neural Networks in the Process of Managing Chemical Enterprises

Kulyasova E. V., Kulyasov N. S., Puchkov A. Y., in: Journal of Physics: Conference Series Volume 1260, 2019 Mechanical Science and Technology Update 23–24 April 2019, Omsk, Russian Federation.: IOP Publishing, 2019. Ch. 3 P. 032024–032024.

This article introduces promising trends in the digital transformation of chemical enterprises that make it possible to improve enterprise management in the industry. It presents algorithms for management and for processing technological information based on deep neural networks. New approaches to data processing known as video analytics are applied; it allows ...

Added: September 27, 2024

To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning

Sadrtdinov I., Dmitrii Pozdeev, Dmitry P Vetrov et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023).: Curran Associates, Inc., 2023. P. 15936–15964.

Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Due to the high cost of pre-training, ensembles of models fine-tuned from a single pre-trained checkpoint are often used in practice. Such models end up in the same basin of the loss landscape, which we call the pre-train ...

Added: February 26, 2024

Optimization of Physics-Informed Neural Networks for Solving the Nonlinear Schrödinger Equation

Чупров И. А., Гао Ц., Efremenko D. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2023 Vol. 514 No. 2 P. 28–38

Physics-informed neural networks (PINNs) are a promising method for solving partial differential equations with machine learning. This paper considers the application of PINNs to the nonlinear Schrödinger equation to describe ...

Added: December 19, 2023

Scientific Conference "Lomonosov Readings". Abstracts of Talks. April 15–25, 2019 (collection)

Захарова Т. В., Yuzhakov T., ООО «Макс Пресс», 2019.

This collection contains abstracts of talks from the Computational Mathematics and Cybernetics section of the "Lomonosov Readings 2019" conference, held by Lomonosov Moscow State University in 2019. ...

Added: December 13, 2023

Loss function dynamics and landscape for deep neural networks trained with quadratic loss

Nakhodnov M., Kodryan M., Lobacheva E. et al., in: Doklady Mathematics Vol. 106. Issue 1: Supplement.: Pleiades Publishing, Ltd., 2023. P. 43–62.

Knowledge of the loss landscape geometry makes it possible to successfully explain the behavior of neural networks, the dynamics of their training, and the relationship between resulting solutions and hyperparameters, such as the regularization method, neural network architecture, or learning rate schedule. In this paper, the dynamics of learning and the surface of the standard ...

Added: June 9, 2023

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

Kodryan M., Lobacheva E., Nakhodnov M. et al., in: Thirty-Sixth Conference on Neural Information Processing Systems : NeurIPS 2022.: Curran Associates, Inc., 2022. P. 14058–14070.

A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR ...

Added: December 20, 2022

Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations

Belomestny D., Naumov A., Puchkin N. et al., Neural Networks 2023 Vol. 161 P. 242–253

This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded ...

Added: July 13, 2022

On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Lobacheva E., Kodryan M., Chirkova N. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021).: Curran Associates, Inc., 2021. P. 21545–21556.

Added: December 29, 2021

Ensemble Distribution Distillation

Malinin A., Mlodozeniec B., Gales M., in: Proceedings of the 8th International Conference on Learning Representations (ICLR 2020).: ICLR, 2020.

Added: November 1, 2021

Gender domain adaptation for automatic speech recognition

Sokolov A., Savchenko A., in: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI).: IEEE, 2021. P. 413–418.

This paper focuses on fine-tuning acoustic models for speaker adaptation to a given gender. We pretrained the Transformer baseline model on Librispeech-960 and conducted experiments with fine-tuning on the gender-specific test subsets. The obtained word error rate (WER) relative to the baseline is up to 5% and 3% lower on male ...

Added: September 26, 2021

Black-Box Optimization with Local Generative Surrogates

Belavin V., Ustyuzhanin A., Sergey Shirobokov et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020).: Curran Associates, Inc., 2020. P. 14650–14662.

Added: February 14, 2021

Deep learning approach for predicting functional Z-DNA regions using omics data

Beknazarov N., Jin S., Poptsova M., Scientific Reports 2020 Vol. 10 P. 19134

Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method Z-Hunt is based on statistical mechanical and energy considerations about B- to Z-DNA transition using sequence information. Z-DNA CHiP-seq experiment results showed little overlap with Z-Hunt predictions implying that sequence information only is not ...

Added: December 11, 2020

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Sokolov A., / Series Computer Science "arxiv.org". 2021.

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...

Added: November 17, 2020

Structured Sparsification of Gated Recurrent Neural Networks

Lobacheva E., Chirkova N., Markovich A. et al., in: Thirty-Fourth AAAI Conference on Artificial Intelligence Vol. 34.: AAAI Press, 2020. Ch. 5938 P. 4989–4996.

Added: October 29, 2020

Improving the Accuracy of One-Shot Detectors for Small Objects in X-ray Images

Demochkina P., Savchenko A., in: Proceedings of IEEE International Russian Automation Conference (RusAutoCon 2020).: IEEE, 2020. Ch. 110 P. 610–614.

In this paper, we address the problem of detecting small objects on high-quality X-ray images using deep neural networks. We propose to implement the two-stage approach, in which, firstly, input image is split into partially overlapping blocks to make small objects more discriminative for detection. Secondly, the small blocks are fed into conventional single-shot detectors. These detectors ...

Added: October 3, 2020

Probabilistic Neural Network With Complex Exponential Activation Functions in Image Recognition

Savchenko A., IEEE Transactions on Neural Networks and Learning Systems 2020 Vol. 31 No. 2 P. 651–660

If the training data set in image recognition task is not very large, the feature extraction with a convolutional neural network is usually applied. Here, we focus on the nonparametric classification of extracted feature vectors using the probabilistic neural network (PNN). The latter is characterized by the high runtime and memory space complexity. We propose ...

Added: November 1, 2019

Automatic Privacy Detection in Scanned Document Images Based on Deep Neural Networks

Kopeykina Lyudmila, Savchenko A., in: 2019 International Russian Automation Conference (RusAutoCon).: IEEE, 2019. P. 1–6.

The authors consider the problem of automatic detection of private scanned documents based on text recognition with deep neural networks. The paper suggests implementing a two-phase approach with the first stage which includes efficient EAST text detection and recognition using Tesseract OCR Engine. Secondly, the authors classify the privacy of a scanned document by deep ...

Added: October 21, 2019