Averaging Weights Leads to Wider Optima and Better Generalization




P. 876–885.

Izmailov P., Garipov T., Podoprikhin D. A., Vetrov D., Gordon Wilson A.

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and ShakeShake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
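The procedure the abstract describes can be sketched in a few lines. What follows is a minimal plain-Python illustration under stated assumptions, not the authors' code. It replaces a network and its minibatches with a toy quadratic loss plus Gaussian gradient noise, and every function name (`cyclical_lr`, `update_average`, `swa`, `noisy_grad`) and hyperparameter value is hypothetical. Within each cycle of `c` iterations, the learning rate decays linearly from about `alpha1` to `alpha2`. The weights reached at the end of each cycle are folded into a running mean, so only one extra copy of the weights is kept.

```python
import random


def cyclical_lr(i, alpha1, alpha2, c):
    """Learning rate at iteration i (1-indexed). Within each cycle of c
    iterations it falls linearly from just below alpha1 to exactly alpha2."""
    t = ((i - 1) % c + 1) / c
    return (1 - t) * alpha1 + t * alpha2


def update_average(w_swa, w, n_models):
    """Fold snapshot w into a running mean of n_models earlier snapshots."""
    return [(a * n_models + b) / (n_models + 1) for a, b in zip(w_swa, w)]


def swa(grad, w0, n_iters, alpha1, alpha2, c):
    """SGD with a cyclical learning rate; the weights reached at the end of
    every cycle (where the learning rate equals alpha2) are averaged."""
    w = list(w0)
    w_swa = list(w0)  # placeholder, overwritten by the first snapshot
    n_models = 0
    for i in range(1, n_iters + 1):
        lr = cyclical_lr(i, alpha1, alpha2, c)
        w = [wj - lr * gj for wj, gj in zip(w, grad(w))]
        if i % c == 0:  # end of a cycle: add this point to the average
            w_swa = update_average(w_swa, w, n_models)
            n_models += 1
    return w, w_swa, n_models


# Hypothetical toy problem: loss 0.5 * ||w||^2 with its minimum at the origin;
# Gaussian noise stands in for minibatch gradient noise.
rng = random.Random(0)


def noisy_grad(w):
    return [wj + rng.gauss(0.0, 1.0) for wj in w]


def sq_norm(v):
    return sum(x * x for x in v)


w_last, w_swa, n = swa(noisy_grad, [1.0] * 20, n_iters=4000,
                       alpha1=0.1, alpha2=0.05, c=4)
print(n)  # 1000 snapshots averaged
print(sq_norm(w_last), sq_norm(w_swa))
```

On this toy problem, the averaged point typically lands much closer to the minimum than the final SGD iterate. The final iterate keeps bouncing around the minimum with a spread set by the learning rate. For networks with batch normalization, the averaged weights also need their normalization statistics recomputed with an extra pass over the training data. This sketch omits that step.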

Language: English


Keywords: neural networks, generalization, loss function

In book

Proceedings of the international conference on Uncertainty in Artificial Intelligence (UAI 2018)

[s.n.], 2018.

Forecasting euro/dollar exchange rate quotes using artificial neural networks

Nazarova V., Ulzutueva B. D., Financial Risk Management 2016 No. 1 (45) P. 42–57

The first part of the article gives general information about the foreign exchange market (FOREX) and reviews approaches to forecasting exchange rates. In addition, we consider a new model of nonlinear analysis, an artificial neural network (ANN), to give the research a broader theoretical basis. The nonlinear analysis and the ANN are still ...

Added: February 13, 2017

On a model of adaptive management of complex organizational structures

Akopov A. S., Audit and Financial Analysis 2010 No. 3 P. 310–317

The paper presents a model for the adaptive management of vertically integrated companies. Built on the systems approach, it supports operational management within a unified strategic planning cycle on shorter time horizons. To find optimal values of the control parameters, special algorithms of a class ...

Added: September 28, 2012

Modelling grain crop yields in agricultural regions using computer vision technologies

Arkhipova M., Economy of Regions 2022 Vol. 18 No. 2 P. 581–594

The article examines new methodologies for modelling crop yields in the agricultural regions of Russia, based on remotely acquired information about the condition of fields. The proposed approach can be applied to develop indicator systems and create the methodological platforms and models needed to obtain more accurate estimates. In comparison with the traditional ...

Added: January 12, 2023

Neural Network for Real-Time Object Detection on FPGA

Rzaev E., Khanaev A., Amerikanov A., in: 2021 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). IEEE, 2021. P. 719–723.

Added: July 4, 2021

Segmenting Prostate Cancer on TRUS Images with a Small Dataset: A Comprehensive Methodology

Lyutkin D., Romanov A., Nasonov D., in: 2023 International Russian Smart Industry Conference (SmartIndustryCon), 27-31 March 2023. Sochi: IEEE, 2023. P. 454–459.

The use of mathematical algorithms for disease identification has gained traction in recent years and has paved the way for the creation of novel tools that can swiftly and accurately detect pathologies. In particular, modern machine learning techniques have garnered significant attention in this domain and are currently among the most widely used algorithms. Despite ...

Added: July 30, 2023

Trusted artificial intelligence: Strengthening digital protection

Avdoshin S. M., Pesotskaya E. Yu., Business Informatics 2022 Vol. 16 No. 2 P. 62–73

Added: June 23, 2022

Non-invasive monitoring of blood glucose by means of wearable tracking technology

Kascheev N. I., Kozyrev O., Leykin M. et al., in: Proceedings of XV IEEE East-West Design & Test Symposium (EWDTS'2017). Piscataway: IEEE, 2017. P. 1–4.

The main outcome of our investigation is a new monitoring service for glucose control in diabetes. It builds on the main research results: 1) a new wearable sensor that measures glucose levels non-invasively. The sensor uses several independent technologies simultaneously: radio-frequency at different signal levels, ultrasonic, electromagnetic and thermal; ...

Added: February 20, 2018

Advances in Neural Computation, Machine Learning, and Cognitive Research VII

Magaj G., Soroka A., Studies in Computational Intelligence, 2023.

The basis of transfer learning methods is the ability of deep neural networks to use knowledge from one domain to learn in another domain. However, another important task is the analysis and explanation of the internal representations of deep neural networks models in the process of transfer learning. Some deep models are known to be ...

Added: October 25, 2023

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science

Springer, 2021.

This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...

Added: October 28, 2021

Retail investor sentiment in explaining differences in the trading characteristics of Russian stocks

Teplova T., Sokolova T., Tomtosov A. et al., Journal of the New Economic Association 2022 Vol. 1 No. 53 P. 53–84

In our paper, for the first time, we examine the influence of the sentiment of private investors in social networks on the trading characteristics of stocks in the Russian market. Monthly returns and trading volumes are analyzed, controlling for financial indicators and indicators of the quality of corporate governance of stock ...

Added: April 5, 2022

[Re] “Towards Understanding Grokking”

Shabalin A., Sadrtdinov I., Shabalin E., in: ML Reproducibility Challenge 2022. [s.n.], 2023.

Scope of Reproducibility: In this work, we attempt to reproduce the results of the NeurIPS 2022 paper "Towards Understanding Grokking: An Effective Theory of Representation Learning". This study shows that the training process can happen in four regimes: memorization, grokking, comprehension and confusion. We first try to reproduce the results on the toy example described in ...

Added: November 2, 2023

Tensorizing neural networks

Novikov A., Podoprikhin D., Osokin A. et al., in: Advances in Neural Information Processing Systems 28 (NIPS 2015). NY: Curran Associates, 2015.

Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, commonly used fully-connected layers require a large amount of memory, making it hard to use the models on low-end devices and hindering further increases in ...

Added: June 9, 2016

The influence of the tone of CEO letters on a company's financial performance

Fedorova E., Osetrov R. A., Demin I. S. et al., Russian Management Journal 2017 Vol. 15 No. 4 P. 441–462

The paper is devoted to the analysis of CEO letters as an instrument for influencing the expectations of shareholders and potential investors. The aim of the research is to analyze empirically the influence of semantic characteristics of CEO letters on financial indicators of the company. The authors suggested that CEO letter’s tonality, its length and ...

Added: October 23, 2018

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track

PMLR, 2022.

Added: July 27, 2022

Language barriers in metaverses: the power of neural networks in translation

Osipov D., Eurasian Philological Bulletin 2023 No. 2 P. 21–39

The metaverse is a shared, virtual space, accessible to users worldwide, offering a platform for global interaction. The physical barriers of geographical location and time are non-existent, allowing for seamless connectivity and interaction. Language barriers within and between metaverses present substantial impediments to fluid interaction and collaboration. Failure to address this linguistic divergence can stifle ...

Added: March 19, 2024

Big Transformers for Code Generation

Arutyunov G.A., Avdoshin S. M., Proceedings of the Institute for System Programming of the RAS 2022 Vol. 34 No. 4 P. 79–88

The IT industry has been thriving over the past decades. Numerous new programming languages, architectural patterns and software development techniques have emerged. The tools involved in the process ought to evolve as well. One of the key principles of a new generation of software development tools would be their ability to learn using ...

Added: December 26, 2022

On the structure of idealized cognitive models in acts of transfer

Pushkarev E., Bulletin of the South Ural State University. Series: Linguistics 2015 Vol. 12 No. 4 P. 56–60

The paper theorizes on the general architectonics of idealized cognitive models (ICMs) and their involvement in metonymy and metaphor. The article posits that an ICM's structure should reflect the architecture of the neural network(s) engaged in processing a given concept. The ICM nodes, or cogs, form complex, hierarchically organized neural connections, with the ...

Added: December 8, 2015

How to use neural network and web technologies in modeling complex technical systems

Semenenko M. G., Kniazeva I. V., Beckel L. S. et al., in: IOP Conference Series: Materials Science and Engineering, Vol. 537, Issue 3. Institute of Physics Publishing (IOP), 2019.

Added: October 20, 2021

The emergence of new objects of legal protection in the digital economy

Kirsanova E., Yurist (The Lawyer) 2018 No. 11 P. 19–24

The article analyzes the legal status of information and discusses the main interpretations of this term. It attempts to identify the problems of regulating self-learning programs and offers a classification of existing approaches to determining their legal status. ...

Added: September 13, 2022

Artificial Intelligence in Music, Sound, Art and Design: 12th International Conference, EvoMUSART 2023, Held as Part of EvoStar 2023, Brno, Czech Republic, April 12–14, 2023, Proceedings

Cham: Springer, 2023.

This book constitutes the refereed proceedings of the 12th European Conference on Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2023, held as part of Evo* 2023, in April 2023, co-located with the Evo* 2023 events, EvoCOP, EvoApplications, and EuroGP. The 20 full papers and 7 short papers presented in this book were carefully reviewed ...

Added: April 4, 2023

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings

Springer, 2018.

The sixteen-volume set comprising the LNCS volumes 11205-11220 constitutes the refereed proceedings of the 15th European Conference on Computer Vision, ECCV 2018, held in Munich, Germany, in September 2018. The 776 revised papers presented were carefully reviewed and selected from 2439 submissions. The papers are organized in topical sections on learning for vision; computational photography; human analysis; ...

Added: October 30, 2018

Voting: a machine learning approach

Puppe C., Burka D., Szepesváry L. et al., Series "KIT Working paper in Economics" (ISSN 2190-9806). 2020. No. 145.

Voting rules can be assessed from quite different perspectives: the axiomatic, the pragmatic, in terms of computational or conceptual simplicity, susceptibility to manipulation, and many other aspects. In this paper, we take the machine learning perspective and ask how ‘well’ a few prominent voting rules can be learned by a neural network. To address this ...

Added: October 31, 2021

Optimal decision for the market graph identification problem in a sign similarity network

Kalyagin V. A., Koldanov A. P., Koldanov P. et al., Annals of Operations Research 2018 Vol. 266 No. 1-2 P. 313–327

Research into the market graph is attracting increasing attention in stock market analysis. One of the important problems connected with the market graph is its identification from observations. The standard way of identifying the market graph is to use a simple procedure based on statistical estimations of Pearson correlations between pairs of stocks. Recently a ...

Added: May 17, 2017