Variational Dropout Sparsifies Deep Neural Networks

D. Molchanov; A. Ashukha; D. Vetrov

P. 2498–2507.

We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and by up to 68 times on VGG-like networks with a negligible decrease in accuracy.
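
The idea of individual dropout rates per weight can be illustrated with a minimal NumPy sketch. It assumes each weight has a mean `theta` and a learned log-variance `log_sigma2`, so that its dropout rate is alpha = sigma^2 / theta^2, and that weights whose log alpha exceeds a cutoff are set to zero. The function names, the toy values, and the cutoff of 3.0 are illustrative assumptions rather than details taken from this record, and no training is shown.

```python
import numpy as np

def log_alpha(theta, log_sigma2, eps=1e-8):
    """Per-weight log dropout rate: log(alpha) = log(sigma^2) - log(theta^2)."""
    return log_sigma2 - np.log(theta ** 2 + eps)

def prune(theta, log_sigma2, threshold=3.0):
    """Zero out weights whose dropout rate is above the (assumed) cutoff."""
    keep = log_alpha(theta, log_sigma2) < threshold
    return theta * keep, keep

# Toy values (hypothetical): two weights are large relative to their noise,
# two are small and therefore receive a very high dropout rate.
theta = np.array([1.0, 0.01, -0.5, 0.001])
log_sigma2 = np.array([-4.0, -4.0, -4.0, -4.0])

sparse_theta, keep = prune(theta, log_sigma2)
# Ratio of total weights to surviving weights, analogous to the
# "reduction in the number of parameters" quoted in the abstract.
compression = theta.size / max(int(keep.sum()), 1)
print(sparse_theta, keep, compression)
```

In this sketch a weight is dropped when its noise variance is large compared with its squared mean, which is how unbounded dropout rates translate into sparsity.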

Language: English

Keywords: dropout; Bayesian methods; variational dropout

Publication based on the results of:

Development of combined neuro-Bayesian machine learning methods (2017)

In book

Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2017)

Vol. 70. Sydney: [s.n.], 2017

The V-Dem measurement model: Latent variable analysis for cross-national and cross-temporal expert-coded data

Pemstein D., Marquardt K., Tzelgov E. et al. / SSRN. Series V-Dem Institute "Working Paper". 2019. No. 21.

Added: July 26, 2019

Stress testing as a tool for monitoring and modelling the dynamics of business activity of manufacturing enterprises in Russia in the face of market shocks: short-term scenarios of industry tendencies

Lola I. S., Manukov A., Bakeev M. / Higher School of Economics. Series WP BRP "Science, Technology and Innovation". 2020. No. 108.

The article proposes a methodology for using macro-level stress testing based on the results of business tendency surveys to study possible scenarios for the development of crisis dynamics triggered by external unforeseen supply and demand shocks, as in the case of the COVID-19 pandemic, as well as a review of existing approaches in the field ...

Added: May 28, 2020

Variational Dropout via Empirical Bayes

Kharitonov V., Molchanov D., Vetrov D. / Cornell University. Series arxiv.org "stat.ML". 2018.

We study the Automatic Relevance Determination procedure applied to deep neural networks. We show that ARD applied to Bayesian DNNs with Gaussian approximate posterior distributions leads to a variational bound similar to that of variational dropout, and that in the case of a fixed dropout rate the objectives are exactly the same. Experimental results show that the ...

Added: November 27, 2018

Bayesian Compression for Natural Language Processing

Chirkova N., Lobacheva E., Vetrov D., in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018. P. 2910–2915.

In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary size. We propose a Bayesian sparsification technique for RNNs which allows ...

Added: September 5, 2018

Language, Ethnicity, and Separatism: Survey Results from Two Post-Soviet Regions

Marquardt K., British Journal of Political Science 2022 Vol. 52 No. 4 P. 1831–1851

Scholars often use language to proxy ethnic identity in studies of conflict and separatism. This conflation of language and ethnicity is misleading: language can cut across ethnic divides and itself has a strong link to identity and social mobility. Language can therefore influence political preferences independently of ethnicity. Results from an original survey of two ...

Added: December 27, 2021

Estimating latent traits from expert surveys: an analysis of sensitivity to data-generating process

Marquardt K., Pemstein D., Political Science Research and Methods 2021 P. 1–10

Models for converting expert-coded data to estimates of latent concepts assume different data-generating processes (DGPs). In this paper, we simulate ecologically valid data according to different assumptions, and examine the degree to which common methods for aggregating expert-coded data (1) recover true values and (2) construct appropriate coverage intervals. We find that the mean and ...

Added: July 15, 2021

Types of Dropout in Adaptive Open Online Courses

Skryabin M., in: Lecture Notes in Computer Science. Vol. 10254: Digital Education: Out to the World and Back to the Campus. Springer, 2017. P. 273–279.

This study is devoted to different types of students’ behavior before they drop an adaptive course. The Adaptive Python course at the Stepik educational platform was selected as the case for this study. Student behavior was measured by the following variables: number of attempts for the last lesson, last three lessons solving rate, the logarithm ...

Added: March 3, 2019

Bayesian Group Sparsification of Long Short-Term Memory Networks

Lobacheva E., Chirkova N., Vetrov D. 2018.

We propose a new Bayesian sparsification technique for gated recurrent architectures that accounts for their recurrent specifics and gated mechanism. Our method eliminates neurons from the model and makes gates constant, not only compressing the network but also significantly accelerating the forward pass. On discriminative tasks our method compresses LSTM extremely, so that only ...

Added: October 16, 2018

What makes experts reliable? Expert reliability and the estimation of latent traits

Marquardt K., Pemstein D., Seim B. et al., Research and Politics 2019 Vol. 6 No. 4 P. 1–8

Experts code latent quantities for many influential political science datasets. Although scholars are aware of the importance of accounting for variation in expert reliability when aggregating such data, they have not systematically explored either the factors affecting expert reliability or the degree to which these factors influence estimates of latent concepts. Here we provide a ...

Added: October 9, 2019

Success of Doctoral Students in the Social Sciences: The Role of the Academic Supervisor

Grigoreva A., Журнал социологии и социальной антропологии (Journal of Sociology and Social Anthropology) 2021 Vol. 24 No. 4 P. 90–109

The high dropout rate from doctoral programs and the low share of dissertations defended within the standard period of study (8.9% in 2020) are an important problem for educational policy in Russia. Russian doctoral students are increasingly less likely to choose an academic profession and career. One of the key reasons for the high dropout rate is the insufficient academic integration of doctoral students and a lack of university ...

Added: December 6, 2021

Interregional Effects of Innovation in Russia: An Analysis from the Bayesian Perspective

Tereshchenko D. S., Пространственная экономика (Spatial Economics) 2024 Vol. 20 No. 1 P. 125–143

This study analyzes the interregional effects of innovation in Russia. The hypothesis of the presence of interregional effects is tested by combining the methods of spatial econometrics and Bayesian approach. Using panel data on Russian regions for the period from 2000 to 2021, the author calculates posterior probabilities for a set of spatial regression models ...

Added: August 28, 2024

Identity, social mobility and ethnic mobilization: Language and the disintegration of the Soviet Union

Marquardt K., Comparative Political Studies 2018 Vol. 51 No. 7 P. 831–867

The disintegration of the Soviet Union is an essential case for the study of ethnic politics and identity-based mobilization. However, analyses in this article demonstrate that commonly used measures of ethnic diversity and politically relevant group concentration show little consistent relationship with events of ethnic mobilization in Soviet regions during the period 1987-1992. In contrast, ...

Added: July 25, 2019

IRT models for expert-coded panel data

Marquardt K., Pemstein D., Political Analysis 2018 Vol. 26 No. 4 P. 431–456

Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide ...

Added: July 25, 2019

Institutional Capacities of States in Comparative Perspective: An Attempt at Bayesian Aggregation of State Capacity

Gorelskiy I., Сравнительная политика (Comparative Politics) 2022 Vol. 13 No. 3 P. 53–73

This article endeavors to construct a composite indicator designed to facilitate the comparative assessment of institutional capacities across diverse political systems. The focal point of analysis resides within the domain of state capacity, a pivotal determinant for a myriad of inquiries that seek to evaluate the efficacy of public policy implementation across varying spheres. The ...

Added: January 23, 2024

Bayesian Sparsification of Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D. / International Conference on Machine Learning. Series 1 "Workshop on Learning to Generate Natural Language". 2017.

Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural ...

Added: October 19, 2017

The Deep Weight Prior

Atanov A., Ashukha A., Struminsky K. et al., in: Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR, 2019. P. 1–17.

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models by carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), that exploits generative models to encourage a specific structure of ...

Added: September 2, 2019

Xenophobia on the rise? Temporal and regional trends in xenophobic attitudes in Russia

Chapman H., Marquardt K., Herrera Y., Comparative Politics 2018 Vol. 50 No. 3 P. 381–394

In this article we consider the trajectory of xenophobia in Russia since the disintegration of the Soviet Union. Using survey data from 1996, 2004, and 2012, we examine Russians' negative attitudes toward seven outgroups over time. We also statistically analyze the degree to which correlates of xenophobia have changed between 1996 and 2012. We find ...

Added: July 25, 2019

The Measurement Model and Reliability

Pemstein D., Marquardt K., Seim B. et al., in: Varieties of Democracy: Measuring Two Centuries of Political Change. Cambridge University Press, 2020. Ch. 4. P. 66–89.

The Varieties of Democracy (V-Dem) project relies on country experts who code a host of ordinal variables, providing subjective ratings of latent (that is, not directly observable) regime characteristics over time. Sets of around five experts rate each case (country-year observation), and each of these raters works independently. Since raters may diverge in their coding because ...

Added: December 13, 2020