Variational Dropout Sparsifies Deep Neural Networks
We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case where dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. The effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease in accuracy.
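As a rough illustration of the pruning rule this line of work relies on, the NumPy sketch below uses made-up variational parameters: each weight has a per-weight dropout rate alpha = sigma^2 / theta^2, and weights whose log alpha exceeds a threshold are zeroed out. The matrix shape, the parameter statistics, and the threshold value of 3 are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variational parameters of a 300x100 weight matrix:
# theta is the posterior mean of each weight, log_sigma2 its log-variance.
theta = rng.normal(size=(300, 100))
log_sigma2 = rng.normal(loc=0.0, scale=4.0, size=theta.shape)

# Per-weight dropout rate alpha = sigma^2 / theta^2, computed in log space.
log_alpha = log_sigma2 - 2.0 * np.log(np.abs(theta) + 1e-8)

# Prune weights with large dropout rates (threshold chosen for illustration):
# large log alpha means the multiplicative noise almost always destroys the
# weight's contribution, so it can be set exactly to zero at test time.
THRESHOLD = 3.0
keep_mask = log_alpha < THRESHOLD
w_sparse = np.where(keep_mask, theta, 0.0)

sparsity = 1.0 - keep_mask.mean()   # fraction of weights set exactly to zero
```

The reported compression ratios correspond to the inverse of the kept fraction; here the sparsity level depends entirely on the synthetic statistics, not on any trained network.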
The disintegration of the Soviet Union is an essential case for the study of ethnic politics and identity-based mobilization. However, analyses in this article demonstrate that commonly used measures of ethnic diversity and politically relevant group concentration show little consistent relationship with events of ethnic mobilization in Soviet regions during the period 1987-1992. In contrast, the proportion of a regional population that did not speak a metropolitan language has a consistently strong negative relationship with mobilization across these regions. In line with recent work on identity politics, I argue that a lack of proficiency in a metropolitan language marks nonspeakers as outsiders and hinders their social mobility. Regions with many of these individuals thus have a relatively high potential for identity-based mobilization. These findings provide further impetus for looking beyond ethnic groups in measuring identity-based cleavages, and indicate that language can play an important role in political outcomes aside from proxying ethnicity.
This study examines the different types of student behavior that precede dropping out of an adaptive course. The Adaptive Python course on the Stepik educational platform was selected as the case for this study. Student behavior was measured by the following variables: the number of attempts at the last lesson, the solving rate over the last three lessons, the logarithm of normalized solving time, the percentages of easy and difficult lessons, the number of passed lessons, and total solving time. We applied a standard clustering technique, K-means, to identify patterns of student behavior. To determine the optimal number of clusters, the silhouette metric was used. As a result, three types of dropout were identified: “solved lessons”, “evaluated lessons as hard”, and “evaluated lessons as easy”.
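A minimal sketch of the procedure described above, on synthetic data: a plain NumPy K-means (with deterministic farthest-point initialization) clusters invented three-feature "behavior" vectors, and the mean silhouette coefficient picks the number of clusters. The features and cluster centers are fabricated for illustration; the study used the seven behavioral variables listed above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three synthetic behavior profiles in a made-up 3-feature space.
X = np.vstack([
    rng.normal([0.0, 0.0, 0.0], 0.3, size=(40, 3)),
    rng.normal([3.0, 3.0, 0.0], 0.3, size=(40, 3)),
    rng.normal([0.0, 3.0, 3.0], 0.3, size=(40, 3)),
])

def kmeans(X, k, n_iter=50):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def silhouette(X, labels):
    # Mean silhouette coefficient: s_i = (b_i - a_i) / max(a_i, b_i), where
    # a_i is the mean intra-cluster distance and b_i the mean distance to
    # the nearest other cluster.
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    n = len(X)
    scores = []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = d[i, same].mean() if same.any() else 0.0
        b = min(d[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

On this well-separated synthetic data the silhouette score peaks at three clusters, mirroring the three dropout types found in the study.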
Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
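The full ordinal IRT machinery is beyond a short snippet, but the core intuition, that down-weighting unreliable experts beats the plain average, can be sketched with a continuous analogue. The simulation below uses hypothetical noise levels and oracle precision weights; an actual IRT model would estimate expert reliabilities from the ratings rather than assume them known.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cases, n_experts = 500, 5
truth = rng.normal(size=n_cases)                  # latent concept scores
noise_sd = np.array([0.2, 0.5, 1.0, 2.0, 4.0])    # hypothetical per-expert noise
ratings = truth[:, None] + rng.normal(size=(n_cases, n_experts)) * noise_sd

simple_mean = ratings.mean(axis=1)                # the standard practice
w = 1.0 / noise_sd ** 2                           # oracle precision weights
weighted = ratings @ w / w.sum()                  # reliability-aware aggregate

mse_mean = float(np.mean((simple_mean - truth) ** 2))
mse_weighted = float(np.mean((weighted - truth) ** 2))
```

When experts vary this much in reliability, the precision-weighted aggregate recovers the latent scores with a far smaller error than the simple mean, which is the pattern the IRT comparison in the abstract formalizes for ordinal data.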
Bayesian inference provides a general framework for incorporating prior knowledge or specific properties into machine learning models through the careful choice of a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure in trained convolutional filters, e.g., spatial correlations of weights. We define the DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that the DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from the DWP accelerates the training of conventional convolutional neural networks.
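A heavily simplified sketch of the initialization idea, with a multivariate Gaussian standing in for the paper's implicit generative prior: fit the Gaussian to a bank of spatially correlated "source" filters, then sample fresh 3x3 filters from it to initialize a new layer. All data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "source" filters with spatial correlation: each 3x3 filter is a
# grid of local means over a 5x5 noise patch, so neighboring entries co-vary,
# mimicking the structure of filters learned on similar data.
raw = rng.normal(size=(500, 5, 5))
source = np.array([[[raw[n, i:i + 3, j:j + 3].mean() for j in range(3)]
                    for i in range(3)] for n in range(500)])

# Fit a multivariate Gaussian over flattened filters (a crude stand-in for
# the implicit generative prior; a small ridge keeps the covariance PSD).
flat = source.reshape(500, -1)
mu = flat.mean(axis=0)
cov = np.cov(flat, rowvar=False) + 1e-6 * np.eye(9)

# Sample 64 fresh filters from the fitted prior to initialize a new layer.
new_filters = rng.multivariate_normal(mu, cov, size=64).reshape(64, 3, 3)
```

Unlike independent Gaussian initialization, the sampled filters inherit the spatial correlations of the source bank, which is the structural property the DWP is designed to encourage.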
Experts code latent quantities for many influential political science datasets. Although scholars are aware of the importance of accounting for variation in expert reliability when aggregating such data, they have not systematically explored either the factors affecting expert reliability or the degree to which these factors influence estimates of latent concepts. Here we provide a template for examining potential correlates of expert reliability, using coder-level data for six randomly selected variables from a cross-national panel dataset. We aggregate these data with an ordinal item response theory model that parameterizes expert reliability, and regress the resulting reliability estimates on both expert demographic characteristics and measures of their coding behavior. We find little evidence of a consistent substantial relationship between most expert characteristics and reliability, and these null results extend to potentially problematic sources of bias in estimates, such as gender. The exceptions to these results are intuitive, and provide baseline guidance for expert recruitment and retention in future expert coding projects: attentive and confident experts who have contextual knowledge tend to be more reliable. Taken as a whole, these findings reinforce arguments that item response theory models are a relatively safe method for aggregating expert-coded data.
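The regression step described above can be sketched on simulated data: reliability estimates driven by a hypothetical attentiveness covariate, plus a demographic covariate with no true effect, regressed via ordinary least squares. Variable names and effect sizes are invented for illustration; in the study the reliabilities come from an ordinal IRT model fit to real expert codes.

```python
import numpy as np

rng = np.random.default_rng(7)

n = 400
attentive = rng.normal(size=n)                      # hypothetical attentiveness score
gender = rng.integers(0, 2, size=n).astype(float)   # covariate with no true effect
# Simulated reliability estimates: driven by attentiveness only, plus noise.
reliability = 1.0 + 0.5 * attentive + rng.normal(scale=0.5, size=n)

# OLS of reliability on an intercept and both covariates.
X = np.column_stack([np.ones(n), attentive, gender])
beta, *_ = np.linalg.lstsq(X, reliability, rcond=None)
```

Here the attentiveness coefficient recovers its true value while the demographic coefficient stays near zero, the same pattern of intuitive effects and null results the abstract reports.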