Variational Dropout Sparsifies Deep Neural Networks
We explore a recently proposed Variational Dropout technique that provides an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case where dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. The effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease in accuracy.
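To make the mechanism concrete, below is a minimal PyTorch sketch of a fully-connected layer with an individual dropout rate per weight, in the spirit of the method described above. The class name, initializations, and the log-alpha threshold of 3 are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVariationalDropoutLinear(nn.Module):
    """Linear layer with a learned dropout rate alpha = sigma^2 / theta^2 per weight."""

    def __init__(self, in_features, out_features, threshold=3.0):
        super().__init__()
        self.theta = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.threshold = threshold  # prune weights whose log alpha exceeds this

    @property
    def log_alpha(self):
        return self.log_sigma2 - torch.log(self.theta ** 2 + 1e-8)

    def forward(self, x):
        if self.training:
            # Local reparameterization: sample pre-activations instead of weights,
            # which reduces the variance of the gradient estimator.
            mu = F.linear(x, self.theta)
            var = F.linear(x ** 2, torch.exp(self.log_sigma2))
            return mu + torch.sqrt(var + 1e-8) * torch.randn_like(mu)
        # At test time, weights with very high dropout rates are simply removed.
        mask = (self.log_alpha < self.threshold).float()
        return F.linear(x, self.theta * mask)

    def kl(self):
        # Polynomial approximation to the KL term for the log-uniform prior.
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()
```

Training would minimize the task loss plus the summed kl() terms; weights whose dropout rate grows without bound contribute nothing at test time, which is the source of the sparsity reported above.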
The disintegration of the Soviet Union is an essential case for the study of ethnic politics and identity-based mobilization. However, analyses in this article demonstrate that commonly used measures of ethnic diversity and politically relevant group concentration show little consistent relationship with events of ethnic mobilization in Soviet regions during the period 1987–1992. In contrast, the proportion of a regional population that did not speak a metropolitan language has a consistently strong positive relationship with mobilization across these regions. In line with recent work on identity politics, I argue that a lack of proficiency in a metropolitan language marks nonspeakers as outsiders and hinders their social mobility. Regions with many of these individuals thus have a relatively high potential for identity-based mobilization. These findings provide further impetus for looking beyond ethnic groups in measuring identity-based cleavages, and indicate that language can play an important role in political outcomes aside from proxying ethnicity.
The Varieties of Democracy (V-Dem) project relies on country experts who code a host of ordinal variables, providing subjective ratings of latent (that is, not directly observable) regime characteristics over time. Sets of around five experts rate each case (country-year observation), and each of these raters works independently. Since raters may diverge in their coding because of either differences of opinion or mistakes, we require systematic tools with which to model these patterns of disagreement. These tools allow us to aggregate ratings into point estimates of latent concepts and quantify our uncertainty around these point estimates. In this chapter we describe item response theory models that can account and adjust for differential item functioning (i.e. differences in how experts apply ordinal scales to cases) and variation in rater reliability (i.e. random error). We also discuss key challenges specific to applying item response theory to expert-coded cross-national panel data, explain the approaches that we use to address these challenges, highlight potential problems with our current framework, and describe long-term plans for improving our models and estimates. Finally, we provide an overview of the different forms in which we present model output.
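As a concrete illustration of the modeling ideas (not the V-Dem implementation), the sketch below writes down the log-likelihood of a graded-response IRT model in which each rater has a discrimination parameter capturing reliability and rater-specific thresholds capturing differential item functioning. All names and shapes are assumptions for exposition.

```python
import numpy as np
from scipy.special import expit  # logistic CDF

def graded_response_loglik(z, beta, tau, ratings, rater_idx, case_idx):
    """Log-likelihood of ordinal expert ratings under a graded-response IRT model.

    z        : (n_cases,)       latent trait for each country-year case
    beta     : (n_raters,)      rater discrimination (higher = more reliable)
    tau      : (n_raters, K-1)  increasing rater-specific thresholds (DIF)
    ratings  : (N,) integer codes in {0, ..., K-1}
    rater_idx, case_idx : (N,)  which rater coded which case
    """
    # P(rating >= k) = logistic(beta_r * (z_c - tau_rk)) for k = 1..K-1
    eta = beta[rater_idx][:, None] * (z[case_idx][:, None] - tau[rater_idx])
    n = len(ratings)
    p_ge = np.concatenate([np.ones((n, 1)), expit(eta), np.zeros((n, 1))], axis=1)
    p_cat = p_ge[:, :-1] - p_ge[:, 1:]  # P(rating == k), shape (N, K)
    return np.log(p_cat[np.arange(n), ratings] + 1e-12).sum()
```

A Bayesian treatment would place hierarchical priors on beta and tau and sample z from the posterior, yielding both the point estimates and the uncertainty quantification described above.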
This study examines different types of student behavior before students drop an adaptive course. The Adaptive Python course on the Stepik educational platform was selected as the case for this study. Student behavior was measured by the following variables: the number of attempts on the last lesson, the solving rate for the last three lessons, the logarithm of normalized solving time, the percentage of easy and difficult lessons, the number of passed lessons, and the total solving time. We applied a standard clustering technique, K-means, to identify patterns of student behavior. To determine the optimal number of clusters, the silhouette metric was used. As a result, three types of dropout were identified: “solved lessons”, “evaluated lessons as hard”, and “evaluated lessons as easy”.
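A minimal sketch of the described pipeline, assuming scikit-learn and a feature matrix X whose columns correspond to the variables listed above; the function name and the candidate range for k are illustrative choices, not details from the study.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_dropout_behavior(X, k_range=range(2, 9), random_state=0):
    """Cluster student-behavior features with K-means, picking k by silhouette."""
    X_scaled = StandardScaler().fit_transform(X)  # K-means is scale-sensitive
    best = (None, -1.0, None)  # (k, silhouette score, labels)
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X_scaled)
        score = silhouette_score(X_scaled, labels)
        if score > best[1]:
            best = (k, score, labels)
    return best
```

Under this selection rule, the silhouette score would peak at k = 3 for data exhibiting the three dropout types reported above.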
Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
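The sketch below simulates one ecologically motivated DGP of the kind described: experts share a latent scale but differ in reliability and in where they place their thresholds (DIF), and the raw mean is then scored against the truth. All constants (numbers of cases and experts, noise scales, threshold shifts) are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_experts, K = 200, 5, 5

z_true = rng.normal(size=n_cases)               # latent concept per case
beta = rng.lognormal(0.0, 0.5, size=n_experts)  # expert reliability
dif = rng.normal(0.0, 0.4, size=n_experts)      # per-expert threshold shift (DIF)
base_tau = np.linspace(-1.5, 1.5, K - 1)        # shared ordinal cutpoints

ratings = np.empty((n_cases, n_experts), dtype=int)
for j in range(n_experts):
    # Each expert perceives the case with noise inversely related to reliability,
    # then maps the perception to an ordinal code through shifted cutpoints.
    perceived = z_true + rng.normal(0.0, 1.0 / beta[j], size=n_cases)
    ratings[:, j] = np.digitize(perceived, base_tau + dif[j])

mean_est = ratings.mean(axis=1)  # the standard practice being evaluated
print("corr(mean, truth):", np.corrcoef(mean_est, z_true)[0, 1])
```

Fitting an IRT model such as the one sketched earlier to these simulated ratings, and comparing how well each approach recovers z_true, is the style of comparison the study performs.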
Models for converting expert-coded data to estimates of latent concepts assume different data-generating processes (DGPs). In this paper, we simulate ecologically valid data according to different assumptions, and examine the degree to which common methods for aggregating expert-coded data (1) recover true values and (2) construct appropriate coverage intervals. We find that the mean and both hierarchical Aldrich–McKelvey (A–M) scaling and hierarchical item-response theory (IRT) models perform similarly when expert error is low; the hierarchical latent variable models (A–M and IRT) outperform the mean when expert error is high. Hierarchical A–M and IRT models generally perform similarly, although IRT models are often more likely to include true values within their coverage intervals. The median and non-hierarchical latent variable models perform poorly under most assumed DGPs.
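To make the two evaluation criteria concrete, here is one way to score an aggregation method on simulated data, assuming the method returns a point estimate and a standard error per case; the function and its normal-approximation intervals are a hypothetical sketch, not the paper's evaluation code.

```python
import numpy as np

def evaluate_aggregator(estimates, se, truth, z=1.96):
    """Score one aggregation method: accuracy and interval coverage.

    estimates : (n_cases,) point estimates of the latent concept
    se        : (n_cases,) standard errors (or posterior SDs)
    truth     : (n_cases,) simulated true latent values
    """
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))            # criterion (1)
    inside = (truth >= estimates - z * se) & (truth <= estimates + z * se)
    return rmse, inside.mean()                                   # criterion (2)
```

A well-calibrated method should have coverage near the nominal level; the finding above is that hierarchical IRT models most often include the true values within their intervals, while the median and non-hierarchical latent variable models perform poorly.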