The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models through a carefully chosen prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure in trained convolutional filters, e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from DWP accelerates training of conventional convolutional neural networks.
We present a model for predicting freight train travel times based on station network analysis and specific feature engineering. We discuss the first pipeline to improve freight trip duration prediction in Russia. While every freight company uses only the reference book produced by RZD (Russian Railways), which is based on railroad distances and has accuracy measured in days, we argue that one can predict the trip duration with an error of less than twenty hours, decreasing to twelve hours for certain types of freight trains.
We explore the recently proposed Variational Dropout technique, which provided an elegant Bayesian interpretation of Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and by up to 68 times on VGG-like networks with a negligible decrease in accuracy.
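To make the idea of individual dropout rates per weight concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Gaussian Dropout form, in which each weight is multiplied by noise drawn from N(1, alpha), and it prunes weights whose dropout rate is large. The function names, the toy values, and the pruning threshold of 3.0 on log alpha are illustrative assumptions and do not come from the abstract.

```python
import math
import random


def sample_noisy_weight(theta, log_alpha, rng):
    """Draw w = theta * (1 + sqrt(alpha) * eps) with eps ~ N(0, 1).

    This is multiplicative Gaussian noise with mean 1 and variance alpha,
    applied to a single weight with its own dropout rate alpha.
    """
    alpha = math.exp(log_alpha)
    return theta * (1.0 + math.sqrt(alpha) * rng.gauss(0.0, 1.0))


def prune_mask(log_alphas, threshold=3.0):
    """Return True for weights to keep.

    The threshold is a hypothetical choice: a weight with a very large
    dropout rate is dominated by noise and is treated as removable.
    """
    return [la < threshold for la in log_alphas]


def compression_ratio(mask):
    """Total number of weights divided by the number of weights kept."""
    kept = sum(mask)
    return len(mask) / kept if kept else float("inf")


# Toy values (assumed for illustration only).
thetas = [0.8, -0.05, 0.3, 0.01]
log_alphas = [-2.0, 5.0, 0.5, 7.0]

mask = prune_mask(log_alphas)
sparse_thetas = [t if keep else 0.0 for t, keep in zip(thetas, mask)]
ratio = compression_ratio(mask)
```

In this toy setting two of the four weights have a log dropout rate above the assumed threshold and are zeroed out. The reported compression factors in the abstract come from trained networks, which this sketch does not attempt to reproduce.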
Proceedings of Machine Learning Research: Volume 97: International Conference on Machine Learning, 9-15 June 2019, Long Beach, California, USA
This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the areas of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics.
Competitions have become an integral part of advancing the state of the art in artificial intelligence (AI). They differ from benchmarks in one important respect: competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.
Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
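The claim that the mean can be an inaccurate statistic when experts interpret question scales differently can be illustrated with a small deterministic example. This is a sketch of differential item functioning only, not an IRT model and not the authors' simulation design. The threshold values, latent scores, and names below are hypothetical.

```python
from bisect import bisect_right


def rate(latent, thresholds):
    """Ordinal rating: the number of cut-points at or below the latent value."""
    return bisect_right(thresholds, latent)


def mean_rating(latent, panel):
    """Average ordinal rating given by a panel of experts for one case."""
    return sum(rate(latent, thresholds) for thresholds in panel) / len(panel)


# Hypothetical experts who map the same latent scale to ratings 0..3
# using different cut-points.
strict = [0.5, 1.5, 2.5]     # demands a high latent value for a high rating
lenient = [-1.5, -0.5, 0.5]  # gives high ratings at low latent values

# Case A has the higher true latent value but is coded by strict experts;
# case B has the lower true value but is coded by lenient experts.
latent_a, latent_b = 1.0, 0.0
mean_a = mean_rating(latent_a, [strict, strict])
mean_b = mean_rating(latent_b, [lenient, lenient])
```

Here the simple averages rank case B above case A even though A has the higher latent value, because the two panels use the scale differently. A model that estimates expert-specific thresholds is designed to correct for this kind of discrepancy.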
The book by Adrian Mackenzie, professor in the Department of Sociology at Lancaster University, is of a new kind within the emerging, but still limited, literature in the humanities and social sciences that explores how machine learning (ML) works. The spectacular advances of this branch of artificial intelligence (AI) over the past few years have eclipsed other approaches in the field and have suddenly turned AI into a social and political problem. Several authors have already stressed the need to focus attention on the tools of AI, pointing out the limits of work that deals only with the social effects of "algorithms". As the anthropologist of science and technology Nick Seaver notes, most work on the subject frets about "algorithms" or "big data", insisting on their harmful, even catastrophic, effects on society without ever specifying exactly what they are. Yet the transfer of knowledge and perspectives between specialists in AI and in the humanities and social sciences (in both directions, for that matter) is essential for offering an informed and effective critique.
A search for CP violation in the Cabibbo-suppressed D0 → K+K−π+π− decay mode is performed using an amplitude analysis. The measurement uses a sample of pp collisions recorded by the LHCb experiment during 2011 and 2012, corresponding to an integrated luminosity of 3.0 fb−1. The D0 mesons are reconstructed from semileptonic b-hadron decays into D0μ−X final states. The selected sample contains more than 160 000 signal decays, allowing the most precise amplitude modelling of this D0 decay to date. The obtained amplitude model is used to perform the search for CP violation. The result is compatible with CP symmetry, with a sensitivity ranging from 1% to 15% depending on the amplitude considered.