Bayesian Compression for Natural Language Processing

N. Chirkova; E. Lobacheva; D. Vetrov

P. 2910-2915.

In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary size. We propose a Bayesian sparsification technique for RNNs that allows compressing the RNN by a factor of tens or hundreds without time-consuming hyperparameter tuning. We also generalize the model for vocabulary sparsification to filter out unnecessary words and compress the RNN even further. We show that the choice of the kept words is interpretable.
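The claim that the embedding layer dominates the parameter count can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the example sizes (a 20,000-word vocabulary, 300-dimensional embeddings, 128 hidden units) are assumptions chosen for illustration and are not taken from the paper. The LSTM count uses the textbook formula with one bias vector per gate.

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """One embed_dim-sized vector per vocabulary word."""
    return vocab_size * embed_dim


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    """Textbook single-layer LSTM: four gates, each with input weights,
    recurrent weights, and one bias vector (assumed layout)."""
    per_gate = input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim
    return 4 * per_gate


def embedding_share(vocab_size: int, embed_dim: int, hidden_dim: int) -> float:
    """Fraction of embedding + LSTM parameters that sit in the embedding."""
    emb = embedding_params(vocab_size, embed_dim)
    rnn = lstm_params(embed_dim, hidden_dim)
    return emb / (emb + rnn)


# Hypothetical sizes, for illustration only.
share = embedding_share(vocab_size=20_000, embed_dim=300, hidden_dim=128)
print(f"embedding share of parameters: {share:.3f}")
```

Because the embedding count is linear in the vocabulary size while the recurrent count does not depend on it, removing unnecessary words shrinks the dominant term directly.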

Language: English

Keywords: natural language processing (NLP); deep learning; Bayesian methods; neural network sparsification; neural network compression

In book

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Association for Computational Linguistics, 2018

Bayesian Sparsification of Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D. / International Conference on Machine Learning. Series 1 "Workshop on Learning to Generate Natural Language". 2017.

Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. The recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural ...

Added: October 19, 2017

Reflections of syntactic structures in nonautoregressive language models

Pletenev S. A., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, June 16–19, 2021). Issue 20. Russian State University for the Humanities, 2021.

Added: December 13, 2021

Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network

Petrosyan A., Voskoboynikov A., Sukhinin D. et al., Journal of Neural Engineering 2022 Vol. 19 No. 6 Article 066016

Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up plentiful opportunities, from rehabilitation of patients to direct and seamless communication between humans. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted through craniotomy. Here we explored the possibility of creating a speech prosthesis in a minimally ...

Added: December 9, 2022

A hybrid lemmatiser for Old Church Slavonic

Afanasev I. / National Research University Higher School of Economics. Series WP BRP "Linguistics". 2021.

The article considers a lemmatiser that is developed specifically for Old Church Slavonic (OCS). The introduction underlines the problem of the lack of lemmatisers that might deal with different datasets of the OCS. The review gives a short description of previous attempts and current trends in lemmatisation. The lemmatiser is hybrid-based and uses the advantages ...

Added: December 28, 2021

Salience models: a computational cognitive neuroscience review

Krasovskaya S., MacInnes W., Vision 2019 Vol. 3 No. 4 P. 1-24

The seminal model by Laurent Itti and Christof Koch demonstrated that we can compute the entire flow of visual processing from input to resulting fixations. Despite many replications and follow-ups, few have matched the impact of the original model, so what made this model so groundbreaking? We have selected five key contributions that distinguish ...

Added: October 13, 2019

Spatially Adaptive Computation Time for Residual Networks

Figurnov M., Collins M., Zhu Y. et al. / Cornell University. Series arXiv "arXiv:1612.02297". 2016.

This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and ...

Added: December 12, 2016

Brain-computer interface: experience in construction, use, and possible ways to improve performance

Volkova K., Dagaev N., Kiselev A. S. et al., Zhurnal Vysshei Nervnoi Deyatel'nosti imeni I.P. Pavlova (I.P. Pavlov Journal of Higher Nervous Activity) 2017 Vol. 67 No. 4 P. 504-520

Brain-computer interfaces find application in a number of different areas and have the potential to be used for research as well as for practical purposes. The clinical use of BCI includes current studies on neurorehabilitation ([Frolov et al., 2013; Ang et al., 2010]), and there is the prospect of using BCI to restore movement and ...

Added: October 19, 2017

Deep Learning Neural Networks as a Model of Saccadic Generation

Krasovskaya S., Zhulikov G., MacInnes W., The Russian Journal of Cognitive Science 2019 P. 1-10

Approximately twenty years ago, Laurent Itti and Christof Koch created a model of saliency in visual attention in an attempt to recreate the work of biological pyramidal neurons by mimicking neurons with centre-surround receptive fields. The Saliency Model has launched many studies that contributed to the understanding of layers of vision and the sphere of ...

Added: October 21, 2019

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Ryabinin M., Gorbunov E., Plokhotnyuk V. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 18195-18211.

Added: February 1, 2022

Deep learning approach for predicting functional Z-DNA regions using omics data

Beknazarov N., Jin S., Poptsova M., Scientific Reports 2020 Vol. 10 P. 19134

Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method, Z-Hunt, is based on statistical mechanical and energy considerations about the B- to Z-DNA transition using sequence information. Z-DNA ChIP-seq experiment results showed little overlap with Z-Hunt predictions, implying that sequence information only is not ...

Added: December 11, 2020

DeepZ: A Deep Learning Approach for Z-DNA Prediction

Beknazarov N., in: Z-DNA: Methods and Protocols. United States of America: Springer, 2023. P. 217-226.

Here we describe an approach that uses deep learning neural networks such as CNN and RNN to aggregate information from DNA sequence; physical, chemical, and structural properties of nucleotides; and omics data on histone modifications, methylation, chromatin accessibility, and transcription factor binding sites and data from other available NGS experiments. We explain how with the ...

Added: December 26, 2023

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Association for Computational Linguistics, 2018

Added: September 5, 2018

Unsupervised Domain Adaptation Methods for Cross-Species Transfer of Regulatory Code Signals

Latyshev P. N., Pavlov F., Frontiers in Big Data 2023 P. 1-10

Due to advances in NGS technologies, whole-genome maps of various functional genomic elements were generated for a dozen species; however, experiments are still expensive and lacking for many species of interest. Deep learning methods became the state-of-the-art computational methods but are often species-specific, reflecting the data used to train them. Here we take ...

Added: January 11, 2023

Institutional capacities of states in comparative perspective: an attempt at Bayesian aggregation of state capacity

Gorelskiy I., Sravnitelnaya Politika (Comparative Politics) 2022 Vol. 13 No. 3 P. 53-73

This article endeavors to construct a composite indicator designed to facilitate the comparative assessment of institutional capacities across diverse political systems. The focal point of analysis resides within the domain of state capacity, a pivotal determinant for a myriad of inquiries that seek to evaluate the efficacy of public policy implementation across varying spheres. The ...

Added: January 23, 2024

Bayesian Group Sparsification of Long Short-Term Memory Networks

Lobacheva E., Chirkova N., Vetrov D. 2018.

We propose a new Bayesian sparsification technique for gated recurrent architectures that accounts for their recurrent specifics and gating mechanism. Our method eliminates neurons from the model and makes gates constant, not only compressing the network but also significantly accelerating the forward pass. On discriminative tasks, our method compresses the LSTM so strongly that only ...

Added: October 16, 2018

Emotion analysis of VKontakte posts: classifier or regressor

Kolmogorova A., Kalinin A. A., in: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialogue 2022". Issue 21. RSUH Publishing House, 2022. P. 311-322.

The article summarizes the results of two tasks in the machine learning paradigm: the task of classification according to the criterion of the dominating emotion on data from social network posts in Russian, and the regression task using the same data. The experiments are conducted on a data set collected from the VKontakte social network and consisting of 3879 posts ...

Added: March 18, 2024

Double-Blind Peer-Reviewing and Inclusiveness in Russian NLP Conferences

Kutuzov A. B., Nikishina I. A., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers. Vol. 11832. Cham: Springer, 2019. P. 3-8.

Double-blind peer reviewing has been proved to be a pretty effective and fair way of academic work selection. However, to the best of our knowledge, nobody has yet analysed the effects caused by its introduction at the Russian NLP conferences. We investigate how the double-blind peer reviewing influences gender and location (according to authors’ affiliations) ...

Added: January 20, 2020

Black-Box Optimization with Local Generative Surrogates

Belavin V., Ustyuzhanin A., Shirobokov S. K. et al., Proceedings of Machine Learning Research 2020 P. 1-9

We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use ...

Added: October 31, 2019

The Deep Weight Prior

Atanov A., Ashukha A., Struminsky K. et al., in: Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR, 2019. P. 1-17.

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models by carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), that exploits generative models to encourage a specific structure of ...

Added: September 2, 2019

fMRI from EEG is only Deep Learning away: the use of interpretable DL to unravel EEG-fMRI relationships

Ossadtchi A., Mikheev I., Kovalev A. V., Working papers by Cornell University. Series cond-mat.soft "arxiv.org". 2022. Article 4650840

Access to the activity of subcortical structures offers a unique opportunity for building intention-dependent brain-computer interfaces, renders abundant options for exploring a broad range of cognitive phenomena in the realm of affective neuroscience, including complex decision-making processes and the eternal free-will dilemma, and facilitates diagnostics of a range of neurological diseases. So far this ...

Added: December 16, 2022

Workshop on Compact Deep Neural Network Representation with Industrial Applications, Thirty-second Conference on Neural Information Processing Systems

Montréal: [s.n.], 2018

This workshop aims to bring together researchers, educators, practitioners who are interested in techniques as well as applications of making compact and efficient neural network representations. One main theme of the workshop discussion is to build up consensus in this rapidly developed field, and in particular, to establish close connection between researchers in Machine Learning ...

Added: December 5, 2018

Deep neural networks and maximum likelihood search for approximate nearest neighbor in video-based image recognition

Savchenko A., Optical Memory and Neural Networks (Information Optics) 2017 Vol. 26 No. 2 P. 129-136

We analyzed the way to increase computational efficiency of video-based image recognition methods with matching of high dimensional feature vectors extracted by deep convolutional neural networks. We proposed an algorithm for approximate nearest neighbor search. At the first step, for a given video frame the algorithm verifies a reference image obtained when recognizing the previous ...

Added: June 30, 2017

Detection of cassava diseases using computer vision methods

Tereshchenko S. N., Perov A., Osipov A. L., Siberian Journal of Life Sciences and Agriculture 2021 Vol. 13 No. 1 P. 144-155

Background. Development of a convolutional neural network model for detecting cassava diseases from a mobile phone photo. Materials and methods. The research material consisted of images of various types of cassava diseases, published in open access on the Kaggle platform. Research methods: theory of design and development of information systems, programming, methods of augmentation and extension ...

Added: November 17, 2021

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Kirill Struminsky, Artyom Gadetsky, Denis Rakitin et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 10999-11011.

Structured latent variables allow incorporating meaningful prior knowledge into deep learning models. However, learning with such variables remains challenging because of their discrete nature. Nowadays, the standard learning approach is to define a latent variable as a perturbed algorithm output and to use a differentiable surrogate for training. In general, the surrogate puts additional constraints ...

Added: March 14, 2022