Structured Sparsification of Gated Recurrent Neural Networks
One of the most popular approaches to neural network compression is sparsification, i.e. learning sparse weight matrices. In structured sparsification, weights are zeroed out in groups corresponding to structural units, e.g. neurons. We extend the structured sparsification approach to gated recurrent neural networks, e.g. Long Short-Term Memory (LSTM). Specifically, in addition to sparsifying individual weights and neurons, we propose sparsifying the preactivations of the gates. This makes some gates constant and simplifies the LSTM structure. We test our approach on text classification and language modeling tasks. Our method improves neuron-wise compression of the model on most of the tasks. We also observe that the resulting structure of gate sparsity depends on the task, and we connect the learned structures to the specifics of the particular tasks.
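Group sparsity of the kind described in this abstract is commonly induced with a group-lasso penalty, whose proximal step zeroes entire weight groups at once. A minimal numpy sketch, assuming each row of a weight matrix forms one group (e.g. all weights feeding one gate preactivation); the sizes and threshold are hypothetical, and this is not the authors' implementation:

```python
import numpy as np

def group_prox(W, lam):
    """Proximal step of the group-lasso penalty: each row of W is a
    group; rows whose L2 norm falls below lam are zeroed entirely,
    surviving rows are shrunk toward zero."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 16))  # 8 groups of 16 weights
W[2] *= 20  # one group with large weights should survive the penalty
W_sparse = group_prox(W, lam=1.0)
n_zero = int(np.sum(np.all(W_sparse == 0, axis=1)))  # fully-zeroed groups
```

A group that is zeroed this way corresponds to a gate preactivation that becomes constant (equal to its bias), which is what simplifies the LSTM structure.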
Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too high for real-time deployment in offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models. We pay particular attention to the high-dimensional output problem caused by very large vocabulary sizes. We focus on compression methods that are effective in the context of on-device deployment: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition, in particular). For each model we investigate the trade-off between its size, suitability for fast inference, and perplexity. We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling. An experimental study on the Penn Treebank (PTB) dataset shows that the most efficient results in terms of speed and the compression–perplexity balance are obtained with matrix decomposition techniques.
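The low-rank factorization mentioned in this abstract replaces a dense weight matrix by a product of two thin matrices obtained from a truncated SVD. A minimal numpy sketch with hypothetical sizes (the actual models and ranks in the paper may differ):

```python
import numpy as np

def low_rank(W, r):
    """Truncated SVD: W (n x n) is approximated by A @ B with
    A: n x r and B: r x n, storing 2*n*r numbers instead of n*n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]  # fold singular values into A

rng = np.random.default_rng(1)
W = rng.normal(size=(400, 400))  # stand-in for an RNN weight matrix
A, B = low_rank(W, r=40)
compression = W.size / (A.size + B.size)  # parameter-count ratio
```

At inference time the matrix-vector product `W @ x` is replaced by `A @ (B @ x)`, which also reduces the multiply count by the same ratio.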
Movement control of artificial limbs has made big advances in recent years. New sensor and control technology has enhanced the functionality and usefulness of artificial limbs to the point that complex movements, such as grasping, can be performed to a limited extent. To date, the most successful results have been achieved by applying recurrent neural networks (RNNs). However, in the domain of artificial hands, experiments so far have been limited to non-mobile wrists, which significantly reduces the functionality of such prostheses. In this paper, for the first time, we present empirical results on gesture recognition with both mobile and non-mobile wrists. Furthermore, we demonstrate that recurrent neural networks with simple recurrent units (SRUs) outperform regular RNNs in both cases in terms of gesture recognition accuracy, on data acquired by an armband sensing electrical signals from the arm muscles (via surface electromyography, or sEMG). Finally, we show that adding domain adaptation techniques to continuous gesture recognition with RNNs improves the transfer ability between subjects, where a limb controller trained on data from one person is used by another person.
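The appeal of the SRU over a regular RNN is that its cell-state recurrence is elementwise, so the expensive matrix products depend only on the input. A simplified single-step sketch in numpy, assuming equal input and hidden sizes (the paper's exact SRU variant and parameterization may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_step(x, c_prev, Wf, bf, W, Wr, br):
    """One step of a simplified Simple Recurrent Unit: gates depend
    only on the input x, so the recurrence over c is elementwise."""
    f = sigmoid(Wf @ x + bf)             # forget gate
    c = f * c_prev + (1 - f) * (W @ x)   # light-weight cell update
    r = sigmoid(Wr @ x + br)             # highway/reset gate
    h = r * np.tanh(c) + (1 - r) * x     # highway connection to input
    return h, c

# Sanity check with all-zero parameters: gates sit at 0.5,
# the cell state stays at zero, and h passes through 0.5 * x.
d = 4
x, c0 = np.ones(d), np.zeros(d)
Z, z = np.zeros((d, d)), np.zeros(d)
h, c = sru_step(x, c0, Z, z, Z, Z, z)
```

Because `c` is updated elementwise, the steps of a sequence can be parallelized far more aggressively than in a standard RNN, which is the usual motivation for choosing SRUs.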
In this article, we focus on isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose to create a grammar model for a small test command set, with self-loops at each state that return blank symbols for noise and out-of-vocabulary words. In addition, we use a single arc connecting the beginning and the end of the grammar in order to filter out unknown commands. As a result, the grammar is resistant to distortions and unexpected words near or inside a command. We implemented the proposed approach using finite state transducers in the Kaldi framework and evaluated it on self-recorded noisy data with various signal-to-noise ratios. We compared the recognition accuracy and average decision-making time of our approach with state-of-the-art continuous speech recognition engines based on language models. It was experimentally shown that our approach achieves up to 60% higher accuracy than conventional offline speech recognition methods based on language models, and its utterance recognition speed is 3 times higher than that of traditional continuous speech recognition algorithms.
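The behavior of such a grammar can be illustrated without Kaldi by a tiny hand-rolled acceptor: one state per command word, with a self-loop on every state that absorbs noise and out-of-vocabulary tokens as blanks. A toy Python sketch (the command, vocabulary, and `<noise>` token are hypothetical, and this does not reproduce the paper's FST construction):

```python
def make_grammar(command, vocab):
    """Acceptor for one command: self-loops absorb noise/OOV tokens,
    in-vocabulary words must match the command sequence in order."""
    def accept(tokens):
        state = 0  # index of the next expected command word
        for t in tokens:
            if t == "<noise>" or t not in vocab:
                continue              # self-loop: treat as blank
            if state < len(command) and t == command[state]:
                state += 1            # advance along the command path
            else:
                return False          # unexpected in-vocabulary word
        return state == len(command)  # reached the final state?
    return accept

accept = make_grammar(["turn", "left"], vocab={"turn", "left", "right"})
ok = accept(["<noise>", "turn", "blah", "left"])    # noise/OOV absorbed
bad = accept(["turn", "right"])                     # wrong command word
```

In the real system the same idea is expressed as a weighted FST, which additionally lets the decoder score competing paths rather than make hard accept/reject decisions per token.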
In this paper, we address the excessive run-time and memory complexity of contemporary deep convolutional neural networks in the problem of image recognition. A survey of recent compression methods and efficient neural network architectures is provided. The experimental study is focused on the visual emotion recognition problem. We compare the computational speed and memory consumption during the training and inference stages of such methods as weight matrix decomposition, binarization, and hashing on the visual emotion recognition problem. It is experimentally shown that the most efficient recognition is achieved with full network binarization and matrix decomposition.
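Weight binarization of the kind compared in this abstract is usually done XNOR-Net style: each real-valued matrix is replaced by a scaled sign matrix, so weights fit in one bit plus one shared scale. A minimal numpy sketch (the toy matrix is illustrative, not from the paper):

```python
import numpy as np

def binarize(W):
    """XNOR-Net-style binarization: W is approximated by
    alpha * sign(W), where alpha = mean(|W|) minimizes the L2 error
    over all matrices with entries in {-alpha, +alpha}."""
    alpha = np.mean(np.abs(W))
    return alpha * np.sign(W), alpha

W = np.array([[0.5, -1.5],
              [2.0, -0.1]])
Wb, alpha = binarize(W)  # alpha = (0.5 + 1.5 + 2.0 + 0.1) / 4 = 1.025
```

The memory saving comes from storing only the sign bits and `alpha`; at inference, dot products with binarized weights reduce to cheap sign manipulations plus one final scaling.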
Vast amounts of data are collected in the process of astronomical observations. The BSA (Big Scanning Antenna) of LPI, used in the study of impulse phenomena, logs 87.5 GB of data daily (32 TB per year). Experts have classified 83,096 individual observations (over the study period July 2012 – October 2013). Over 75% of the sample corresponds to pulsars, scintillating sources, and fast radio transients; all other classes of observations correspond to hardware failures, interference, and passes of Earth satellites and aircraft. In total, 15 classes of observations were identified.
Such a sample, divided into classes, makes it possible to apply machine learning algorithms. It has become possible to develop an automated service for short-term/long-term monitoring of various classes of radio sources (including radio transients of different nature), monitoring of the Earth's ionosphere and of the interplanetary and interstellar plasma, and the search for and monitoring of different classes of radio sources. Monitoring in this case refers to the automatic filtering and detection of previously unclassified impulse phenomena.
Currently, statistical analysis methods are used for automatic filtering. This report examines an alternative approach based on a neural network machine learning algorithm: the network takes raw data as input and, after processing by the hidden layer, determines the class of an impulse phenomenon at the output layer.
The neural network model, trained on the sample and classifying previously unclassified impulse phenomena, is created using the Microsoft Azure Machine Learning Studio cloud service. A web service built on top of the model allows classifying single impulse phenomena in real time (Request/Reply) as well as data samples for a given period (Batch processing).
The article studies the use of machine learning algorithms for solving information security problems, namely, for building next-generation intrusion detection systems (IDS). The main drawbacks of traditional IDS (based on signature rules) are considered, and solutions based on machine learning algorithms are proposed. The article presents new methods of applying machine learning algorithms that make it possible to detect both already known threats and previously unseen variations of known threats.
Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)
This two-volume set, LNCS 11506 and LNCS 11507, constitutes the refereed proceedings of the 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, held in Gran Canaria, Spain, in June 2019. The 150 revised full papers presented in this two-volume set were carefully reviewed and selected from 210 submissions. The papers are organized in topical sections on machine learning in weather observation and forecasting; computational intelligence methods for time series; human activity recognition; new and future tendencies in brain-computer interface systems; random-weights neural networks; pattern recognition; deep learning and natural language processing; software testing and intelligent systems; data-driven intelligent transportation systems; deep learning models in healthcare and biomedicine; deep learning beyond convolution; artificial neural networks for biomedical image processing; machine learning in vision and robotics; system identification, process control, and manufacturing; image and signal processing; soft computing; mathematics for neural networks; internet modeling, communication and networking; expert systems; evolutionary and genetic algorithms; advances in computational intelligence; and computational biology and bioinformatics.