Bayesian Sparsification of Recurrent Neural Networks. P. 1-8.
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. The recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural networks. To account for recurrent specifics, we also rely on Binary Variational Dropout for RNNs (Gal & Ghahramani, 2016b). We report a 99.5% sparsity level on a sentiment analysis task without a quality drop and up to 87% sparsity on a language modeling task with a slight loss of accuracy.
Language: English
Keywords: recurrent neural networks
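To illustrate the pruning rule at the heart of Sparse Variational Dropout, here is a minimal NumPy sketch: a weight θ with learned noise level σ is dropped when its dropout rate log α = log(σ²/θ²) exceeds a threshold (a value of 3 is commonly used). The weight values below are synthetic placeholders, not trained parameters.

```python
import numpy as np

# Sketch of the Sparse VD pruning rule (Molchanov et al., 2017).
# A weight whose log alpha = log(sigma^2 / theta^2) is above the
# threshold is noise-dominated and set to zero.
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 4))               # variational means of the weights
log_sigma = rng.normal(-3.0, 1.0, size=(4, 4))  # learned log-std of the noise

log_alpha = 2 * log_sigma - 2 * np.log(np.abs(theta) + 1e-8)
mask = log_alpha < 3.0                        # keep weights with a low noise-to-signal ratio
sparse_weights = theta * mask

sparsity = 1.0 - mask.mean()
print(f"sparsity: {sparsity:.2f}")
```

In a trained network, θ and σ come from variational inference rather than random draws, and the surviving weights are stored in a sparse format to realize the memory savings.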
Kodryan M., Grachev A., Ignatov D. I. et al., in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Issue W19-43. Association for Computational Linguistics, 2019. P. 40-48.
Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where a large number of parameters in ...
Added: November 1, 2019
Grachev A., Ignatov D. I., Savchenko A. Applied Soft Computing Journal, 2019. Vol. 79. P. 354-362.
Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models. We pay particular attention ...
Added: June 12, 2019
Lobacheva E., Chirkova N., Markovich A. et al., in: Thirty-Fourth AAAI Conference on Artificial Intelligence. Vol. 34. AAAI Press, 2020. Ch. 5938. P. 4989-4996.
Added: October 29, 2020
Makarov I., Bakhanova M., Nikolenko S. et al. PeerJ Computer Science, 2022. Vol. 8. Article e865.
Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced ...
Added: February 1, 2022
Lobacheva E., Chirkova N., Vetrov D. 2018.
We propose a new Bayesian sparsification technique for gated recurrent architectures that accounts for their recurrent specifics and gating mechanism. Our method eliminates neurons from the model and makes gates constant, not only compressing the network but also significantly accelerating the forward pass. On discriminative tasks our method compresses LSTM extremely, so that only ...
Added: October 16, 2018
Lobacheva E., Chirkova N., Vetrov D., in: Workshop on Compact Deep Neural Network Representation with Industrial Applications, Thirty-Second Conference on Neural Information Processing Systems. Montréal: [s.n.], 2018. P. 1-6.
Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structural units, e.g. neurons, from the networks. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. ...
Added: December 5, 2018
Leblond R., Alayrac J., Osokin A. et al., in: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018). [s.n.], 2018. P. 1-16.
We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an ...
Added: October 29, 2018
Chirkova N., in: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021). Association for Computational Linguistics, 2021. P. 2679-2689.
Source code processing heavily relies on the methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics of a variable is defined not only by its name but also by the contexts in which ...
Added: August 31, 2021
Arefyev N. V., Gratsianova T. Y., Popov K., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings. Moscow: Conference Proceedings Editorial Board, 2018. P. 85-95.
Morphological segmentation is an important task of natural language processing as it can significantly improve the processing of unfamiliar and rare words in different tasks that involve text data. In this paper we present datasets in English and Russian for learning and evaluating morphological segmentation algorithms, demonstrate the method based on the sequence to sequence ...
Added: October 9, 2020
Shpilman A., Sosin I., Kudenko D., in: 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 2018. P. 1436-1441.
Movement control of artificial limbs has made big advances in recent years. New sensor and control technology enhanced the functionality and usefulness of artificial limbs to the point that complex movements, such as grasping, can be performed to a limited extent. To date, the most successful results were achieved by applying recurrent neural networks (RNNs), ...
Added: January 18, 2019
Sushentsev N., Abrego L., Colarieti A. et al. European Urology Open Science, 2023. Vol. 52. P. 36-39.
The global uptake of prostate cancer (PCa) active surveillance (AS) is steadily increasing. While prostate-specific antigen density (PSAD) is an important baseline predictor of PCa progression on AS, there is a scarcity of recommendations on its use in follow-up. In particular, the best way of measuring PSAD is unclear. One approach would be to use ...
Added: February 28, 2024