Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

M. Kodryan; A. Grachev; D. I. Ignatov; D. Vetrov

doi:10.18653/v1/W19-4306

P. 40–48.

Reducing the number of parameters is one of the most important goals in deep learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, which yields even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both encoder and decoder layers can be removed with minimal quality loss.
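As a rough illustration of the idea behind ARD-based pruning (a sketch under stated assumptions, not the authors' implementation): assume a factorized Gaussian posterior N(mu, sigma^2) over each weight and a zero-mean Gaussian prior whose per-weight variance is set to its optimal value mu^2 + sigma^2. Under these assumptions the KL term reduces to 0.5 * log(1 + mu^2 / sigma^2), and weights with a low signal-to-noise ratio mu^2 / sigma^2 can be dropped. The function names, the threshold, and the toy values below are hypothetical.

```python
import math


def ard_kl(mu, sigma):
    """KL term for a factorized Gaussian posterior N(mu, sigma^2) against a
    zero-mean Gaussian prior with per-weight variance mu^2 + sigma^2
    (assumed setup): sum of 0.5 * log(1 + mu^2 / sigma^2)."""
    return sum(0.5 * math.log1p((m / s) ** 2) for m, s in zip(mu, sigma))


def prune_mask(mu, sigma, snr_threshold=1.0):
    """Keep a weight only if its signal-to-noise ratio mu^2 / sigma^2
    exceeds the threshold (hypothetical pruning rule for illustration)."""
    return [(m / s) ** 2 > snr_threshold for m, s in zip(mu, sigma)]


def sparsity(mask):
    """Fraction of weights removed by the mask."""
    return 1.0 - sum(mask) / len(mask)


# Toy posterior parameters for ten weights (illustrative values only).
mu = [2.0, 0.01, -1.5, 0.001, 0.02, -0.003, 0.5, 0.0, 0.004, -0.01]
sigma = [0.1, 0.5, 0.2, 0.3, 0.4, 0.2, 1.0, 0.3, 0.1, 0.5]

mask = prune_mask(mu, sigma)
pruned_weights = [m if keep else 0.0 for m, keep in zip(mu, mask)]
print("kept weights:", sum(mask), "of", len(mask))
print("sparsity:", round(sparsity(mask), 2))
print("KL term:", round(ard_kl(mu, sigma), 4))
```

In this toy setting only the two weights whose means are large relative to their posterior standard deviations survive; the KL term penalizes exactly those high signal-to-noise weights, which is what pushes the remaining ones toward removal during training.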

Keywords: language modeling; recurrent neural networks; Automatic Relevance Determination

Publication based on the results of:

Discovering and Representing Knowledge for Recommender Systems (2019)

In book

Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Issue W19-43. Association for Computational Linguistics, 2019.

Compression of recurrent neural networks for efficient language modeling

Grachev A., Ignatov D. I., Savchenko A., Applied Soft Computing Journal 2019 Vol. 79 P. 354–362

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large for real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models. We pay particular attention ...

Added: June 12, 2019

Parametric optimization of the accuracy of morphological text tagging

Klyshinskiy E., Рысаков С. В., Новые информационные технологии в автоматизированных системах 2016

The article introduces the reader to the basic concepts of parametric optimization. It describes the developed probability approximation model, counter functions, and correlation coefficients. Some attention is given to the exhaustive search method, which yielded new accuracy figures. Finally, a modification of the disambiguation method developed by the authors is presented. ...

Added: June 14, 2016

Structured Sparsification of Gated Recurrent Neural Networks

Lobacheva E., Chirkova N., Markovich A. et al., in: Thirty-Fourth AAAI Conference on Artificial Intelligence. Vol. 34. AAAI Press, 2020. Ch. 5938 P. 4989–4996.

Added: October 29, 2020

Referential choice: Multiplicity of factors and corpus-based modeling

Kibrik A. A., Dobrov G. B., Khudyakova M. et al., Frontiers of Cognition 2013

Referential choice is the process of selecting an appropriate referential expression for a referent that the speaker/writer intends to mention at some point in discourse. Referential choice is governed by the referent's current status in the speaker's/writer's working memory. This status, in turn, is determined by a number of factors, rooted in discourse context and ...

Added: October 25, 2013

Bayesian Sparsification of Gated Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D., in: Workshop on Compact Deep Neural Network Representation with Industrial Applications, Thirty-second Conference on Neural Information Processing Systems. Montréal: [s.n.], 2018. P. 1–6.

Bayesian methods have been successfully applied to sparsify the weights of neural networks and to remove structural units from the networks, e.g., neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. ...

Added: December 5, 2018

TAPE: Assessing Few-shot Russian Language Understanding

Taktasheva E., Shavrina T., Fenogenova A. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022. P. 2472–2497.

Recent advances in zero-shot and few-shot learning have shown promise for a scope of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this line of research, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six ...

Added: September 22, 2023

Functional models of elementary discursive units in Russian eSports commentary

Микулинский А. Д., in: Синергия языков и культур 2022: междисциплинарные исследования. St. Petersburg: -, 2023. P. 335–351.

The paper addresses the modeling of the local structure of the eSports commentary spoken genre, using the Dota 2 computer discipline as an example. eSports commentary is spontaneous and creative speech aimed at describing what is happening on the computer-gaming field. The main factors that force us to study it ...

Added: May 12, 2024

Energy consumption forecasting based on automated machine learning

Danilov K., Автоматизация. Современные технологии 2020 Vol. 74 No. August 2020 P. 402–407

The problem of forecasting energy consumption using automated machine learning is considered. A scheme of the process for automatically building and applying a forecasting model is presented. The proposed approach was tested on electricity consumption data from regions of Russia. The computational experiment showed the high efficiency of the developed model. Forecasting accuracy was 97–99%. ...

Added: June 13, 2022

On the Embeddings of Variables in Recurrent Neural Networks for Source Code

Chirkova N., in: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021). Association for Computational Linguistics, 2021. P. 2679–2689.

Source code processing heavily relies on the methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics of a variable is defined not only by its name but also by the contexts in which ...

Added: August 31, 2021

SEARNN: Training RNNs with global-local losses

Leblond R., Alayrac J., Osokin A. et al., in: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018). [s.n.], 2018. P. 1–16.

We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an ...

Added: October 29, 2018

Self-supervised recurrent depth estimation with attention mechanisms

Makarov I., Bakhanova M., Nikolenko S. et al., PeerJ Computer Science 2022 Vol. 8 Article e865

Depth estimation has been an essential task for many computer vision applications, especially in autonomous driving, where safety is paramount. Depth can be estimated not only with traditional supervised learning but also via a self-supervised approach that relies on camera motion and does not require ground truth depth maps. Recently, major improvements have been introduced ...

Added: February 1, 2022

Morphological segmentation with sequence to sequence neural network

Arefyev N. V., Gratsianova T. Y., Popov K., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2018" Proceedings. M.: Conference Proceedings Editorial board, 2018. P. 85–95.

Morphological segmentation is an important task of natural language processing as it can significantly improve the processing of unfamiliar and rare words in different tasks that involve text data. In this paper we present datasets in English and Russian for learning and evaluating morphological segmentation algorithms, demonstrate the method based on the sequence to sequence ...

Added: October 9, 2020

Continuous Gesture Recognition from sEMG Sensor Data with Recurrent Neural Networks and Adversarial Domain Adaptation

Shpilman A., Sosin I., Kudenko D., in: 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV). IEEE, 2018. P. 1436–1441.

Movement control of artificial limbs has made big advances in recent years. New sensor and control technology enhanced the functionality and usefulness of artificial limbs to the point that complex movements, such as grasping, can be performed to a limited extent. To date, the most successful results were achieved by applying recurrent neural networks (RNNs), ...

Added: January 18, 2019

The Prosody of a Poet's Prose: Comparative Analysis of the Rhythmic Structure of A. Pushkin Prose

Наконечная Е. Т., in: Proceedings of the 22nd Conference of Open Innovations Association FRUCT. Jyvaskyla: [s.n.], 2018. P. 361–365.

The article continues a series of studies on the rhythm of A. S. Pushkin's literary prose. It examines works such as "Dubrovsky", "The Queen of Spades", "The Captain's Daughter", "Kirdzhali", and "Egyptian Nights". The method of selecting "random" iambic tetrameters is applied. The rhythm of verse-like fragments is compared with probabilistic-statistical models of the distribution of verse lines in prose. Based on the analysis of random verse-like fragments, the evolution of prose rhythm is examined ...

Added: June 12, 2018

Ad Astra or Astray: Exploring Linguistic Knowledge of Multilingual BERT through NLI Task

Tikhonova M., Mikhailov V., Pisarevskaya D. et al., Natural Language Engineering 2022 P. 1–30

Recent research has reported that standard fine-tuning approaches can be unstable due to being prone to various sources of randomness, including but not limited to weight initialization, training data order, and hardware. Such brittleness can lead to different evaluation results, prediction confidences, and generalization inconsistency of the same models independently fine-tuned under the same experimental setup. ...

Added: May 21, 2022

Computational Linguistics and Intellectual Technologies

M.: Russian State University for the Humanities, 2019.

The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...

Added: June 12, 2019

Bayesian Sparsification of Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D., in: 1st Workshop on Learning to Generate Natural Language, International Conference on Machine Learning. [s.n.], 2017. P. 1–8.

Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural ...

Added: October 30, 2018

Neural Networks Compression for Language Modeling

Grachev A., Ignatov D. I., Savchenko A., in: Pattern Recognition and Machine Intelligence. 7th International Conference, PReMI 2017, Kolkata, India, December 5-8, 2017, Proceedings. Lecture Notes in Computer Science book series (LNCS, volume 10597). Springer, 2017. P. 351–357.

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which the constant interaction with ...

Added: October 14, 2018

Models and methods of interactive interaction with new-generation computing devices

Manakhov P., Ковшов Е. Е., Прикладная информатика 2012 No. 3(39) P. 71–81

The article examines the issue of developing models of the text input methods. The urgency of this matter is dictated by the reduction of financial costs of designing new input methods and upgrading existing ones. The article suggests a modeling method, which is verified by a series of experiments. Also the article gives recommendations on ...

Added: January 17, 2015

Using a Recurrent Neural Network To Inform the Use of Prostate-specific Antigen (PSA) and PSA Density for Dynamic Monitoring of the Risk of Prostate Cancer Progression on Active Surveillance

Sushentsev N., Abrego L., Colarieti A. et al., European Urology Open Science 2023 Vol. 52 P. 36–39

The global uptake of prostate cancer (PCa) active surveillance (AS) is steadily increasing. While prostate-specific antigen density (PSAD) is an important baseline predictor of PCa progression on AS, there is a scarcity of recommendations on its use in follow-up. In particular, the best way of measuring PSAD is unclear. One approach would be to use ...

Added: February 28, 2024

Deep learning approach for predicting functional Z-DNA regions using omics data

Beknazarov N., Jin S., Poptsova M., Scientific Reports 2020 Vol. 10 P. 19134

Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method Z-Hunt is based on statistical mechanical and energy considerations about B- to Z-DNA transition using sequence information. Z-DNA CHiP-seq experiment results showed little overlap with Z-Hunt predictions implying that sequence information only is not ...

Added: December 11, 2020

Bayesian Group Sparsification of Long Short-Term Memory Networks

Lobacheva E., Chirkova N., Vetrov D., 2018.

We propose a new Bayesian sparsification technique for gated recurrent architectures that accounts for their recurrent specifics and gated mechanism. Our method eliminates neurons from the model and makes gates constant, not only compressing the network but also significantly accelerating the forward pass. On the discriminative tasks our method compresses LSTM extremely, so that only ...

Added: October 16, 2018

Conditional Generators of Words Definitions

Gadetsky A., Yakubovskiy I., Vetrov D., in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Vol. 2: Short Papers. Association for Computational Linguistics, 2018. P. 266–271.

We explore the recently introduced definition modeling technique, which provides a tool for evaluating different distributed vector representations of words through modeling dictionary definitions of words. In this work, we study the problem of word ambiguities in definition modeling and propose a possible solution by employing latent variable modeling and soft attention mechanisms. Our quantitative ...

Added: February 27, 2019