Neural Networks Compression for Language Modeling

A. Grachev; D. I. Ignatov; A. Savchenko

doi:10.1007/978-3-319-69900-4_44

P. 351–357.

In this paper, we consider several compression techniques for language models based on recurrent neural networks (RNNs). Conventional RNNs, e.g., LSTM-based language models, are known to have either high space complexity or substantial inference time. This problem is especially acute for mobile applications, where constant interaction with a remote server is inappropriate. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
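Pruning and quantization, two of the techniques compared above, can be illustrated on a flat list of weights, such as one row of an LSTM weight matrix. The following is a minimal pure-Python sketch, not the authors' implementation. The helper names (`magnitude_prune`, `quantize_uniform`, `dequantize`) are hypothetical. Pruning is assumed to be plain magnitude pruning, and quantization is assumed to be uniform min-max quantization to 8 bits.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy with the `sparsity` fraction of smallest-|w| weights set to 0."""
    k = int(len(weights) * sparsity)  # how many weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])  # indices of the k smallest magnitudes
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits=8):
    """Uniform min-max quantization to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # step between grid points
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float weights."""
    return [lo + c * scale for c in codes]


# Toy weights standing in for one row of an LSTM weight matrix (made-up values).
w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.45, -0.2, 0.07]
print(magnitude_prune(w, 0.5))  # [0.8, 0.0, 0.3, -0.9, 0.0, 0.45, 0.0, 0.0]

codes, lo, scale = quantize_uniform(w, bits=8)
w_hat = dequantize(codes, lo, scale)
# Rounding to the nearest grid point bounds the error by half a step.
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= scale / 2 + 1e-12)  # True
```

Storing 8-bit codes plus one offset and one scale per matrix needs roughly a quarter of the memory of 32-bit floats. Pruned weights save memory only when they are stored in a sparse format.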

Keywords: quantization, language modeling, pruning, LSTM, RNN, low-rank factorization

In book

Pattern Recognition and Machine Intelligence. 7th International Conference, PReMI 2017, Kolkata, India, December 5-8, 2017, Proceedings. Lecture Notes in Computer Science book series (LNCS, volume 10597)

Springer, 2017.

Emotion Detection in Speech Using Long Short-Term Memory

Попова А. С., Рассадин А. Г., Пономаренко А. А., in: Proceedings of the XXIV International Scientific and Technical Conference “Information Systems and Technologies-2018”. [s.n.], 2018. P. 1083–1089.

The paper addresses the automatic classification of emotions in a digital audio signal. It examines and verifies an approach in which an audio fragment is classified by a recurrent neural network with long short-term memory. Mel-frequency cepstral coefficients were used as features. A numerical experiment was carried out on the open Ravdess dataset, which includes 8 different emotions: “neutral”, “calm”, “happy”, “sad”, “angry”, “fearful”, “disgust”, “surprised” ...

Added: October 21, 2018

On categories O for quantized symplectic resolutions

Losev Ivan, Compositio Mathematica 2017 Vol. 153 No. 12 P. 2445–2481

In this paper we study categories O over quantizations of symplectic resolutions admitting Hamiltonian tori actions with finitely many fixed points. In this generality, these categories were introduced by Braden, Licata, Proudfoot and Webster. We establish a family of standardly stratified structures (in the sense of the author and Webster) on these categories O. We ...

Added: October 15, 2017

Application of the Method of Multivariate Multi-stage Forecasting Based on the LSTM Deep Learning Model for Bitcoin Price Time Series

Natalia Sizykh, Said Dandamaev, Dmitry Sizykh, in: 16th International Conference Management of large-scale system development (MLSD). IEEE, 2023. P. 1–5.

Data forecasting, and research on methods for forecasting cryptocurrency prices, are growing in importance. So far, methods based on the LSTM deep learning architecture have shown the best results in forecasting cryptocurrency prices. To improve forecasting accuracy, this paper investigates the application of a multivariate multistep forecasting method based on the LSTM ...
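Multistep forecasting with an LSTM usually begins by slicing a multivariate series into input windows paired with several future target values. The following sketch covers only this data-preparation step and is not taken from the paper. It assumes a direct formulation that predicts all horizons at once. The function `make_windows` and its parameters are hypothetical, and the LSTM itself is omitted.

```python
def make_windows(series, n_in, n_out, target=0):
    """Split a multivariate series into (input window, future targets) pairs.

    series: list of time steps, each a list of feature values.
    n_in:   number of past steps given to the model.
    n_out:  number of future steps to forecast (the multistep horizon).
    target: index of the feature being forecast (e.g. a price column).
    """
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be positive")
    pairs = []
    for t in range(len(series) - n_in - n_out + 1):
        x = series[t:t + n_in]  # all features over the past n_in steps
        y = [row[target] for row in series[t + n_in:t + n_in + n_out]]
        pairs.append((x, y))
    return pairs


# Toy series with two made-up features per step: (price, volume).
series = [[1.0, 10], [2.0, 20], [3.0, 30], [4.0, 40], [5.0, 50]]
for x, y in make_windows(series, n_in=2, n_out=2):
    print(x, "->", y)
# [[1.0, 10], [2.0, 20]] -> [3.0, 4.0]
# [[2.0, 20], [3.0, 30]] -> [4.0, 5.0]
```

Each pair feeds one training example to a sequence model that outputs `n_out` values. Using all features as input while forecasting one of them is what makes the setup multivariate.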

Added: December 22, 2023

A Hartree-Type Equation with a Yukawa Interaction Potential in the Semiclassical Approximation

Pereskokov A., Липская А. В., Bulletin of the Moscow Power Engineering Institute 2010 No. 6 P. 99–109

Radially symmetric solutions of a Hartree-type equation containing both a Coulomb potential and an integral nonlinearity with a Yukawa interaction potential are considered. In the semiclassical approximation, equations for the self-consistent potential are derived and studied. A Bohr–Sommerfeld-type quantization rule is written out. Asymptotic eigenvalues and eigenfunctions are found. ...

Added: December 16, 2012

Wreath Macdonald polynomials and categorical McKay correspondence

Vologodsky V., Finkelberg M. V., Bezrukavnikov R., Cambridge Journal of Mathematics 2014 Vol. 2 No. 2 P. 163–190

Marc Haiman reduced the Macdonald Positivity Conjecture to a statement about the geometry of the Hilbert scheme of points on the plane, and formulated a generalization of the conjecture in which the symmetric group is replaced by the wreath product of S_n and Z/rZ. He proved the original conjecture by establishing the geometric statement about the ...

Added: December 17, 2015

Geometric Methods in Physics XXXVIII. Workshop, Białowieża, Poland, 2019

Cham: Birkhäuser, 2020.

The book consists of articles based on the XXXVIII Białowieża Workshop on Geometric Methods in Physics, 2019. The series of Białowieża workshops, attended by a community of experts at the crossroads of mathematics and physics, is a major annual event in the field. The works in this book, based on presentations given at the workshop, ...

Added: November 3, 2021

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Neklyudov K. O., Molchanov D., Ashukha A. et al., in: Advances in Neural Information Processing Systems 30 (NIPS 2017). Montreal: Curran Associates, 2017. P. 6776–6785.

Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can ...
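The link between per-unit noise magnitude and sparsity can be illustrated with the signal-to-noise ratio (SNR) of a multiplicative noise variable. For an untruncated log-normal variable θ ~ LogNormal(μ, σ²), the ratio E[θ]/std[θ] equals 1/sqrt(exp(σ²) - 1) and does not depend on μ. Units whose learned noise σ is large therefore carry little signal and can be dropped. This is a simplified sketch. The paper uses a truncated log-normal distribution, whose SNR has a different closed form. The threshold of 1.0 and the helper names `lognormal_snr` and `keep_mask` are assumptions.

```python
import math


def lognormal_snr(sigma):
    """SNR = E[theta] / std[theta] for theta ~ LogNormal(mu, sigma**2).

    E[theta]   = exp(mu + sigma**2 / 2)
    Var[theta] = (exp(sigma**2) - 1) * exp(2 * mu + sigma**2)
    so mu cancels and SNR = 1 / sqrt(exp(sigma**2) - 1).
    """
    if sigma == 0:
        return math.inf  # no noise: the unit passes its signal unchanged
    return 1.0 / math.sqrt(math.expm1(sigma ** 2))  # expm1(x) = exp(x) - 1


def keep_mask(sigmas, threshold=1.0):
    """True for units whose noise SNR reaches the (assumed) threshold."""
    return [lognormal_snr(s) >= threshold for s in sigmas]


# Hypothetical learned noise scales for four neurons.
print(keep_mask([0.1, 0.5, 1.0, 2.0]))  # [True, True, False, False]
```

With this threshold, a unit is kept exactly when σ² ≤ ln 2, i.e. σ ≲ 0.83. `math.expm1` is used instead of `exp(x) - 1` to stay accurate for small σ.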

Added: January 29, 2018

Proceedings of the Nineteenth International Conference on Geometry, Integrability and Quantization

Sofia: Avangard Prima, 2018.

Added: January 31, 2018

Models and Methods for Interacting with New-Generation Computing Devices

Manakhov P., Ковшов Е. Е., Applied Informatics 2012 No. 3(39) P. 71–81

The article examines how to develop models of text input methods. The topic is relevant because such models reduce the cost of designing new input methods and upgrading existing ones. The article proposes a modeling method and verifies it in a series of experiments. It also gives recommendations on ...

Added: January 17, 2015

Compression of recurrent neural networks for efficient language modeling

Grachev A., Ignatov D. I., Savchenko A., Applied Soft Computing Journal 2019 Vol. 79 P. 354–362

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large for real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory (LSTM) models. We pay particular attention ...
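The memory concern can be made concrete by counting parameters. The following sketch assumes a standard LSTM layer: four gates, each with input weights, recurrent weights and a bias. It shows how low-rank factorization (one of the techniques named in the abstract of the conference version above) shrinks the stacked 4h × (n + h) weight matrix. The layer sizes, the rank and the helper names are hypothetical values chosen for illustration, not figures from the paper.

```python
def lstm_params(n_in, hidden):
    """One LSTM layer: 4 gates, each with a (hidden x n_in) input matrix,
    a (hidden x hidden) recurrent matrix and a bias vector of length hidden."""
    return 4 * hidden * (n_in + hidden + 1)


def low_rank_params(rows, cols, rank):
    """A rows x cols matrix replaced by factors of shape rows x rank and rank x cols."""
    return rank * (rows + cols)


def megabytes(n_params, bits=32):
    """Storage size in MiB for n_params values of the given bit width."""
    return n_params * bits / 8 / 2 ** 20


# Hypothetical sizes: input and hidden size 200, factorization rank 64.
n_in, hidden, rank = 200, 200, 64
dense = lstm_params(n_in, hidden)
rows, cols = 4 * hidden, n_in + hidden  # stacked 800 x 400 weight matrix
factored = low_rank_params(rows, cols, rank) + 4 * hidden  # factors + unchanged biases

print(dense, factored)  # 320800 77600
print(round(megabytes(dense), 2), round(megabytes(factored), 2))
```

Factorization pays off only when rank × (rows + cols) < rows × cols. The rank also limits what the layer can represent, so in practice it is tuned against perplexity.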

Added: June 12, 2019

Deep learning approach for predicting functional Z-DNA regions using omics data

Beknazarov N., Jin S., Poptsova M., Scientific Reports 2020 Vol. 10 P. 19134

Computational methods to predict Z-DNA regions are in high demand for understanding the functional role of Z-DNA. The previous state-of-the-art method, Z-Hunt, is based on statistical-mechanical and energy considerations of the B- to Z-DNA transition using sequence information. Z-DNA ChIP-seq experiment results showed little overlap with Z-Hunt predictions, implying that sequence information alone is not ...

Added: December 11, 2020

Linearly Converging Error Compensated SGD

Eduard Gorbunov, Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889–20900.

Added: December 7, 2020

Parametric Optimization of the Accuracy of Morphological Text Tagging

Klyshinskiy E., Рысаков С. В., New Information Technologies in Automated Systems 2016

The article introduces the reader to the basic concepts of parametric optimization. It describes the developed model: probability approximation, counter functions, and correlation coefficients. Brief attention is given to the exhaustive search method, which achieved new accuracy figures. Finally, a modification of the disambiguation method developed by the authors is presented. ...

Added: June 14, 2016

A notion of stability for k-means clustering

Le Gouic T., Paris Q., Electronic Journal of Statistics 2018 Vol. 12 No. 2 P. 4239–4263

In this paper, we define and study a new notion of stability for the k-means clustering scheme building upon the field of quantization of a probability measure. We connect this definition of stability to a geometric feature of the underlying distribution of the data, named absolute margin condition, inspired by recent works on the subject. ...

Added: November 9, 2018

A Study of Machine Learning Methods for Classifying Scientific Texts in Russian

Кусакин И. К., Федорец О. В., Romanov A., Scientific and Technical Information. Series 2: Information Processes and Systems 2022 Vol. 12 P. 6–9

This paper discusses modern approaches to natural language processing and the application of artificial intelligence technologies to the task of classifying scientific texts in Russian. It analyzes implementations of text vectorization methods and describes experiments with training various classifier models, from classical machine learning algorithms to neural network transformer architectures. ...

Added: January 31, 2023

Lectures on universal Teichmüller space

Sergeev A., European Mathematical Society Publishing house, 2014.

This book is based on a lecture course given by the author at the Educational Center of the Steklov Mathematical Institute in 2011. It is designed for a one-semester course for undergraduate students familiar with basic differential geometry and complex and functional analysis. The universal Teichmüller space T is the quotient of the space of quasisymmetric ...

Added: April 9, 2015

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. The study analyzes methods for vectorizing textual information, the selection of a model for scientific paper classification, and the training of the BERT language model ...

Added: November 4, 2023

Are CDS spreads predictable during the Covid-19 pandemic? Forecasting based on SVM, GMDH, LSTM and Markov switching autoregression

Vukovic D., Romanyuk K., Ivashchenko S. et al., Expert Systems with Applications 2022 Vol. 194 (May 2022) Article 116553

This paper investigates the forecasting performance for credit default swap (CDS) spreads by Support Vector Machines (SVM), Group Method of Data Handling (GMDH), Long Short-Term Memory (LSTM) and Markov switching autoregression (MSA) for daily CDS spreads of the 513 leading US companies, in the period 2009–2020. The goal of this study is to test the forecasting performance of ...

Added: February 4, 2022

Extensions of vertex algebras. Constructions and applications

Feigin B. L., Russian Mathematical Surveys 2017 Vol. 72 No. 4 P. 707–763

This paper discusses the main known constructions of vertex operator algebras. The starting point is the lattice algebra. Screenings distinguish subalgebras of lattice algebras. Moreover, one can construct extensions of vertex algebras. Combining these constructions gives most of the known examples. A large class of algebras with big centres is constructed. Such algebras have applications ...

Added: November 5, 2020

Computational Linguistics and Intellectual Technologies

Moscow: Russian State University for the Humanities, 2019.

The book includes 61 papers from the International Conference on Computational Linguistics and Intellectual Technologies “Dialogue-2019”, covering a wide range of theoretical and applied research on natural language description, modeling of language processes, and the creation of practically applicable computational linguistic technologies. It is intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: June 12, 2019

An ħ-dependent formulation of the Kadomtsev-Petviashvili hierarchy

Takasaki K., Takebe T., Theoretical and Mathematical Physics (Russian Federation) 2012 Vol. 171 No. 2 P. 683–690

We briefly review a recursive construction of ħ-dependent solutions of the Kadomtsev-Petviashvili hierarchy. We give recurrence relations for the coefficients X_n of an ħ-expansion of the operator X = X_0 + ħX_1 + ħ²X_2 + ... for which the dressing operator W is expressed in the exponential form W = exp(X/ħ). The wave ...

Added: June 22, 2012

Ad Astra or Astray: Exploring Linguistic Knowledge of Multilingual BERT through NLI Task

Tikhonova M., Mikhailov V., Dina Pisarevskaya et al., Natural Language Engineering 2022 P. 1–30

Recent research has reported that standard fine-tuning approaches can be unstable due to being prone to various sources of randomness, including but not limited to weight initialization, training data order, and hardware. Such brittleness can lead to different evaluation results, prediction confidences, and generalization inconsistency of the same models independently fine-tuned under the same experimental setup. ...

Added: May 21, 2022

Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

Kodryan M., Grachev A., Ignatov D. I. et al., in: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), Issue W19-43. Association for Computational Linguistics, 2019. P. 40–48.

Reducing the number of parameters is one of the most important goals in deep learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) to neural network compression. We find this method especially useful in language modeling tasks, where a large number of parameters in ...

Added: November 1, 2019