Neural Networks Compression for Language Modeling

A. Grachev; D. I. Ignatov; A. Savchenko

doi:10.1007/978-3-319-69900-4_44

P. 351–357.

Grachev A., Ignatov D. I., Savchenko A.

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which constant interaction with a remote server is inappropriate. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
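As a rough illustration of three of the techniques named in the abstract, the following NumPy sketch applies magnitude pruning, uniform 8-bit quantization, and truncated-SVD low-rank factorization to a single random matrix. Everything here is an assumption made for illustration: the function names, the 200 x 800 shape (a hypothetical stand-in for an LSTM gate matrix), the 90% sparsity, and the rank of 20 are not taken from the paper, and tensor train decomposition is not covered.

```python
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_uniform(w, bits=8):
    """Map weights to 2**bits evenly spaced levels (bits <= 8 assumed)."""
    levels = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((w - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from integer codes."""
    return codes.astype(np.float64) * scale + lo

def low_rank_factors(w, rank):
    """Return a (m x rank) and b (rank x n) with a @ b approximating w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(200, 800))  # hypothetical weight matrix, not from the paper

pruned = prune_by_magnitude(w, 0.9)
codes, scale, lo = quantize_uniform(w, bits=8)
a, b = low_rank_factors(w, rank=20)

print("nonzeros after pruning:", np.count_nonzero(pruned), "of", w.size)
print("max quantization error:", np.abs(dequantize(codes, scale, lo) - w).max())
print("parameters in factors:", a.size + b.size, "vs", w.size)
```

In this sketch the size reduction comes from storing fewer nonzero values (pruning), fewer bits per value (quantization), or two thin factors instead of one dense matrix (low-rank factorization). How much accuracy each option costs on a real language model is an empirical question that the sketch does not address.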

Keywords: quantization, language modeling, pruning, LSTM, RNN, low-rank factorization

In book

Pattern Recognition and Machine Intelligence. 7th International Conference, PReMI 2017, Kolkata, India, December 5-8, 2017, Proceedings. Lecture Notes in Computer Science book series (LNCS, volume 10597)

Springer, 2017.

Emotion Detection in Speech Using Long Short-Term Memory

Popova A. S., Rassadin A. G., Ponomarenko A. A., in: Proceedings of the XXIV International Scientific and Technical Conference "Information Systems and Technologies-2018".: [s.n.], 2018. P. 1083–1089.

We consider the problem of automatic emotion classification in a digital audio signal. The paper examines and verifies an approach in which an audio fragment is classified using a recurrent neural network with long short-term memory. Mel-frequency cepstral coefficients were used as features. A numerical experiment was carried out on the open Ravdess dataset, which includes 8 different emotions: "neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised" ...

Added: October 21, 2018

An h-dependent formulation of the Kadomtsev-Petviashvili hierarchy

Takasaki K., Takebe T., Theoretical and Mathematical Physics (Russian Federation) 2012 Vol. 171 No. 2 P. 683–690

We briefly review a recursive construction of ħ-dependent solutions of the Kadomtsev-Petviashvili hierarchy. We give recurrence relations for the coefficients X_n of an ħ-expansion of the operator X = X_0 + ħ X_1 + ħ^2 X_2 + ... for which the dressing operator W is expressed in the exponential form W = exp(X/ħ). The wave ...

Added: June 22, 2012

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Neklyudov K. O., Molchanov D., Ashukha A. et al., in: Advances in Neural Information Processing Systems 30 (NIPS 2017).: Montreal: Curran Associates, 2017. P. 6776–6785.

Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can ...

Added: January 29, 2018

Application of the Method of Multivariate Multi-stage Forecasting Based on the LSTM Deep Learning Model for Bitcoin Price Time Series

Natalia Sizykh, Said Dandamaev, Dmitry Sizykh, in: 16th International Conference Management of large-scale system development (MLSD).: IEEE, 2023. P. 1–5.

Forecasting data and research on cryptocurrency price forecasting methods are increasing in importance. So far, methods based on LSTM deep learning architecture have shown the best results in forecasting cryptocurrency prices. In order to improve the accuracy of forecasting data, this paper investigates the application of a multivariate multistep forecasting method based on the LSTM ...

Added: December 22, 2023

A Study of Machine Learning Methods for Classifying Scientific Texts in Russian

Kusakin I. K., Fedorets O. V., Romanov A., Scientific and Technical Information. Series 2: Information Processes and Systems 2022 Vol. 12 P. 6–9

This paper discusses modern approaches to natural language processing and the application of artificial intelligence technologies to the task of classifying scientific texts in Russian. The report contains an analysis of implementations of text vectorization methods and a description of experiments with training various classifier models, from classical machine learning algorithms to neural network transformer architectures. ...

Added: January 31, 2023

Computational Linguistics and Intellectual Technologies

M.: Russian State University for the Humanities, 2019.

The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...

Added: June 12, 2019

Compression of recurrent neural networks for efficient language modeling

Grachev A., Ignatov D. I., Savchenko A., Applied Soft Computing Journal 2019 Vol. 79 P. 354–362

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks including Long Short-Term Memory models. We pay particular attention ...

Added: June 12, 2019

Normal Forms, Inner Products, and Maslov Indices of General Multimode Squeezings

Chebotarev A., Tlyachev T. V., Mathematical notes 2014 Vol. 95 No. 5 P. 721–737

In this paper, we present a purely algebraic construction of the normal factorization of multimode squeezed states and calculate their inner products. This procedure allows one to orthonormalize bases generated by squeezed states. We calculate several correct representations of the normalizing constant for the normal factorization, discuss an analog of the Maslov index for squeezed ...

Added: June 4, 2014

Quantization of Drinfeld Zastava in type A

Finkelberg M. V., Rybnikov L. G., Journal of the European Mathematical Society 2012

algebra $\hat{sl}_n$. We introduce an affine, reduced, irreducible, normal quiver variety $Z$ which maps to the Zastava space bijectively at the level of complex points. The natural Poisson structure on the Zastava space can be described on $Z$ in terms of Hamiltonian reduction of a certain Poisson subvariety of the dual space of a (nonsemisimple) ...

Added: February 19, 2013

TAPE: Assessing Few-shot Russian Language Understanding

Taktasheva E., Shavrina T., Fenogenova A. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2022.: Association for Computational Linguistics, 2022. P. 2472–2497.

Recent advances in zero-shot and few-shot learning have shown promise for a scope of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this line of research, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six ...

Added: September 22, 2023

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorization of textual information, selection of a model for scientific paper classification, and training of linguistic model BERT ...

Added: November 4, 2023

Proceedings of the Nineteenth International Conference on Geometry, Integrability and Quantization

Sofia: Avangard Prima, 2018.

Added: January 31, 2018

Are CDS spreads predictable during the Covid-19 pandemic? Forecasting based on SVM, GMDH, LSTM and Markov switching autoregression

Vukovic D., Romanyuk K., Ivashchenko S. et al., Expert Systems with Applications 2022 Vol. 194 No. May 2022 Article 116553

This paper investigates the forecasting performance for credit default swap (CDS) spreads by Support Vector Machines (SVM), Group Method of Data Handling (GMDH), Long Short-Term Memory (LSTM) and Markov switching autoregression (MSA) for daily CDS spreads of the 513 leading US companies, in the period 2009–2020. The goal of this study is to test the forecasting performance of ...

Added: February 4, 2022

Models and Methods of Interaction with New-Generation Computing Devices

Manakhov P., Kovshov E. E., Applied Informatics 2012 No. 3(39) P. 71–81

The article examines the issue of developing models of the text input methods. The urgency of this matter is dictated by the reduction of financial costs of designing new input methods and upgrading existing ones. The article suggests a modeling method, which is verified by a series of experiments. Also the article gives recommendations on ...

Added: January 17, 2015

Referential choice: Multiplicity of factors and corpus-based modeling

Kibrik A. A., Dobrov G. B., Khudyakova M. et al., Frontiers of Cognition 2013

Referential choice is the process of selecting an appropriate referential expression for a referent that the speaker/writer intends to mention at some point in discourse. Referential choice is governed by the referent's current status in the speaker's/writer's working memory. This status, in turn, is determined by a number of factors, rooted in discourse context and ...

Added: October 25, 2013

Extensions of vertex algebras. Constructions and applications

Feigin B. L., Russian Mathematical Surveys 2017 Vol. 72 No. 4 P. 707–763

This paper discusses the main known constructions of vertex operator algebras. The starting point is the lattice algebra. Screenings distinguish subalgebras of lattice algebras. Moreover, one can construct extensions of vertex algebras. Combining these constructions gives most of the known examples. A large class of algebras with big centres is constructed. Such algebras have applications ...

Added: November 5, 2020

Lectures on universal Teichmüller space

Sergeev A., European Mathematical Society Publishing house, 2014.

This book is based on a lecture course given by the author at the Educational Center of the Steklov Mathematical Institute in 2011. It is designed for a one-semester course for undergraduate students familiar with basic differential geometry and complex and functional analysis. The universal Teichmüller space T is the quotient of the space of quasisymmetric ...

Added: April 9, 2015

Leveraging Emotional Signals for Credibility Detection

Giachanou A., Rosso P., Crestani F., in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’19).: NY: Association for Computing Machinery (ACM), 2019. P. 877–880.

The spread of false information on the Web is one of the main problems of our society. Automatic detection of fake news posts is a hard task since they are intentionally written to mislead the readers and to trigger intense emotions to them in an attempt to be disseminated in the social networks. Even though ...

Added: October 29, 2020

Analysis of neural networks efficiency for determining positions of corrupted bytes

Slastnikov S., Lupanov V. E., Journal of Physics: Conference Series 2019 Vol. 1163 No. 12048 P. 1–6

Large volumes of files and other data are transferred across networks. This data may be corrupted by intrusions or packet loss, so executable files may be marked as non-executable and violate the local network policy. Thus, it is necessary to detect such files. In this paper, we present a novel method for ...

Added: October 19, 2018

Differential geometry and quantization on a locally compact group

Akbarov S. S., Izvestiya. Mathematics 1995 Vol. 59 No. 2 P. 47–62

Added: September 23, 2016

Towards trigonometric deformation of $\widehat{\mathfrak{sl}}_2$ coset VOA

Feigin B. L., Jimbo M., Mukhin E., Journal of Mathematical Physics 2019 Vol. 60 No. 7 P. 073507-1–073507-16

We discuss the quantization of the $\widehat{\mathfrak{sl}}_2$ coset vertex operator algebra $W D(2,1;\alpha)$ using the bosonization technique. We show that after quantization, there exist three families of commuting integrals of motion coming from three copies of the quantum toroidal algebra associated with $\mathfrak{gl}_2$. ...

Added: December 10, 2019

Comparison of different coding schemes for 1-bit ADC

Osipov D., / Series arXiv "math". 2022. No. 1.

This paper is devoted to a comparison of different coding schemes (various constructions of Polar and LDPC codes, Product codes and BCH codes) for the case when information is transmitted over an AWGN channel with quantization of the lowest possible complexity and resolution: 1-bit. We examine performance (in terms of frame error rate, FER) for the schemes mentioned above and ...

Added: December 27, 2022

A Hartree-Type Equation with the Yukawa Interaction Potential in the Semiclassical Approximation

Pereskokov A., Lipskaya A. V., Bulletin of the Moscow Power Engineering Institute 2010 No. 6 P. 99–109

Radially symmetric solutions of a Hartree-type equation containing both the Coulomb potential and an integral nonlinearity with the Yukawa interaction potential are considered. In the semiclassical approximation, equations for the self-consistent potential are derived and studied. A quantization rule of Bohr-Sommerfeld type is written out. Asymptotic eigenvalues and eigenfunctions are found. ...

Added: December 16, 2012