Voice command recognition in intelligent systems using deep neural networks

A. Sokolov; A. Savchenko

doi:10.1109/SAMI.2019.8782755

Ch. 19. P. 113–116.

In this article, we focus on isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose a grammar model for a small test command set in which every state has a self-loop that returns blank symbols for noise and out-of-vocabulary words. In addition, a single arc connects the beginning and the end of the grammar in order to filter out unknown commands. As a result, the grammar is resistant to distortions and to unexpected words near or inside a command. We implemented the proposed approach using Finite State Transducers in the Kaldi framework and evaluated it on self-recorded noisy data with various signal-to-noise ratios. We compared the recognition accuracy and average decision-making time of our approach with those of state-of-the-art continuous speech recognition engines based on language models. The experiments show that our approach achieves up to 60% higher accuracy than conventional offline speech recognition methods based on language models, and recognizes an utterance three times faster than traditional continuous speech recognition algorithms.
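The grammar structure described in the abstract can be illustrated with a toy sketch. This is not the authors' Kaldi implementation: it works on already-decoded word tokens rather than acoustic frames, and the names `FILLER`, `build_grammar`, and `recognize` are hypothetical. It assumes that every out-of-vocabulary token is mapped to one filler label, that each state carries a filler self-loop, and that one extra filler arc links the start state directly to the end state.

```python
from collections import defaultdict

FILLER = "<blank>"  # hypothetical label for noise and out-of-vocabulary tokens
START, END = 0, 1   # the grammar has one start state and one final state


def build_grammar(commands):
    """Return arcs[state] -> list of (input_label, next_state, output)."""
    arcs = defaultdict(list)
    next_free = 2
    for command in commands:
        words = command.split()
        state = START
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            target = END if is_last else next_free
            if not is_last:
                next_free += 1
            # The full command is emitted on its last word only.
            arcs[state].append((word, target, command if is_last else None))
            state = target
    # Self-loop on every state: filler is consumed and nothing is emitted.
    for state in set(arcs) | {END}:
        arcs[state].append((FILLER, state, None))
    # Single bypass arc: an utterance without any command still reaches END.
    arcs[START].append((FILLER, END, None))
    return arcs


def recognize(tokens, arcs, vocabulary):
    """Return the recognized command, or None for unknown or incomplete input."""
    labels = [t if t in vocabulary else FILLER for t in tokens]
    frontier = {(START, None)}  # set of (state, command emitted so far)
    for label in labels:
        frontier = {
            (nxt, out or command)
            for state, command in frontier
            for arc_label, nxt, out in arcs[state]
            if arc_label == label
        }
    found = {command for state, command in frontier if state == END and command}
    return next(iter(found)) if len(found) == 1 else None


COMMANDS = ["turn left", "turn right", "stop"]
VOCAB = {word for command in COMMANDS for word in command.split()}
GRAMMAR = build_grammar(COMMANDS)
```

With this sketch, `recognize("uh turn hmm left please".split(), GRAMMAR, VOCAB)` follows the filler self-loops around and inside the command and returns `"turn left"`, while an utterance made only of unknown words takes the bypass arc and yields `None`.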

Keywords: automatic speech recognition, voice control systems, deep neural networks

Publication based on the results of:

Efficient methods for multimedia data recognition for analyzing the preferences of mobile device users (2019)

In book

17th World Symposium on Applied Machine Intelligence and Informatics (SAMI)

IEEE, 2019.

An ensemble of modern computer vision models for deepfake detection

Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127

This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...

Added: December 12, 2025

Bridging Gaps in Russian Language Processing: AI and Everyday Conversations

Tatiana Sherstinova, Nikolay Mikhaylovskiy, Evgenia Kolpashchikova et al., in: Proceedings of the 35th Conference of Open Innovations Association FRUCT, 24-26 April 2024, Tampere, Finland. Issue 1. FRUCT Oy, 2024. P. 253–258.

Contemporary advancements in NLP and neural network techniques are paving the way to enhance and harness traditional linguistic resources and corpora, as well as expand the methods of applying neural networks for complex language material. Thus, a weak point for both theoretical and applied linguistic tasks is the processing of spontaneous everyday speech. Two experiments ...

Added: November 29, 2024

Speech recognition in a corpus of sales representatives' audio recordings: problems, solutions, and research prospects

Kolmogorova P. A., in: Linguistic Semantics in the Spatial Dimension: Dictionary. Discourse. Corpus. Yekaterinburg: Кабинетный ученый, 2024. Ch. 9.2 P. 411–422.

Added: November 29, 2024

The Appliance of Deep Neural Networks in the Process of Managing Chemical Enterprises

Kulyasova E. V., Kulyasov N. S., Puchkov A. Y., in: Journal of Physics: Conference Series Volume 1260, 2019 Mechanical Science and Technology Update 23–24 April 2019, Omsk, Russian Federation. IOP Publishing, 2019. Ch. 3 P. 032024–032024.

This article introduces promising trends in the digital transformation of chemical enterprises that make it possible to improve enterprise management in the industry. It presents algorithms for management and for technological information processing based on deep neural networks. New approaches to data processing known as video analytics are applied; it allows ...

Added: September 27, 2024

Latent Stochastic Differential Equations for Change Point Detection

Ryzhikov A., Hushchyn M., Derkach D., IEEE Access 2023 Vol. 11 P. 104700–104711

Automated analysis of complex systems based on multiple readouts remains a challenge. Change point detection algorithms aim to locate abrupt changes in the time series behaviour of a process. In this paper, we present a novel change point detection algorithm based on Latent Neural Stochastic Differential Equations (SDE). Our method learns a non-linear deep ...

Added: October 5, 2023

Data-Driven Short-Term Daily Operational Sea Ice Regional Forecasting

Grigoryev T., Verezemskaya P., Krinitskiy M. et al., Remote Sensing 2022 Vol. 14 No. 22 Article 5837

Global warming has made the Arctic increasingly available for marine operations and created a demand for reliable operational sea ice forecasts to increase safety. Because ocean-ice numerical models are highly computationally intensive, relatively lightweight ML-based methods may be more efficient for sea ice forecasting. Many studies have exploited different deep learning models alongside classical approaches ...

Added: June 19, 2023

Loss function dynamics and landscape for deep neural networks trained with quadratic loss

Nakhodnov M., Kodryan M., Lobacheva E. et al., in: Doklady Mathematics Vol. 106. Issue 1: Supplement. Pleiades Publishing, Ltd., 2023. P. 43–62.

Knowledge of the loss landscape geometry makes it possible to successfully explain the behavior of neural networks, the dynamics of their training, and the relationship between resulting solutions and hyperparameters, such as the regularization method, neural network architecture, or learning rate schedule. In this paper, the dynamics of learning and the surface of the standard ...

Added: June 9, 2023

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Shuranov E. / Series Computer Science "arxiv.org". 2021.

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...

Added: February 14, 2023

Using convolutional neural networks for person re-identification in urban environments

Suchkov E. P., Alekseenko G. O., Nalchadzhi K. V., Интеллектуальные системы. Теория и приложения 2022 Vol. 26 No. 1 P. 250–254

Currently, video surveillance systems are becoming more widespread. One of the main goals of such systems is to control and track a person’s movement. The solution of this problem allows us to solve such applied problems as tracking the occupancy of various premises (whether shopping facilities or educational and cultural institutions), creating a motion heatmap or organizing control of access to ...

Added: January 31, 2023

Using convolutional neural networks for person re-identification in urban environments

Alekseenko G., Nalchadzhi K., Интеллектуальные системы. Теория и приложения 2022 Vol. 26 No. 1 P. 250–254

Video recording systems of various kinds are becoming increasingly widespread. One of the main goals of such systems is to monitor and track people. Solving this problem makes it possible to address applied tasks such as monitoring the occupancy of various premises (whether retail facilities or educational and cultural institutions), building a heat map of a person's movements, organizing access control to one or another ...

Added: December 21, 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes

Kodryan M., Lobacheva E., Nakhodnov M. et al., in: Thirty-Sixth Conference on Neural Information Processing Systems: NeurIPS 2022. Curran Associates, Inc., 2022. P. 14058–14070.

A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. However, the varying ELR ...

Added: December 20, 2022

Recognition of the Bare Soil Using Deep Machine Learning Methods to Create Maps of Arable Soil Degradation Based on the Analysis of Multi-Temporal Remote Sensing Data

Rukhovich D., Koroleva P., Rukhovich D. et al., Remote Sensing 2022 Vol. 14 No. 9 Article 2224

The detection of degraded soil distribution areas is an urgent task. It is difficult and very time consuming to solve this problem using ground methods. The modeling of degradation processes based on digital elevation models makes it possible to construct maps of potential degradation, which may differ from the actual spatial distribution of degradation. The ...

Added: November 14, 2022

Comment on “Pushing the frontiers of density functionals by solving the fractional electron problem”

Gerasimov I., Losev T., Evgeny Yu. Epifanov et al., Science 2022 Vol. 377 No. 6606 Article eabq3385

Kirkpatrick et al. (Reports, 9 December 2021, p. 1385) trained a neural network–based DFT functional, DM21, on fractional-charge (FC) and fractional-spin (FS) systems, and they claim that it has outstanding accuracy for chemical systems exhibiting strong correlation. Here, we show that the ability of DM21 to generalize the behavior of such systems does not follow ...

Added: September 25, 2022

Deep learning for inferring distribution of time to the last common ancestor from a diploid genome

K. Arzymatov, E. Khomutov, V. Shchur, Lobachevskii Journal of Mathematics 2022 Vol. 43 No. 8 P. 2092–2098

Genomic data is a rich source of information about population history. In particular, for actively recombining species the time to the last common ancestor (LCA) between two chromosomes might be different in different chromosome loci. Estimating local LCA time is important for many problems: it can be used to infer genes under selection, or to ...

Added: September 19, 2022

Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations

Belomestny D., Naumov A., Puchkin N. et al., Neural Networks 2023 Vol. 161 P. 242–253

This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded ...

Added: July 13, 2022

On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Lobacheva E., Kodryan M., Chirkova N. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 21545–21556.

Added: December 29, 2021