Deep Learning Approaches for Understanding Simple Speech Commands

Solovyev R. A.; Vakhrushev M.; Radionov A.; Romanova I.I.; Amerikanov A.A.; Aliev V.; Shvets A. A.

doi:10.1109/ELNANO50318.2020.9088863

Ch. 9088863. P. 688–693.

Automatic classification of sound commands is becoming increasingly important, especially for embedded and mobile devices. Many of these devices contain both microphones and cameras, and the manufacturers that develop and produce them would like to use the same methodology for sound and image classification tasks. This can be achieved by representing sound commands as images and then using convolutional neural networks to classify both images and sounds. In this research, we tried several approaches to the problem of sound classification, which we applied in the TensorFlow Speech Recognition Challenge organized by the Google Brain team on the Kaggle platform. Here we show different representations of sounds (wave frames, spectrograms, mel-spectrograms, MFCCs) and apply several 1D and 2D convolutional neural networks to get the best performance. As a novelty of our work, we developed and trained from scratch two 1D network architectures that are topologically similar to the 2D VGG and ResNet network types. These networks show performance similar to that of 2D networks when the sound signal is represented using melgrams. Our experiments identified an appropriate sound representation and corresponding convolutional neural networks. As a result, we achieved good classification accuracy (91.8%), which allowed us to finish the challenge in 8th place among 1315 teams.
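To make the idea of representing a sound command as an image more concrete, the following is a minimal NumPy sketch of a log mel-spectrogram. It is an illustration only, not the authors' pipeline: the function names, the 16 kHz sample rate, the 512-sample Hann window, the 160-sample hop and the 40 mel bands are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a 2D array (n_mels, n_frames) that can be treated as an image."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                           # (n_mels, n_frames)

# Hypothetical input: a one-second 440 Hz tone standing in for a spoken command.
t = np.arange(16000) / 16000.0
image = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(image.shape)  # (40, 97)
```

The resulting array can be fed to a 2D convolutional network as a single-channel image, or to a 1D network that convolves along the time axis; which layout the authors used is not stated in the abstract.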

Keywords: deep learning, convolutional neural networks, speech classification

In book

2020 IEEE 40th International Conference on Electronics and Nanotechnology (ELNANO)

IEEE, 2020.

Touching the Limits of a Dataset in Video-Based Facial Expression Recognition

Churaev E., Savchenko A., in: 2021 International Russian Automation Conference (RusAutoCon). IEEE, 2021. P. 633–638.

In this paper, we examine the issue of video-based facial emotion recognition algorithms which show excellent performance on some benchmarks, but have much worse accuracy in practical applications. For example, the typical error rate of contemporary deep neural networks on the RAVDESS dataset is less than 5%. We argue that such results are obtained only ...

Added: October 7, 2021

Emotion Recognition in Sound

Popova A. S., Alexandr G. Rassadin, Alexander A. Ponomarenko, in: Advances in Neural Computation, Machine Learning, and Cognitive Research. Selected Papers from the XIX International Conference on Neuroinformatics, October 2-6, 2017, Moscow, Russia. Vol. 736. Cham: Springer, 2017. P. 117–124.

In this paper we consider the automatic emotion recognition problem, especially the case of digital audio signal processing. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to the problem of image recognition. The waveform and spectrogram are used as a visual representation of the image. ...

Added: October 18, 2017

Deep neural networks and maximum likelihood search for approximate nearest neighbor in video-based image recognition

Savchenko A., Optical Memory and Neural Networks (Information Optics) 2017 Vol. 26 No. 2 P. 129–136

We analyzed the way to increase computational efficiency of video-based image recognition methods with matching of high dimensional feature vectors extracted by deep convolutional neural networks. We proposed an algorithm for approximate nearest neighbor search. At the first step, for a given video frame the algorithm verifies a reference image obtained when recognizing the previous ...

Added: June 30, 2017

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Neural Codes for Image Retrieval

Babenko A., Slesarev A., Chigorin A. et al., in: Lecture Notes in Computer Science. Proceedings of the 13th European Conference on Computer Vision (ECCV 2014). Part 1. Vol. 8689. Zürich: Springer, 2014. P. 584–599.

It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) within the image retrieval application. In the experiments with several standard ...

Added: October 1, 2014

Emotion Detection in Multimedia Content (Детектирование эмоций в мультимедиа контенте)

A. S. Popova, A. G. Rassadin, A. A. Ponomarenko, in: Proceedings of the XXIII International Scientific and Technical Conference "Information Systems and Technologies-2017". [s.n.], 2017. P. 852–857.

In this paper we consider the automatic emotion recognition problem, especially the case of digital audio signal processing. We consider and verify an approach in which the classification of a sound fragment is reduced to the problem of image recognition. The waveform and spectrogram are used as a visual representation of the image. The computational ...

Added: October 18, 2017

Data organization in video surveillance systems using deep learning

A.D. Sokolova, A.V. Savchenko, in: CEUR Workshop Proceedings. Vol. 2210: Proceedings of the International Conference Information Technology and Nanotechnology. Session Image Processing and Earth Remote Sensing. [s.n.], 2018. P. 243–250.

In this paper we propose to organize information in video surveillance systems by grouping the video tracks that contain identical faces. Features of individual frames, extracted using deep convolutional neural networks, are aggregated to obtain a descriptor of each video track. The tracks with identical faces are grouped using the known ...

Added: November 5, 2018

Deep convolutional neural networks capabilities for binary classification of polar mesocyclones in satellite mosaics

Krinitskiy M. A., Verezemskaya P., Grashchenkov K. V. et al., Atmosphere 2018 Vol. 9 No. 426 P. 1–23

Polar mesocyclones (MCs) are small marine atmospheric vortices. The class of intense MCs, called polar lows, is accompanied by extremely strong surface winds and heat fluxes and thus largely influences deep ocean water formation in the polar regions. Accurate detection of polar mesocyclones in high-resolution satellite data, while challenging, is a time-consuming task, when performed ...

Added: November 26, 2020

Russian Q&A Method Study: From Naive Bayes to Convolutional Neural Networks

Nikolaev K., Malafeev A., in: Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018. Springer, 2018. Ch. 12. P. 121–126.

This paper deals with automatic classification of questions in the Russian language. In contrast to previously used methods, we introduce a convolutional neural network for question classification. We took advantage of an existing corpus of 2008 questions, manually annotated in accordance with a pragmatic 14-class typology. We modified the data by reducing the typology to ...

Added: February 15, 2019

Convolutional Neural Networks for Video-Based Gender and Age Recognition (Сверточные нейронные сети в задаче распознавания пола и возраста по видеоизображению)

Kharchevnikova A., Savchenko A., in: Proceedings of the IV International Conference and Youth School "Information Technology and Nanotechnology" (ITNT 2018). Samara: Novaya Tekhnika, 2018. Ch. 124. P. 916–924.

In this paper we examine the age and gender video-based recognition problem using deep convolutional neural networks. The comparative analysis of classifier fusion algorithms to aggregate decisions for individual frames is presented. In order to improve the age and gender identification accuracy we implement the video-based recognition system with several aggregation methods. We provide the ...

Added: October 18, 2018

Compressing deep convolutional neural networks in visual emotion recognition

A. G. Rassadin, A. V. Savchenko, in: CEUR Workshop Proceedings. Vol. 1901: Proceedings of the International Conference Information Technology and Nanotechnology. Session Image Processing, Geoinformation Technology and Information Security. CEUR-WS, 2017. P. 207–213.

In this paper, we consider the problem of insufficient runtime and memory space complexities of deep convolutional neural networks for visual emotion recognition. A survey of recent compression methods and efficient neural network architectures is provided. We experimentally compare the computational speed and memory consumption during the training and the inference stages of such methods ...

Added: October 17, 2017

Context-Aware CNNs for Person Head Detection

Vu T., Osokin A., Laptev I., in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015). Santiago de Chile: IEEE, 2015. P. 2893–2901.

Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent R-CNN object detector, ...

Added: October 19, 2017

Isolated Word Recognition Based on Weighted Voting of Speaker-Dependent Neural Network Models (Распознавание изолированных слов на основе взвешенного голосования дикторозависимых нейросетевых моделей)

Savchenko L., Informatsionnye Tekhnologii (Information Technologies) 2020 Vol. 26 No. 5 P. 290–296

The article deals with the problem of isolated word recognition based on deep convolutional neural networks. The practical use of existing recognition systems is limited by their insufficient reliability in conditions of intense acoustic noise, such as street noise, sounds from passing vehicles, etc. Nowadays, the most accurate recognition methods are characterized by ...

Added: September 2, 2020

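Pronunciation Training System Based on Convolutional Neural Networks and the Information Theory of Speech Perception (Система постановки произношения на основе сверточных нейронных сетей и информационной теории восприятия речи)

Savchenko L., Informatsionnye Tekhnologii (Information Technologies) 2019 Vol. 25 No. 5 P. 313–318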

We consider a problem of computer assisted language and pronunciation learning based on the deep learning methods and the information theory of speech perception. In order to improve the efficiency of testing of pronunciation quality, we propose to train a convolutional neural network using the best reference utterances from the user. The experimental results proved ...

Added: May 29, 2019

User Modeling on Mobile Device Based on Facial Clustering and Object Detection in Photos and Videos

Grechikhin I., Andrey V. Savchenko, in: Pattern Recognition and Image Analysis. Part 2. Springer, 2019. P. 429–440.

The article describes an approach for extracting user preferences based on the analysis of a gallery of photos and videos on a mobile device. It is proposed to first use fast SSD-based methods to detect objects of interest in offline mode directly on the mobile device. Next we perform facial analysis of all visual ...

Added: September 23, 2019

Nucleus segmentation: towards automated solutions

Hollandi R., Moshkov N., Paavolainen L. et al., Trends in Cell Biology 2022

Single nucleus segmentation is a frequent challenge of microscopy image processing, since it is the first step of many quantitative data analysis pipelines. The quality of tracking single cells, extracting features or classifying cellular phenotypes strongly depends on segmentation accuracy. Worldwide competitions have been held, aiming to improve segmentation, and recent years have definitely brought ...

Added: January 21, 2022

Semantic embeddings for program behaviour patterns

Chistyakov A., Lobacheva E., Kuznetsov A. et al., in: Workshop of the 5th International Conference on Learning Representations (ICLR). [s.n.], 2017. P. 1–4.

In this paper, we propose a new feature extraction technique for program execution logs. First, we automatically extract complex patterns from a program's behavior graph. Then, we embed these patterns into a continuous space by training an autoencoder. We evaluate the proposed features on a real-world malicious software detection task. We also find that the ...

Added: October 31, 2018

Detection of Cassava Diseases Using Computer Vision Methods (Определение заболеваний маниока методами компьютерного зрения)

Tereshchenko S. N., Perov A., Osipov A. L., Siberian Journal of Life Sciences and Agriculture 2021 Vol. 13 No. 1 P. 144–155

Background. Development of a convolutional neural network model for detecting cassava diseases from a mobile phone photo. Materials and methods. The research material consisted of images with various types of cassava diseases, published in open access on the Kaggle platform. Research methods: theory of design and development of information systems, programming, methods of augmentation and extension ...

Added: November 17, 2021

Fault detection in Tennessee Eastman process with temporal deep learning models

Lomov I., Lyubimov M., Makarov I. et al., Journal of Industrial Information Integration 2021 Vol. 23 Article 100216

Automated early process fault detection and prediction remains a challenging problem in industrial processes. Traditionally it has been done by multivariate statistical analysis of sensor readings and, more recently, with the help of machine learning methods. The quality of machine learning models strongly depends on feature engineering, which in turn heavily relies on expertise of ...

Added: March 21, 2021

Weight Averaging Improves Knowledge Distillation under Domain Shift

Berezovskiy V., Morozov N., in: The 2nd Workshop and Challenges for Out-of-Distribution Generalization in Computer Vision. ICCV 2023. [s.n.], 2023.

Knowledge distillation (KD) is a powerful model compression technique broadly used in practical deep learning applications. It is focused on training a small student network to mimic a larger teacher network. While it is widely known that KD can offer an improvement to student generalization in the i.i.d. setting, its performance under domain shift, i.e. the ...

Added: November 20, 2023

Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease

Umerenkov D., Herbert A., Konovalov Dmitrii et al., Life Science Alliance 2023 Vol. 6 No. 7 Article e202301962

Identifying roles for Z-DNA remains challenging given their dynamic nature. Here, we perform genome-wide interrogation with the DNABERT transformer algorithm trained on experimentally identified Z-DNA forming sequences (Z-flipons). The algorithm yields large performance enhancements (F1 = 0.83) over existing approaches and implements computational mutagenesis to assess the effects of base substitution on Z-DNA formation. We ...

Added: June 9, 2023

Bayesian Sparsification of Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D., / Series 1 "Workshop on Learning to Generate Natural Language". 2017.

Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural ...

Added: October 19, 2017

ABC: A Big CAD Model Dataset For Geometric Deep Learning

Koch S., Matveev A., Jiang Z. et al., in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). IEEE, 2019. P. 9601–9611.

We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows ...

Added: November 26, 2019

The Deep Weight Prior

Atanov A., Ashukha A., Struminsky K. et al., in: Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR, 2019. P. 1–17.

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of ...

Added: September 2, 2019