Isolated word recognition based on weighted voting of speaker-dependent neural network models

Л. В. Савченко

doi:10.17587/it.26.290-296

Информационные технологии. 2020. Vol. 26. No. 5. P. 290–296.

Savchenko L.

The article deals with the problem of isolated word recognition based on deep convolutional neural networks. The practical use of existing recognition systems is limited by their insufficient reliability under intense acoustic noise, such as street noise or the sound of passing vehicles. Today, the most accurate recognition methods build acoustic models with deep learning technologies, in particular convolutional neural networks. For image processing problems, the adaptation of such networks to a new domain by additional fine-tuning on rather small training samples is well studied. In this paper, we propose additional training of the networks to adapt acoustic models to a speaker's voice using a small number of utterances. To reduce the error rate, we consider an ensemble of several different speaker-dependent neural network architectures trained in this way. The final decision is made by a weighted voting rule, in which the weight of each acoustic model is proportional to its accuracy estimated on the training set. The experimental results for recognition of English commands showed that such an ensemble of pre-trained acoustic models can significantly improve accuracy compared to traditional pre-trained models, especially if white Gaussian noise is added to the input signal.
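
The weighted voting rule described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract says that each model's weight is proportional to its accuracy on the training set, but it does not say whether the models vote with class posteriors or with hard labels. The sketch assumes posteriors, and the function names `voting_weights` and `weighted_vote` are hypothetical.

```python
import numpy as np


def voting_weights(train_accuracies):
    """Return weights proportional to each model's training-set accuracy.

    The weights are normalized to sum to 1 (normalization is an assumption;
    it does not change which class wins).
    """
    acc = np.asarray(train_accuracies, dtype=float)
    return acc / acc.sum()


def weighted_vote(posteriors, weights):
    """Combine per-model class scores with a weighted sum.

    posteriors: array of shape (n_models, n_classes), one row per model
                (assumed here to be softmax outputs).
    weights:    array of shape (n_models,).
    Returns the index of the winning class and the combined scores.
    """
    p = np.asarray(posteriors, dtype=float)
    w = np.asarray(weights, dtype=float)
    scores = w @ p  # weighted sum over models, shape (n_classes,)
    return int(np.argmax(scores)), scores


# Toy example with three hypothetical speaker-dependent models
# and a three-word vocabulary.
weights = voting_weights([0.9, 0.6, 0.5])
label, scores = weighted_vote(
    [[0.7, 0.2, 0.1],
     [0.3, 0.6, 0.1],
     [0.2, 0.5, 0.3]],
    weights,
)
print(label, scores)
```

In this toy example two of the three models prefer class 1, so an unweighted majority vote would choose class 1, while the weighted rule selects class 0 because the most accurate model dominates the sum.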

Research target: Engineering and Technology

Priority areas: IT and mathematics

Keywords: speech recognition; deep learning; convolutional neural networks; isolated words recognition; ensemble of neural networks; acoustic model adaptation; weighted voting

Pronunciation training system based on convolutional neural networks and the information theory of speech perception

Savchenko L., Информационные технологии 2019 Vol. 25 No. 5 P. 313–318

We consider a problem of computer assisted language and pronunciation learning based on the deep learning methods and the information theory of speech perception. In order to improve the efficiency of testing of pronunciation quality, we propose to train a convolutional neural network using the best reference utterances from the user. The experimental results proved ...

Added: May 29, 2019

Deep convolutional neural networks capabilities for binary classification of polar mesocyclones in satellite mosaics

Криницкий М. А., Verezemskaya P., Гращенков К. В. et al., Atmosphere 2018 Vol. 9 No. 426 P. 1–23

Polar mesocyclones (MCs) are small marine atmospheric vortices. The class of intense MCs, called polar lows, is accompanied by extremely strong surface winds and heat fluxes and thus largely influences deep ocean water formation in the polar regions. Accurate detection of polar mesocyclones in high-resolution satellite data, while challenging, is a time-consuming task, when performed ...

Added: November 26, 2020

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Convolutional neural networks in the problem of gender and age recognition from video

Kharchevnikova A., Savchenko A., in: Proceedings of the IV International Conference and Youth School "Information Technologies and Nanotechnologies" (ITNT 2018). Samara: Предприятие "Новая техника", 2018. Ch. 124 P. 916–924.

In this paper we examine the age and gender video-based recognition problem using deep convolutional neural networks. The comparative analysis of classifier fusion algorithms to aggregate decisions for individual frames is presented. In order to improve the age and gender identification accuracy we implement the video-based recognition system with several aggregation methods. We provide the ...

Added: October 18, 2018

Application of Fisher kernels to the speaker identification problem

Gostev I. M., Ermilov A., Известия Юго-Западного государственного университета 2011 No. 2 P. 15–22

In this article we consider the application of Support Vector Machines with different types of kernels to the task of speaker identification. We use Fisher features for several types of channels (telephone, GSM, microphone). We analyze the dependence of accuracy on the length of the input sentence. ...

Added: January 31, 2014

Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Association for Computational Linguistics, 2019.

The 4th Workshop on Representation Learning for NLP (RepL4NLP) will be hosted by ACL 2019 and held on 2 August 2019. The workshop is being organised by Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Alexis Conneau, Johannes Welbl, Xian Ren and Marek Rei; and advised by Kyunghyun Cho, Edward Grefenstette, Karl Moritz ...

Added: November 1, 2019

Simulating the time projection chamber responses at the MPD detector using generative adversarial networks

A. Maevskiy, F. Ratnikov, Zinchenko A. et al., The European Physical Journal C - Particles and Fields 2021 Vol. 81 Article 599

High energy physics experiments rely heavily on the detailed detector simulation models in many tasks. Running these detailed models typically requires a notable amount of the computing time available to the experiments. In this work, we demonstrate a new approach to speed up the simulation of the Time Projection Chamber tracker of the MPD experiment at ...

Added: July 12, 2021

Deep learning approach for predicting functional Z-DNA regions using omics data

Beknazarov N., Jin S., Poptsova M., Scientific Reports 2020 Vol. 10 P. 19134

Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method Z-Hunt is based on statistical mechanical and energy considerations about B- to Z-DNA transition using sequence information. Z-DNA CHiP-seq experiment results showed little overlap with Z-Hunt predictions implying that sequence information only is not ...

Added: December 11, 2020

Classification of a Sequence of Objects with the Fuzzy Decoding Method

Savchenko A., Savchenko L. V., Lecture Notes in Artificial Intelligence 2014 Vol. 8536 P. 309–318

The problem of recognition of a sequence of objects (e.g., video-based image recognition, phoneme recognition) is explored. The generalization of the fuzzy phonetic decoding method is proposed by assuming the distribution of the classified object to be of exponential type. Its preliminary phase includes association of each model object with the fuzzy set of model ...

Added: July 25, 2014

Fuzzy Analysis and Deep Convolution Neural Networks in Still-to-video Recognition

Savchenko A., Belova N. S., Savchenko Lyudmila V., Optical Memory and Neural Networks (Information Optics) 2018 Vol. 27 No. 1 P. 23–31

We discuss the video classification problem with the matching of feature vectors extracted using deep convolutional neural networks from each frame. We propose the novel recognition method based on representation of each frame as a sequence of fuzzy sets of reference classes whose degrees of membership are defined based on asymptotic distribution of the Kullback–Leibler ...

Added: February 9, 2018

Machine Learning Use Cases in Cybersecurity

Avdoshin S.M., Lazarenko A.V., Chichileva N.I. et al., Proceedings of the Institute for System Programming of the RAS 2019 Vol. 31 No. 5 P. 191–202

The problem regarding the use of machine learning in cybersecurity is difficult to solve because the advances in the field offer so many opportunities that it is challenging to find exceptional and beneficial use cases for implementation and decision making. Moreover, such technologies can be used by intruders to attack computer systems. The goal of this ...

Added: December 31, 2019

Domain adaptation with gradient reversal for MC/real data calibration

Ryzhikov A., Ustyuzhanin A., Journal of Physics: Conference Series 2018 Vol. 1085 P. 1–6

In the research, a new approach for finding rare events in high-energy physics was tested. As an example of a physics channel, the decay τ → 3μ is taken, which was published on Kaggle within an LHCb-supported challenge. The training sample consists of simulated signal and real background, so the challenge is to train ...

Added: December 11, 2017

Emotion detection in multimedia content

А. С. Попова, А. Г. Рассадин, А. А. Пономаренко, in: Proceedings of the XXIII International Scientific and Technical Conference "Information Systems and Technologies-2017". [s.n.], 2017. P. 852–857.

In this paper we consider the automatic emotions recognition problem, especially the case of digital audio signal processing. We consider and verify an approach in which the classification of a sound fragment is reduced to the problem of image recognition. The waveform and spectrogram are used as a visual representation of the image. The computational ...

Added: October 18, 2017

Operating algorithm of a software implementation of the Wiener filter

Кузнецов Д. С., Естественные и технические науки 2009 No. 4 P. 365–369

This article considers the Wiener filter as a method for improving the performance of speech recognition systems. It describes possible modifications of the Wiener filter that increase the degree of noise suppression. The operating algorithm of a software implementation of the classical Wiener filter and its modifications is examined. ...

Added: February 21, 2013

Phonetic encoding method in the isolated words recognition problem

Savchenko A., Journal of Communications Technology and Electronics 2014 Vol. 59 No. 4 P. 339–345

A phonetic approach to the problem of automatic recognition of isolated words is investigated. The phonetic encoding method whereby each word from a vocabulary is associated with the code sequence of stable phonemes is proposed. The information-theoretical estimate of vocabulary confusability, the calculations of which rely on the phonetic database of a speaker and the communications channel ...

Added: April 8, 2014

Emotion Recognition in Sound

Popova A. S., Alexandr G. Rassadin, Alexander A. Ponomarenko, in: Advances in Neural Computation, Machine Learning, and Cognitive Research. Selected Papers from the XIX International Conference on Neuroinformatics, October 2-6, 2017, Moscow, Russia. Vol. 736. Cham: Springer, 2017. P. 117–124.

In this paper we consider the automatic emotions recognition problem, especially the case of digital audio signal processing. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to the problem of image recognition. The waveform and spectrogram are used as a visual representation of the image. ...

Added: October 18, 2017

Deep learning based methods for estimating distribution of coalescence rates from genome-wide data

Khomutov E., Arzymatov K., Shchur V., Journal of Physics: Conference Series 2021 Vol. 1740 Article 012031

Demographic and population structure inference is one of the most important problems in genomics. Population parameters such as effective population sizes, population split times and migration rates are of high interest both themselves and for many applications, e.g. for genome-wide association studies. Hidden Markov Model (HMM) based methods, such as PSMC, MSMC, coalHMM etc., proved ...

Added: May 17, 2021

Proceedings of International Joint Conference on Neural Networks 2020 (IJCNN 2020)

Piscataway: IEEE, 2020.

2020 International Joint Conference on Neural Networks (IJCNN) held virtually, as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI) 2020. IJCNN 2020 is jointly organized by the IEEE Computational Intelligence Society (CIS) and the International Neural Network Society (INNS). For IJCNN 2020 (and when WCCI is organized in even-numbered years) IEEE CIS ...

Added: October 15, 2020

Learning velocity model for complex media with deep convolutional neural networks

Gremyachikh L., Ustyuzhanin A., Станкевич А. et al., / Series 2110.08626 "Machine Learning". 2021.

The paper considers the problem of velocity model acquisition for a complex media based on boundary measurements. The acoustic model is used to describe the media. We used an open-source dataset of velocity distributions to compare the presented results with the previous works directly. Forward modeling is performed using the grid-characteristic numerical method. The inverse ...

Added: May 24, 2022

Deep neural networks and maximum likelihood search for approximate nearest neighbor in video-based image recognition

Savchenko A., Optical Memory and Neural Networks (Information Optics) 2017 Vol. 26 No. 2 P. 129–136

We analyzed the way to increase computational efficiency of video-based image recognition methods with matching of high dimensional feature vectors extracted by deep convolutional neural networks. We proposed an algorithm for approximate nearest neighbor search. At the first step, for a given video frame the algorithm verifies a reference image obtained when recognizing the previous ...

Added: June 30, 2017

Development of a universal robotic platform

Romanov A., Amerikanov A., Lezhnev E. et al., Прикладная радиоэлектроника 2016 Vol. 15 No. 2 P. 123–126

The paper describes the development of a robotic platform for buildings. The versatility of the platform allows its applying in various fields of human activity, both in the remote control and autonomous regime. The main steps involved in creating a robotic platform are described; its characteristics and working results are given. ...

Added: October 7, 2016

Russian Q&A Method Study: From Naive Bayes to Convolutional Neural Networks

Nikolaev K., Malafeev A., in: Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018. Springer, 2018. Ch. 12 P. 121–126.

This paper deals with automatic classification of questions in the Russian language. In contrast to previously used methods, we introduce a convolutional neural network for question classification. We took advantage of an existing corpus of 2008 questions, manually annotated in accordance with a pragmatic 14-class typology. We modified the data by reducing the typology to ...

Added: February 15, 2019

Proceedings 2020 IEEE East-West Design & Test Symposium (EWDTS)

Sidorenko V., Кулагин М. А., Varna: IEEE, 2020.

The main target of the IEEE East-West Design & Test Symposium (EWDTS) is to exchange experiences between scientists and technologies from Eastern and Western Europe, as well as North America and other parts of the world, in the field of design, design automation and test of electronic circuits and systems. The symposium is typically held ...

Added: September 8, 2020

Proceedings of the XXIII Scientific Conference on Radiophysics. May 13-21, 2019

Nizhny Novgorod: UNN, 2019.

The collection includes the materials of the talks given at the 23rd Conference on Radiophysics, held on May 13-21, 2019 at the Faculty of Radiophysics of the National Research Lobachevsky State University of Nizhny Novgorod (UNN). The topics of the talks cover the main research areas developed at the faculty. The work was carried out by UNN staff, postgraduate students, and students, as well as by staff of research institutes and high-tech enterprises of Nizhny Novgorod. Published by decision of ...

Added: January 10, 2020