Gender domain adaptation for automatic speech recognition

A. Sokolov; A. Savchenko

doi:10.1109/SAMI50585.2021.9378626


P. 413–418.

This paper focuses on fine-tuning acoustic models to adapt them to speakers of a given gender. We pretrained a Transformer baseline model on Librispeech-960 and conducted fine-tuning experiments on the gender-specific test subsets. Relative to the baseline, the word error rate (WER) is up to 5% and 3% lower on the male and female subsets, respectively, provided that the encoder and decoder layers are not frozen and tuning starts from the last checkpoints. Moreover, we adapted our base model on the complete L2 Arctic dataset of accented speech and fine-tuned it separately for particular speakers and for the male and female genders. The models trained on the gender subsets obtained 1-2% lower WER than the model tuned on the whole L2 Arctic dataset. Finally, it was experimentally confirmed that concatenating pretrained voice embeddings (x-vectors) with embeddings from a conventional encoder does not significantly improve speech recognition accuracy.
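The abstract reports its results as WER reductions relative to a baseline. As an illustration only, the sketch below shows the standard word-level edit-distance definition of WER and a relative-reduction helper. The function names `word_error_rate` and `relative_wer_reduction` and the sample sentences are assumptions made for this example; the paper's own scoring pipeline is not described here and may differ (for instance in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted WER is lower than the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
    # Hypothetical WER values chosen only to illustrate a 5% relative reduction.
    print(f"Relative reduction: {relative_wer_reduction(0.10, 0.095):.2%}")
```

Under this definition, a "5% lower WER relative to the baseline" means the adapted model's WER is 0.95 times the baseline WER, not five absolute percentage points lower.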

Language: English

Keywords: speaker adaptation, deep neural networks, automatic speech recognition

In book

2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)

IEEE, 2021.

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Sokolov A. / Series Computer Science "arxiv.org". 2021.

Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...

Added: November 17, 2020

Voice command recognition in intelligent systems using deep neural networks

Sokolov A., Savchenko A., in: 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2019. Ch. 19 P. 113–116.

In this article, we focus on the isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose to create a grammar model for a small testing command set with self-loops for each state to return blank symbols for noise and out-of-vocabulary words. In addition, we use single arc connected beginning and ending ...

Added: October 21, 2019

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Garipov T., Izmailov P., Podoprikhin D. A. et al., in: Advances in Neural Information Processing Systems 31 (NIPS 2018). [s.n.], 2018. P. 1–10.

The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. ...

Added: February 27, 2019

Application of Deep Neural Networks for the Classification of Large Volumes of Astronomical Data

Gorbunov A. A., Isaev E., Samodurov V., Radio Physics and Radio Astronomy 2017 Vol. 22 No. 4 P. 270–275

Astronomical observations produce vast amounts of data. The BSA (Big Scanning Antenna) of the LPI, used to study impulse phenomena, logs 87.5 GB of data daily (32 TB per year). Experts classified 83096 individual observations (from the study period July 2012 - October 2013). Over 75% of the sample ...

Added: October 15, 2017

Semi-automated Speaker Adaptation: How to Control the Quality of Adaptation?

Savchenko A., Lecture Notes in Computer Science 2014 Vol. 8509 P. 638–646

Since the early 1990s, speaker adaptation has become one of the most active areas in speech recognition. State-of-the-art batch-mode adaptation algorithms assume that the speech of a particular speaker contains enough information about the user's voice. In this article, we propose to allow the user to manually verify whether the adaptation is useful. Our procedure requires the speaker ...

Added: July 25, 2014

Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)

[s.n.], 2018.

Added: October 29, 2018

Black-Box Optimization with Local Generative Surrogates

Belavin V., Ustyuzhanin A., Sergey Shirobokov et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 14650–14662.

Added: February 14, 2021

Domain-independent Classification of automatic Speech Recognition Texts

Mescheryakova E.I., Nesterenko L.V., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). M.: -, 2017.

Added: January 4, 2019

Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations

Belomestny D., Naumov A., Puchkin N. et al., Neural Networks 2023 Vol. 161 P. 242–253

This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded ...

Added: July 13, 2022

Improving the Accuracy of One-Shot Detectors for Small Objects in X-ray Images

Demochkina P., Savchenko A., in: Proceedings of IEEE International Russian Automation Conference (RusAutoCon 2020). IEEE, 2020. Ch. 110 P. 610–614.

In this paper, we address the problem of detecting small objects in high-quality X-ray images using deep neural networks. We propose a two-stage approach in which, firstly, the input image is split into partially overlapping blocks to make small objects more discriminative for detection. Secondly, the small blocks are fed into conventional single-shot detectors. These detectors ...

Added: October 3, 2020

Structured Sparsification of Gated Recurrent Neural Networks

Lobacheva E., Chirkova N., Markovich A. et al., in: Thirty-Fourth AAAI Conference on Artificial Intelligence. Vol. 34. AAAI Press, 2020. Ch. 5938 P. 4989–4996.

Added: October 29, 2020

Uncertainty Estimation in Autoregressive Structured Prediction

Andrey Malinin, Gales M., in: Proceedings of the 9th International Conference on Learning Representations (ICLR 2021). ICLR, 2021. P. 1–31.

Added: November 1, 2021

Advances in Computational Intelligence. IWANN 2019

Berlin: Springer, 2019.

This two-volume set LNCS 10305 and LNCS 10306 constitutes the refereed proceedings of the 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, held at Gran Canaria, Spain, in June 2019. The 150 revised full papers presented in this two-volume set were carefully reviewed and selected from 210 submissions. The papers are organized in topical sections ...

Added: July 29, 2019

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Shuranov E. / Series Computer Science "arxiv.org". 2021.

Added: February 14, 2023

Artie Bias Corpus: An Open Dataset for Detecting Demographic Bias in Speech Applications

Meyer J., Rauchenstein L., Eisenberg J., in: Proceedings of The 12th Language Resources and Evaluation Conference. Vol. 12. European Language Resources Association (ELRA), 2020. P. 6462–6468.

We describe the creation of the Artie Bias Corpus, an English dataset of expert-validated <audio, transcript> pairs with demographic tags for age, gender, accent. We also release open software which may be used with the Artie Bias Corpus to detect demographic bias in Automatic Speech Recognition systems, and can be extended to other speech technologies. ...

Added: April 20, 2021

Uncertainty Estimation via Stochastic Batch Normalization

Ashukha A., Vetrov D., Molchanov D. et al., in: Workshop of the 6th International Conference on Learning Representations (ICLR). International Conference on Learning Representations, ICLR, 2018. P. 1–6.

In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally inefficient. To ...

Added: October 31, 2018

On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Lobacheva E., Kodryan M., Chirkova N. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 21545–21556.

Added: December 29, 2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Ryabinin M., Malinin A., Gales M., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 6023–6035.

Added: October 31, 2021

The Appliance of Deep Neural Networks in the Process of Managing Chemical Enterprises

Kulyasova E. V., Kulyasov N.S., Puchkov A. Y., in: Journal of Physics: Conference Series Volume 1260, 2019 Mechanical Science and Technology Update 23–24 April 2019, Omsk, Russian Federation. IOP Publishing, 2019. Ch. 3 P. 032024–032024.

This article introduces promising trends in the digital transformation of chemical enterprises that make it possible to improve enterprise management in the industry. Algorithms for management and technological information processing based on deep neural networks are presented. New approaches to data processing known as video analytics are applied; it allows ...

Added: September 27, 2024

Fuzzy Phonetic Encoding of Speech Signals in Voice Processing Systems

Savchenko L.V., Savchenko A.V., Journal of Communications Technology and Electronics 2019 Vol. 64 No. 3 P. 238–244

In this paper, we studied the phonetic approach for voice processing. A method for automatic recognition of speech signals, in which each quasistationary segment is associated with a fuzzy set of phonemes, was developed. We proposed the operation of the probabilistic triangular norm for fuzzy sets corresponding to the input frame and the nearest reference phoneme. The developed ...

Added: June 7, 2019

On Power Laws in Deep Ensembles

Lobacheva E., Chirkova N., Kodryan M. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 2375–2385.

Added: October 29, 2020

Deep neural networks performance optimization in image recognition

A. G. Rassadin, A. V. Savchenko, in: Proceedings of the III International Conference on Information Technologies and Nanotechnologies (ITNT). Samara: Novaya Tekhnika, 2017. P. 649–654.

In this paper, we consider the problem of insufficient runtime and memory-space complexities of contemporary deep convolutional neural networks in the problem of image recognition. A survey of recent compression methods and efficient neural networks architectures is provided. The experimental study is focused on the visual emotion recognition problem. We compare the computational speed and ...

Added: September 8, 2017

Automatic Privacy Detection in Scanned Document Images Based on Deep Neural Networks

Kopeykina Lyudmila, Savchenko A., in: 2019 International Russian Automation Conference (RusAutoCon). IEEE, 2019. P. 1–6.

The authors consider the problem of automatic detection of private scanned documents based on text recognition with deep neural networks. The paper suggests implementing a two-phase approach with the first stage which includes efficient EAST text detection and recognition using Tesseract OCR Engine. Secondly, the authors classify the privacy of a scanned document by deep ...

Added: October 21, 2019