Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning

A. Savchenko; L. Savchenko; Savchenko V.

doi:10.1007/978-3-030-49988-4_30


Ch. 30. P. 440–454.

Savchenko A., Savchenko L., Savchenko V.

This paper considers the assessment of pronunciation quality in computer-aided language learning systems. We propose a novel distortion measure for speech processing based on gain optimization of the symmetrized Itakura-Saito divergence. This dissimilarity is implemented in a complete algorithm for pronunciation learning and improvement. At the first stage, a user has to achieve a stable pronunciation of all sounds by matching them with the sounds of an ideal speaker. At the second stage, sounds and their short sequences are recognized to guarantee the distinguishability of the learned sounds. The training set may contain not only ideal sounds but also the best utterances of the user obtained at the previous step. Finally, the word recognition accuracy is estimated by using deep neural networks fine-tuned on the best words from the user. An experimental study shows that the proposed procedure makes it possible to achieve high efficiency in learning sounds and their sequences even in the presence of noise in an observed utterance.
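
The idea of gain optimization in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the symmetrized Itakura-Saito divergence is taken as the half-sum of the two directed divergences between power spectra, and that the gain is a single positive scale factor applied to one spectrum. The function names `symmetrized_is` and `gain_optimized_sis` are hypothetical. Under these assumptions the log terms cancel, and minimizing over the gain g gives g* = sqrt(B / A) with minimum sqrt(A * B) - 1, where A = mean(P/Q) and B = mean(Q/P).

```python
import math


def symmetrized_is(p, q):
    """Half-sum of the two directed Itakura-Saito divergences.

    p and q are equal-length sequences of strictly positive power
    spectrum values (an assumption of this sketch). The log terms of
    the two directed divergences cancel, leaving 0.5 * (A + B) - 1.
    """
    n = len(p)
    a = sum(pi / qi for pi, qi in zip(p, q)) / n  # A = mean(P / Q)
    b = sum(qi / pi for pi, qi in zip(p, q)) / n  # B = mean(Q / P)
    return 0.5 * (a + b) - 1.0


def gain_optimized_sis(p, q):
    """Minimize symmetrized_is(g * p, q) over the gain g > 0.

    Setting the derivative of 0.5 * (g * A + B / g) - 1 to zero gives
    g* = sqrt(B / A) and the minimum value sqrt(A * B) - 1.
    Returns (minimum divergence, optimal gain).
    """
    n = len(p)
    a = sum(pi / qi for pi, qi in zip(p, q)) / n
    b = sum(qi / pi for pi, qi in zip(p, q)) / n
    return math.sqrt(a * b) - 1.0, math.sqrt(b / a)


# Hypothetical spectra: the learner's spectrum is the reference scaled by 3,
# so the two differ only in loudness.
reference = [3.0, 6.0, 12.0]
learner = [1.0, 2.0, 4.0]

plain = symmetrized_is(learner, reference)
optimized, gain = gain_optimized_sis(learner, reference)
```

In this toy case the plain divergence is nonzero purely because of the level difference, while the gain-optimized value vanishes, which is the behavior one would want when comparing a learner's sound with a reference recorded at a different volume.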

Keywords: convolutional neural networks, signal processing, Itakura-Saito divergence, gain optimization, computer-aided language learning, speech quality assessment

Publication based on the results of:

Research of robustness of network analysis algorithms (2020)

In book

Mathematical Optimization Theory and Operations Research, 19th International Conference, MOTOR 2020, Novosibirsk, Russia, July 6–10, 2020 (Vol. 12095)

Cham: Springer, 2020.

A Modernized Bandpass Discrete-Analog Filter with Frequency Translation

Maltseva S. V., Жуков А. О., Сгибнев В. П. et al., Нано- и микросистемная техника 2026 Vol. 28 No. 2 P. 95–100

The article describes a modernized bandpass discrete analog filter with signal processing in the time domain. The design and operating principles of the filter are presented, as well as the basic and timing diagrams of signals at various points of the filter. The results of the analysis of the filter characteristics obtained through computer modeling ...

Added: March 3, 2026

An Ensemble of Modern Computer Vision Models for Deepfake Detection

Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127

This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...

Added: December 12, 2025

2025 17th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS)

Niš: IEEE, 2025.

Added: November 29, 2025

Recognition of Mentally Pronounced Russian Phonemes Using Convolutional Neural Networks and Electroencephalography Data

Seleznev L. E., Chupakhin A. A., Kostenko V. A. et al., Optical Memory and Neural Networks (Information Optics) 2023 Vol. 32 No. 2 P. 73–85

We analyze a classification problem of mentally pronounced Russian phonemes based on data obtained by means of an electroencephalography device. We describe the data collection method as well as the methods of the obtained data processing. To solve the small sample size problem we present the augmentation techniques that use the time stretching and the ...

Added: October 2, 2025

Convolutional Neural Networks Decode Finger Movements in Motor Sequence Learning from MEG Data

Zabolotniy A., Chan R. W., Moiseeva V. et al., Frontiers in Neuroscience 2025 Vol. 19 Article 1623380

We demonstrated the feasibility of finger movement decoding with a tailored Convolutional Neural Network. The performance of our approach was comparable to complex deep learning architectures, while providing faster and interpretable outcome. This algorithmic strategy holds high potential for the investigation of the mechanisms underlying non-invasive neurophysiological recordings in cognitive neuroscience. ...

Added: October 2, 2025

Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?

Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84

Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...

Added: January 7, 2025

Proceedings Volume 11605, Thirteenth International Conference on Machine Vision

Teplyakov L., Kaymakov K., Shvets E. et al., SPIE, 2021.

Line detection is an important computer vision task traditionally solved by Hough Transform. With the advance of deep learning, however, trainable approaches to line detection became popular. In this paper we propose a lightweight CNN for line detection with an embedded parameter-free Hough layer, which allows the network neurons to have global strip-like receptive fields. ...

Added: November 5, 2024

2023 16th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS)

Niš: IEEE, 2023.

Added: June 21, 2024

ALPACA: An Asymmetric Loss Prediction Algorithm for Channel Adaptation Based on a Convolutional-Recurrent Neural Network in URLLC Systems

Kirill Glinskiy, Kureev A., Khorov E., IEEE Access 2024 Vol. 12 P. 329–338

A key feature of 5G systems is the Ultra-Reliable Low-Latency Communication (URLLC), which can be used for remote surgery, smart grids, industrial control, etc. URLLC requires millisecond-level delays and very high reliability, i.e., a packet loss probability below 10^-5. The ability to satisfy these very strict quality of service requirements depends on selecting the Modulation and Coding ...

Added: January 17, 2024

14th International Conference, OPTIMA 2023, Petrovac, Montenegro, September 18–22, 2023, Revised Selected Papers. Communications in Computer and Information Science (CCIS, volume 1913)

Springer, 2023.

This book constitutes the refereed proceedings of the 14th International Conference on Advances in Optimization and Applications, OPTIMA 2023, held in Petrovac, Montenegro, during September 18–22, 2023. The 21 full papers included in this book were carefully reviewed and selected from 68 submissions. They were organized in topical sections as follows: mathematical programming; global optimization; continuous optimization; ...

Added: December 9, 2023

Lightweight and Elegant Data Reduction Strategies for Training Acceleration of Convolutional Neural Networks

Demidovskij A., Artyom Tugaryov, Aleksei Trutnev et al., Mathematics 2023 Vol. 11 No. 14 Article 3120

Due to industrial demands to handle increasing amounts of training data, lower the cost of computing one model at a time, and lessen the ecological effects of intensive computing resource consumption, the job of speeding the training of deep neural networks becomes exceedingly challenging. Adaptive Online Importance Sampling and IDS are two brand-new methods for ...

Added: September 12, 2023

Artificial Intelligence in Music, Sound, Art and Design: 12th International Conference, EvoMUSART 2023, Held as Part of EvoStar 2023, Brno, Czech Republic, April 12–14, 2023, Proceedings

Cham: Springer, 2023.

This book constitutes the refereed proceedings of the 12th European Conference on Artificial Intelligence in Music, Sound, Art and Design, EvoMUSART 2023, held as part of Evo* 2023, in April 2023, co-located with the Evo* 2023 events, EvoCOP, EvoApplications, and EuroGP. The 20 full papers and 7 short papers presented in this book were carefully reviewed ...

Added: April 4, 2023

Online Educational Technologies and Computer-Assisted Language Learning: Significance, Volumes, and Market Opportunities

Tabakova N., Конкурентоспособность в глобальном мире: экономика, наука, технологии 2018 No. 6

The author surveys the current state of online educational technologies and the forecasts for their development, and finds that computer-assisted language learning is significantly more popular with users than other types of online education. The author notes that most language teachers are still skeptical about online studies and doubt the prognosis for quick development of the ...

Added: February 15, 2023

Using Ground-Based Passive Reflectors for Improving UAV Landing

Yasentsev D., Shevgunov T., Efimov E. et al., Drones 2021 Vol. 5 No. 4 Article 137

The article reviews the problem of landing on hard-to-reach and poorly developed territories, especially in the case of unmanned aerial vehicles. Various landing systems and approaches are analyzed, and their key advantages and disadvantages are summarized; afterwards, an approach with passive reflectors is considered. A formal definition is provided for the main factors relative to ...

Added: February 6, 2023

22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers

Springer, 2022.

This book constitutes selected and revised papers from the 22nd International Conference on Mathematical Modeling and Supercomputer Technologies, MMST 2022, held in Nizhny Novgorod, Russia, in November 2022. The 20 full papers and 5 short papers presented in the volume were thoroughly reviewed and selected from the 48 submissions. They are organized in topical sections on computational methods ...

Added: December 26, 2022

Gain-optimized spectral distortions for pronunciation training

Savchenko A., Savchenko V., Savchenko L., Optimization Letters 2022 Vol. 16 No. 7 P. 2095–2113

This paper considers an assessment and evaluation of speech sound pronunciation quality in computer-aided language learning systems. We examine the gain optimization of spectral distortion measures between the speech signals of a native speaker and a learner. During training, a learner has to achieve stable pronunciation of all sounds. This is measured by computing the ...

Added: August 18, 2022

Parallel Computational Technologies: 16th International Conference, PCT 2022, Dubna, Russia, March 29–31, 2022, Revised Selected Papers

Springer, 2022.

This book constitutes the refereed proceedings of the 16th International Conference on Parallel Computational Technologies, PCT 2022, held in Dubna, Russia, during March 29–31, 2022. The 22 full papers included in this book were carefully reviewed and selected from 60 submissions. They were organized in topical sections as follows: high performance architectures, tools and technologies; parallel ...

Added: August 10, 2022

Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium: 23rd International Conference, AIED 2022, Durham, UK, July 27–31, 2022, Proceedings, Part II

Springer, 2022.

This two-volume set LNAI 13355 and 13356 constitutes the refereed proceedings of the 23rd International Conference on Artificial Intelligence in Education, AIED 2022, held in Durham, UK, in July 2022. The 40 full papers and 40 short papers presented together with 2 keynotes, 6 industry papers, 12 DC papers, 6 Workshop papers, 10 Practitioner papers, 97 ...

Added: July 28, 2022