?
Optimization of Gain in Symmetrized Itakura-Saito Discrimination for Pronunciation Learning
This paper considers an assessment and evaluation of the pronunciation quality in computer-aided language learning systems. We propose the novel distortion measure for speech processing by using the gain optimization of the symmetrized Itakura-Saito divergence. This dissimilarity is implemented in a complete algorithm for pronunciation
learning and improvement. At its first stage, a user has to achieve a stable pronunciation of all sounds by matching them with sounds of an ideal speaker. At the second stage, the recognition of sounds and their short sequences is carried out to guarantee the distinguishability of learned sounds. The training set may contain not only ideal sounds but the best utterances of a user obtained at the previous step. Finally, the word recognition accuracy is estimated by using deep neural networks finetuned on the best words from a user. Experimental study shows that
the proposed procedure makes it possible to achieve high efficiency for learning of sounds and their sequences even in the presence of noise in an observed utterance.