Automatic Analysis of Spoken Speech Quality

Н. В. Карпов

Вестник Нижегородского университета им. Н.И. Лобачевского. 2013. No. 1.

Karpov N.

Methods for speech quality analysis are reviewed, and one of them is studied empirically. It is based on the criterion of the maximum rate of information creation at the output of the speaker's vocal tract. A new algorithm for automatic speech quality analysis, which uses the cepstral transform for signal parameterization, is synthesized and examined experimentally.

Priority areas: IT and mathematics

Language: Russian

Keywords: automatic speech recognition, speech quality
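
The abstract names the cepstral transform as the signal parameterization step but gives no details. As a minimal sketch only, assuming the common definition of the real cepstrum (the inverse DFT of the log magnitude spectrum) and a toy frame invented for illustration, the transform can be written as follows. The function names `dft` and `real_cepstrum` and the `floor` parameter are hypothetical and are not taken from the paper.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    `floor` (an assumption of this sketch) guards against log(0).
    """
    spectrum = dft(frame)
    log_mag = [math.log(max(abs(s), floor)) for s in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


# Toy frame: a unit impulse plus a half-amplitude copy delayed by 8 samples.
frame = [0.0] * 64
frame[0] = 1.0
frame[8] = 0.5
cep = real_cepstrum(frame)

# Index of the largest cepstral coefficient in the first half of the
# quefrency axis (index 0 excluded).
peak = max(range(1, 33), key=lambda q: cep[q])
```

In this toy case the delayed copy shows up as a cepstral peak at the delay itself, which is the property that makes cepstral coefficients a compact description of periodic structure in a speech frame.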

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: k original packets are encoded into n coded packets, and the message is reconstructed after the first k successful deliveries, effectively shifting latency from the maximum packet delay to the k-th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025
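
The order-statistic effect described in the transport coding abstract above can be illustrated with a short Monte Carlo sketch. It is not the OMNeT++ implementation from the paper. The symbols `k` (original packets) and `n` (coded packets), the i.i.d. exponential delay model, the assumption that any `k` of the `n` coded packets suffice, and the assumption that redundancy does not change per-packet delay are all simplifications made here for illustration.

```python
import random


def simulate(k=4, n=6, trials=10_000, mean_delay=1.0, seed=0):
    """Compare mean message delay without and with transport coding.

    Assumes i.i.d. exponential packet delays (a modelling choice of this
    sketch, not a claim about the paper's network model).
    """
    rng = random.Random(seed)
    uncoded_total = 0.0
    coded_total = 0.0
    for _ in range(trials):
        delays = [rng.expovariate(1.0 / mean_delay) for _ in range(n)]
        # Uncoded: all k original packets must arrive (maximum of k delays).
        uncoded_total += max(delays[:k])
        # Coded: the k earliest of n arrivals suffice (k-th order statistic).
        coded_total += sorted(delays)[k - 1]
    return uncoded_total / trials, coded_total / trials


uncoded_mean, coded_mean = simulate()
```

Under these assumptions the coded delay is never larger than the uncoded one in any single trial, because the k-th smallest of n values cannot exceed the maximum of any k of them. A real network would also have to account for the extra load that the n - k redundant packets create.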

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025

An Efficient Stock Market Trading Algorithm: A Retrospective Analysis Based on S&P-500 Data

Rubchinskiy A., Chubarova D., / Series WP7 "Mathematical Methods for Decision Making in Economics, Business and Politics". 2025. No. WP7/2025/01.

The article examines one of the most famous examples of socio-economic systems, characterized by significant uncertainty – the S&P-500 stock market, where shares of 500 largest US companies are traded. No assumptions are made about the probabilistic characteristics of the stock market. A flexible algorithm for daily trading has been developed, based on both known fixed data ...

Added: November 9, 2025

Diffusion on language model embeddings for protein sequence generation

Meshchaninov V., Strashnov P., Shevtsov A. et al., / Cornell University. Series CoRR, arXiv:2403.03726 "Computing Research Repository". 2025.

Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived ...

Added: October 5, 2025

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Shabalin A., Meshchaninov V., Vetrov D., / Series cs.CL, arXiv:2505.18853 "Computation and Language". 2025.

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respects discreteness but disregards semantic ...

Added: October 5, 2025

Compressed and Smooth Latent Space for Text Diffusion Modeling

Meshchaninov V., Chimbulatov E., Shabalin A. et al., / Series cs.CL, arXiv:2506.21170 "Computation and Language". 2025.

Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising alternative by enabling parallel generation and flexible control; however, their application to text generation is hindered by the high dimensionality of token-level representations. We introduce COSMOS, a ...

Added: October 5, 2025

A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis

Абрамов А. С., Chernyshev V. L., Mikhaylets E. et al., / Series Social Science Research Network "Social Science Research Network". 2025.

Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have significant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...

Added: September 23, 2025

On the construction of frieze patterns from partitions of convex polygons by nonintersecting diagonals

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 07600.

We demonstrate in an elementary way how to construct a frieze pattern of width m-3 from a partition of a convex m-gon by nonintersecting diagonals. ...

Added: September 17, 2025

On one property of Catalan numbers

Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 20584.

We give a new proof of the following statement: the Catalan number C_n is divisible by n+2 if n is odd and n ≠ 3k+1. ...

Added: September 9, 2025
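
The divisibility statement in the abstract above (C_n is divisible by n+2 when n is odd and n is not of the form 3k+1) is easy to spot-check numerically. The sketch below is a brute-force check over a finite range, not the proof given in the paper. The helper names and the upper bound 400 are arbitrary choices made here.

```python
from math import comb


def catalan(n):
    """n-th Catalan number, C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def satisfies_condition(n):
    """True when n is odd and n is not of the form 3k + 1."""
    return n % 2 == 1 and n % 3 != 1


# All n in the tested range that meet the hypothesis of the statement.
checked = [n for n in range(1, 400) if satisfies_condition(n)]

# Values of n for which the claimed divisibility fails (expected: none).
counterexamples = [n for n in checked if catalan(n) % (n + 2) != 0]
```

For instance, n = 5 gives C_5 = 42, which is divisible by 7, while n = 7 (of the form 3k+1, so outside the hypothesis) gives C_7 = 429, which is not divisible by 9.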

TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features

Bazhenov G., Платонов О. А., Prokhorenkova L., / Series arXiv:2409.14500 "arXiv:2409.14500 [cs.LG]". 2025.

Tabular machine learning is an important field for industry and science. In this field, table rows are typically treated as independent data samples, but additional information about the relations between these samples is sometimes available and can be used to improve predictive performance. Such information can be naturally modeled with a graph, hence tabular ...

Added: August 14, 2025

Low Sets and Closure Properties of Counting Function Classes

Ivanashev Y., / Series Computer Science "arxiv.org". 2025.

Added: July 29, 2025

ComputAgeBench: Epigenetic Aging Clocks Benchmark

Dudkovskaia Anastasiia, / Series 005140 "Biorxiv". 2025.

The success of clinical trials of longevity drugs relies heavily on identifying integrative health and aging biomarkers, such as biological age. Epigenetic aging clocks predict the biological age of an individual using their DNA methylation profiles, commonly retrieved from blood samples. However, there is no standardized methodology to validate and compare epigenetic clock models as ...

Added: July 18, 2025

An archaic reference-free method to jointly infer Neanderthal and Denisovan introgressed segments in modern human genomes

Planche L., Ilina A., Ávila-Arcos M. et al., / Series 005140 "Biorxiv". 2025.

Admixture between populations is a common feature of human history. Admixture events introduce new genetic variation that can fuel evolution. Characterizing the significance of admixture events on the evolution of a population across various species is of great interest to evolutionary geneticists. Local Ancestry Inference (LAI) methods infer genetic ancestry of an individual at a ...

Added: May 19, 2025

NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability

Raskovalov A., Gabdullin N., Androsov I., / Series Computer Science "arxiv.org". 2024.

Added: April 28, 2025

Speech Recognition in a Corpus of Audio Recordings of Sales Representatives: Problems, Solutions, and Research Prospects

Колмогорова П. А., in: Linguistic Semantics in the Spatial Dimension: Dictionary. Discourse. Corpus.: Yekaterinburg: Кабинетный ученый, 2024. Ch. 9.2 P. 411–422.

Added: November 29, 2024

The Current State and Development Trends of Speech Technologies

Kharlamov A. A., Чучупал В. Я., in: Proceedings of the 21st International Conference "Digital Signal Processing and Its Applications – DSPA-2019". Book 1. Iss. 21: Papers of the 21st International Conference.: Московское НТО радиотехники, электроники и связи им. А.С. Попова, 2019. P. 19–25.

Over a relatively short period of time, fundamental changes have taken place in the field of speech technologies. They have led to a sharp improvement in the performance of mathematical methods and software for speech processing and have produced practical solutions of such a level that the use of speech technologies has become a routine part of everyday life. The leap in the quality of methods and software for speech technologies is due to the development of ...

Added: October 29, 2020

Voice command recognition in intelligent systems using deep neural networks

Sokolov A., Savchenko A., in: 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI).: IEEE, 2019. Ch. 19 P. 113–116.

In this article, we focus on the isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose to create a grammar model for a small testing command set with self-loops for each state to return blank symbols for noise and out-of-vocabulary words. In addition, we use single arc connected beginning and ending ...

Added: October 21, 2019

Fuzzy Phonetic Encoding of Speech Signals in Speech Information Processing Systems

Savchenko A., Savchenko L., Радиотехника и электроника 2019 Vol. 64 No. 3 P. 274–280

A phonetic approach for voice information processing systems is studied. A method for automatic recognition of speech signals is developed in which each quasi-stationary segment is associated with a fuzzy set of phonemes. It is proposed to apply the probabilistic triangular norm operation to the fuzzy sets that correspond to the input frame and to the reference phoneme nearest to it. It is shown experimentally that the developed method reduces the probability of erroneous recognition by 1.5 to 5% ...

Added: March 18, 2019

Sequential Three-Way Decisions in Efficient Classification of Piecewise Stationary Speech Signals

Savchenko A., in: International Joint Conference on Rough Sets, Springer, Cham.: Springer, 2017. P. 264–277.

In this paper it is proposed to improve performance of the automatic speech recognition by using sequential three-way decisions. At first, the largest piecewise quasi-stationary segments are detected in the speech signal. Every segment is classified using the maximum a-posteriori (MAP) method implemented with the Kullback-Leibler minimum information discrimination principle. The three-way decisions are taken ...

Added: October 26, 2018

Information Theoretic Analysis of Efficiency of the Phonetic Encoding–Decoding Method in Automatic Speech Recognition

Savchenko A., Savchenko V.V., Journal of Communications Technology and Electronics 2016 Vol. 61 No. 4 P. 430–435

A method of phonetic decoding of words in automatic speech recognition is considered. The properties of the Kullback–Leibler divergence are used to synthesize an estimate of the distribution of the divergence between minimum speech units (e.g., single phonemes) inside a single class. It is demonstrated that the minimum variance of the intraphonemic divergence is reached when the phonetic ...

Added: April 11, 2016

Information-Theoretic Justification and Efficiency Analysis of the Phonetic Encoding–Decoding Method in the Automatic Speech Recognition Problem

Savchenko A., Савченко В. В., Радиотехника и электроника 2016 Vol. 61 No. 4 P. 373–379

A method of phonetic encoding–decoding of words in the automatic speech recognition problem is considered. Based on the properties of the Kullback–Leibler information discrimination, an estimate of the distribution of the discrimination between minimal speech units, such as individual phonemes, within a single class is synthesized. It is shown that the smallest variance of the intraphonemic discrimination is attained when the phonetic database is tuned to the voice of one particular speaker. The obtained estimates are confirmed by the results of experimental studies in the problem of ...

Added: October 8, 2015

Towards the creation of reliable voice control system based on a fuzzy approach

Savchenko A., Savchenko Lyudmila V., Pattern Recognition Letters 2015 Vol. 65 P. 145–151

The key purpose of this paper is to train a voice control system when only a small amount of user speech data is available, without the need for a general acoustic model when the latter does not fit the user's voice due to known variability sources (childhood, voice diseases, non-nativeness, etc.). We explore the possibility to increase ...

Added: September 10, 2015