Using a Probability Distribution over the Set of Classes in the Arabic Dialect Classification Task

О. В. Дурандин; Н. Ю. Золотых; Хилал Н. Р.; Стребков Д. Ю.

doi:10.17586/2226-1494-2017-17-1-110-116


Научно-технический вестник информационных технологий, механики и оптики. 2017. No. 1(107). P. 110–116.

Durandin O., Zolotykh N., Хилал Н. Р., Стребков Д. Ю.

Subject of Research. We propose an approach to machine learning classification problems that uses information about the probability distribution over the set of class labels in the training data. The algorithm is illustrated on a complex natural language processing task: the classification of Arabic dialects. Method. Each object in the training set is associated with a probability distribution over the class label set rather than with a single class label. The proposed approach takes this distribution into account when solving the classification problem in order to improve the quality of the resulting classifier. Main Results. The approach is illustrated on the example of automatic classification of Arabic dialects. The analyzed data were collected from the Twitter social network, contain marker words, and belong to six Arabic dialects (Saudi, Levantine, Algerian, Egyptian, Iraqi, and Jordanian) and to Modern Standard Arabic (MSA). The results show that taking the probability distributions over the set of classes into account improves the quality of the resulting classifier. The experiments show that even a relatively naive way of accounting for the probability distributions improves the precision of the classifier from 44% to 67%. Practical Relevance. The approach and the corresponding algorithm can be used effectively when manual annotation by experts requires significant financial and time resources, but a system of heuristic rules can be created. Implementing the proposed algorithm makes it possible to significantly reduce data preparation costs without a substantial loss in classification precision.
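
The Method paragraph describes training on a per-object distribution over class labels instead of a single label. The sketch below shows one possible way to do this, assuming a softmax (multinomial logistic) classifier fitted by gradient descent on the cross-entropy against the soft labels. The model choice, the toy 2-D data, the 0.6/0.2/0.2 label distributions, and all names (`train_soft_label_classifier`, `soft_labels`, `predict`) are illustrative assumptions, not the authors' algorithm or data.

```python
import math
import random


def softmax(scores):
    # Shift by the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_soft_label_classifier(X, soft_labels, lr=0.1, epochs=300):
    """Fit softmax regression by gradient descent on the mean cross-entropy
    between predicted probabilities and per-object class distributions."""
    n, d, k = len(X), len(X[0]), len(soft_labels[0])
    W = [[0.0] * k for _ in range(d)]  # weights, shape d x k
    b = [0.0] * k                      # per-class bias
    for _ in range(epochs):
        grad_W = [[0.0] * k for _ in range(d)]
        grad_b = [0.0] * k
        for x, q in zip(X, soft_labels):
            p = softmax([sum(x[j] * W[j][c] for j in range(d)) + b[c]
                         for c in range(k)])
            for c in range(k):
                g = (p[c] - q[c]) / n  # d(loss)/d(logit_c) for this object
                grad_b[c] += g
                for j in range(d):
                    grad_W[j][c] += g * x[j]
        for c in range(k):
            b[c] -= lr * grad_b[c]
            for j in range(d):
                W[j][c] -= lr * grad_W[j][c]
    return W, b


def predict(x, W, b):
    # Return the index of the class with the highest linear score.
    scores = [sum(x[j] * W[j][c] for j in range(len(x))) + b[c]
              for c in range(len(b))]
    return max(range(len(b)), key=lambda c: scores[c])


# Toy data (hypothetical): three well-separated 2-D clusters, one per class.
random.seed(0)
centers = [(0.0, 3.0), (3.0, -2.0), (-3.0, -2.0)]
true_y = [c for c in range(3) for _ in range(40)]
X = [[centers[c][0] + random.gauss(0, 0.7),
      centers[c][1] + random.gauss(0, 0.7)] for c in true_y]

# Stand-in for heuristic annotation: 0.6 mass on the true class, 0.2 elsewhere.
soft_labels = [[0.6 if c == y else 0.2 for c in range(3)] for y in true_y]

W, b = train_soft_label_classifier(X, soft_labels)
accuracy = sum(predict(x, W, b) == y for x, y in zip(X, true_y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

The only difference from ordinary hard-label training is the target `q`: a one-hot vector would recover standard softmax regression, so the same loop covers both cases.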

Research target: Computer Science; Mathematics; Philology and Linguistics

Priority areas: IT and mathematics; mathematics

Keywords: annotation; automatic classification; text classification; clustering and classification; Arabic dialects

A framework for text mining on Twitter: a case study on joint comprehensive plan of action (JCPOA)- between 2015 and 2019

Behzadidoost R., Quality and Quantity 2021 Vol. 56 No. 5 P. 3053–3084

In the big data era, there is a necessity for effective frameworks to collect, retrieve, and manage data. As not all tweets are hashtagged by users, retrieving them is a complicated task. To address this issue, we present a rule-based expert system classifier that uses the well-known concept of fingerprint in the judicial sciences. This ...

Added: March 27, 2026

The effect of spelling errors on reading tasks: a study on Russian.

Slioussar N., Chernova D., Magomedova V. et al., The Mental Lexicon 2026 P. 1–31

Many studies on different languages analyzed how spelling errors are produced and detected. Recently, a new generalization was made for several languages: frequently misspelled words are read more slowly, even when they are written correctly and one knows how to spell them. This is explained by the lower quality of their lexical representations diluted by ...

Added: March 26, 2026

A Paratext about Paratext

Kasatkina A., Сергеев М. Л., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2025 Vol. 21 No. 3 P. 13–25

This article introduces a collection of publications selected from the Proceedings of the conference “Circum Text: Para, Meta-, and Other Marginalia” (Institute for Linguistic Studies RAS, St. Petersburg, October 19–21, 2023). It describes the general agenda of paratextual studies and aligns the selected articles with its various aspects. Paratext is a variety of verbal and ...

Added: March 25, 2026

On the problem of building a decentralized intelligent transport system based on the RAFT protocol and clustering by network distance.

Kaperko A., Городничев М. Г., Саксонов Е. А. et al., Вестник Рязанского государственного радиотехнического университета, Российская Федерация 2025 No. 94 P. 59–67

The article is devoted to the development and experimental evaluation of a decentralized architecture for an intelligent transport system (ITS) based on the Raft consensus protocol and the network distance metric (RTT) server clustering method. It is shown that existing solutions either require manual configuration and centralized coordination, or are not optimized for latency with ...

Added: March 25, 2026

On flexibility of affine factorial varieties

Arzhantsev I., Shakhmatov K., Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales - Serie A: Matematicas 2026 Vol. 120 Article 55

We give a criterion of factoriality of a suspension. This allows us to construct many examples of flexible affine factorial varieties. In particular, we find a homogeneous affine factorial 3-fold that is not a homogeneous space of an algebraic group. ...

Added: March 24, 2026

Emergence of champion solitons from two-solitary-wave interactions in the fourth-order generalized Korteweg–de Vries equation

Flamarion M. V., Pelinovsky E., Chaos, Solitons and Fractals 2025 Vol. 208 No. 3 Article 118271

Two-solitary-wave interactions are investigated within the fourth-order generalized Korteweg–de Vries equation. This equation is closely related to the classical Korteweg–de Vries equation but includes a quartic nonlinear term. We show that, although collisions between two solitary waves are not perfectly elastic, only a small amount of radiation is generated during the interaction. This allows a clear characterization of ...

Added: March 22, 2026

On string functions of the generalized parafermionic theories, mock theta functions, and false theta functions

Borozenets N., Mortenson E., Advances in Mathematics 2026 Vol. 484 Article 110684

Kac and Wakimoto introduced the admissible highest weight representations as a conjectural classification of all modular-invariant representations of the affine Kac–Moody algebras. For the affine Kac–Moody algebra A_1^{(1)} their conjectural construction has been proved. Using Kac and Wakimoto's result, Ahn, Chung, and Tye introduced the generalized Fateev–Zamolodchikov parafermionic theories, whose chiral current algebras were recently ...

Added: March 22, 2026

Static manifolds with boundary: Their geometry and some uniqueness theorems

Medvedev V., Annales Henri Poincare. A Journal of Theoretical and Mathematical Physics 2026 P. 1–33

Static manifolds with boundary appear naturally in the context of the prescribed scalar curvature problem on manifolds with boundary, when the mean curvature of the boundary is also prescribed. They also arise in the setting of general relativity: for example, the time-slice of the photon sphere on the Riemannian Schwarzschild manifold splits it into static manifolds with boundary. ...

Added: March 21, 2026

On solving the deterministic and stochastic household problem with a finite planning horizon

Pilnik N., Экономический журнал Высшей школы экономики 2025 Vol. 29 No. 1 P. 42–71

The article uses the example of an optimization problem of a household that makes a decision on the volumes of consumption and investment to show what difficulties arise in deterministic and stochastic formulations on a finite time interval. In order to make the problem solvable on a finite time interval, a special terminal condition on the ...

Added: March 19, 2026

Features of the persuasion strategy in Russian and Chinese political discourse (based on the political talk shows «60 минут» (“60 Minutes”) and «这就是中国» (“This Is China”))

Бинштейн М. М., Вестник Томского государственного университета. Филология 2026 No. 99 P. 5–27

The article explores the argumentative nature of political discourse, which, according to the authors, becomes the key to the analysis of the communicative strategy of persuasion. The aim of the research is a comparative analysis of speeches by Russian and Chinese politicians, identifying similarities and differences in the use of rhetorical devices when implementing the persuasion ...

Added: March 19, 2026

English for Professional Purposes: Cognitive Neurobiology

Zakharova A. V., Мищук А. М., Moscow: Флинта, 2025.

The aim of the textbook is to develop English skills and competences of biology students to a level necessary for successful oral and written communication in academic and professional spheres. The textbook materials allow for the improvement of essential language skills that are required for academic and professional communication. The textbook consists of four sections that cover the ...

Added: March 19, 2026

Hausdorff dimension estimates for Sudler products with positive lower bound

Гайфулин Д. Р., Hauke M., Nonlinearity 2025 Vol. 38 No. 6 Article 065008

Given an irrational number $\alpha$, we study the asymptotic behaviour of the Sudler product denoted by $P_N(\alpha) =\prod_{r=1}^N 2\lvert \sin \pi r \alpha \rvert$. We show that $\liminf_{N \to \infty} P_N(\alpha) >0$ and $\limsup_{N \to \infty} P_N(\alpha)/N < \infty$ whenever the sequence of partial quotients in the continued fraction expansion of $\alpha$ exceeds 3 only finitely ...

Added: March 19, 2026

On the degree of undecidability of the theory of figures in linear spaces

Dudakov S., Математика и теоретические компьютерные науки 2025 Vol. 2 No. 4 P. 51–65

We study the additive theory of arbitrary figures in linear spaces, that is, the theory of addition extended to sets of vectors. Our main result is the following: if a linear space is infinite, then the additive theory of figures allows one to interpret second-order arithmetic and, therefore, has this or higher degree of undecidability. For ...

Added: March 18, 2026

Potential for therapeutic use of near-infrared spectroscopy after stroke (review)

Mokienko O., Современные технологии в медицине 2025 Vol. 17 No. 2 P. 73–85

The advancement of novel technologies for the rehabilitation of post-stroke patients represents a significant challenge for a range of interdisciplinary fields. Near-infrared spectroscopy (NIRS) is an optical neuroimaging technique based on recording local hemodynamic changes at the cerebral cortex level. The technology is typically employed in post-stroke patients for diagnostic purposes, including the assessment of ...

Added: March 18, 2026

On theories of subset algebras and subspace lattices in finite linear spaces

Dudakov S., Вестник Тверского государственного университета. Серия: Прикладная математика 2025 No. 1 P. 5–13

For infinite linear spaces, in our previous works, we have shown that theories of figures and subspaces are of high undecidability degree. They allow interpreting elementary arithmetic or second-order arithmetic (for infinite figures). For finite linear spaces, such a claim doesn't hold. It is because we can algorithmically enumerate all finite linear spaces and find ...

Added: March 18, 2026

On the complexity of the total derivability problem in noncontracting and context-free grammars

Dudakov S., Карлов Б. Н., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 524 No. 1 P. 11–18

In this paper we study the problem of total derivability in context-free, noncontracting, and context-sensitive grammars. Given a grammar and a terminal word, one has to determine whether there exists a derivation of this word which uses each production no less than a given number of times. It is proved that the problem of total ...

Added: March 18, 2026

Discriminative lemmatization of abbreviations in the LLM era

Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 527 P. 146–155

This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...

Added: March 10, 2026

Transformer-based approaches for lemmatizing abbreviations in Russian texts

Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47

This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...

Added: March 10, 2026

Homogeneous maximizers of the Blaschke-Santalo-type functionals

Kolesnikov A., / Series arXiv "math". 2025.

We study Blaschke–Santaló-type inequalities for N ≥ 2 sets (functions) and a special class of cost functions. In particular, we prove new results about reduction of the maximization problem for the Blaschke–Santaló-type functional to the homogeneous case (functional inequalities on the sphere) and extend the symmetrization argument to the case of N > 2 sets. We also discuss links to the ...

Added: February 13, 2026

Development of a Language Model for Automated Classification of English-Language Scientific Articles by SRSTI Codes

V. V. Zunin, A. I. Afonin, V. I. Anoshin et al., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 5 P. 287–293

The development of an artificial intelligence-based language model for classifying English-language scientific articles by SRSTI codes is described. This improves the processes of reviewing and indexing scientific publications. A pre-processed dataset of scientific articles was used for training and testing the models. An architecture for cascade classification was developed, and the performance of models with ...

Added: February 11, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

On finding formal power-logarithmic expansions of solutions to q-difference equations

Gaianov N., Parusnikova A., / Cornell University. Series math "arxiv.org". 2025.

An algebraic q-difference equation is considered. A sufficient condition for the existence of a formal power-logarithmic expansion of a solution to such an equation in the neighborhood of zero is proposed. An example of applying this sufficient condition for constructing a formal expansion of a solution to a certain q-difference analogue of the fifth Painlevé equation ...

Added: December 25, 2025

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Ideal of the variety of flexes of plane cubics

Popov V., / Series arXiv "math". 2025. No. 2502.01539.

We prove that the variety of flexes of algebraic curves of degree 3 in the projective plane is an ideal theoretic complete intersection in the product of a two-dimensional and a nine-dimensional projective spaces. ...

Added: December 16, 2025