Shrink the Longest: Improving Latent Space Isotropy with Simplicial Geometry

The development of an artificial intelligence-based language model for classifying English-language scientific articles by SRSTI codes is described. This improves the processes of reviewing and indexing scientific publications. A pre-processed dataset of scientific articles was used for training and testing the models. An architecture for cascade classification was developed, and the performance of models with ...

Added: February 11, 2026

Blurred Magnitude Homology of Functional Connectome for ASD Diagnosis

Alexander Kachura, Vsevolod Chernyshev, Kachan O. et al., Frontiers in Psychiatry 2026 Vol. 16 Article 1677282

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Existing studies show that adults with ASD may experience accelerated or altered neurocognitive aging. Consequently, cognitive decline in people with ASD can be delayed if timely measures are taken to treat this disorder. This study focuses on the development of a new algorithm ...

Added: January 21, 2026

A Language and Its Holes: The First-Order Homology of the Large-Scale Geometrical Structure of a Natural Language

Vasilii A. Gromov, Dang Q. N., Asel S. Erbolova, Complexity 2025 Vol. 2025 No. 1 Article 9659172

The present paper employs topological data analysis methods to reveal ‘holes’ (stable persistent homologies) in the semantic spaces of words, bigrams, and trigrams of the English and Russian languages, and to ascertain their boundaries. Furthermore, the paper selects those holes that belong to the large‐scale (coarse‐grained) structure of the language that are not just local ...

Added: November 11, 2025

A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis

Абрамов А. С., Chernyshev V. L., Mikhaylets E. et al. / Series "Social Science Research Network". 2025.

Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have significant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...

Added: September 23, 2025

An Artificial Intelligence-Based Ethics Index of Russian Banks

Storchevoy M., Parshakov P., Paklina S. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2024 Vol. 520 No. 6 P. 70–81

Measuring a company's ethics is an important element in the mechanism of regulating the behavior of market participants, as it allows consumers and regulators to make better decisions, which has a disciplining effect on companies. We tested various methods of machine analysis of consumer feedback from Russian banks and developed an Ethics Index that allows ...

Added: October 31, 2024

The More Polypersonal the Better - A Short Look on Space Geometry of Fine-Tuned Layers

Sergei Kudriashov, Veronika Zykova, Stepanova A. et al., in: Advances in Neural Computation, Machine Learning, and Cognitive Research VIII, Selected Papers from the XXVI International Conference on Neuroinformatics, October 21-25, 2024, Moscow, Russia. Vol. VIII. Cham: Springer, 2024. P. 13–22.

The interpretation of deep learning models is a rapidly growing field, with particular interest in language models. There are various approaches to this task, including training simpler models to replicate neural network predictions and analyzing the latent space of the model. The latter method allows us to not only identify patterns in the model’s decision-making process, but also understand ...

Added: October 24, 2024

A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

Vasilii A. Gromov, Nikita S. Borodin, Asel S. Yerbolova, Complexity 2024 Vol. 2024 No. 1 Article 8863360

The present paper introduces a novel object of study, a language fractal structure; we hypothesize that a set of embeddings of all n-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all n). The ...

Added: June 29, 2024

Axiomatic Foundation of Central Place Theory: Revision from the Standpoint of the Russian School

R. V. Dmitriev, Shuper V. A., Regional Research of Russia 2023 Vol. 13 No. 4 P. 751–757

The article refines the axiomatic foundation of central place theory (CPT) and identifies the possibilities and limitations of a logical transition in research from real settlement systems to central place (CP) systems. The necessity of relying on the CPT axioms in the following form is determined: (1) the space of a CP system is not ...

Added: February 16, 2024

Regional inflation analysis using social network data

Shcherbakov V., Karpov I., Economy of Regions 2024 Vol. 20 No. 3 P. 930–946

Inflation is one of the most important macroeconomic indicators and has a great impact on the population of any country and region. Inflation is influenced by a range of factors, one of which is inflation expectations. Many central banks all over the world take this factor into consideration while implementing monetary policy within the inflation targeting ...

Added: December 7, 2023

Grammar in Language Models: BERT Study

Chistyakova K., Kazakova Tatiana, / NRU HSE. Series WP BRP "Linguistics". 2023. No. 115.

The problem of interpreting language models has been extensively studied, but no universal answers have been found. Our study proposes combining widely accepted probing methods with a novel approach to the neural network under investigation. We propose to break grammatical forms at the pre-training step in order to get two "sibling" models, as it casts ...

Added: November 29, 2023

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorization of textual information, selection of a model for scientific paper classification, and training of the linguistic model BERT ...

Added: November 4, 2023

Identifying and Visualizing Trends in Science, Technology, and Innovation Using SciBERT

Lobanova P., Bakhtin P., Sergienko Y., IEEE Transactions on Engineering Management 2024 No. 71 P. 11898–11906

Identification of science, technology, and innovation trends is a critical topic both for the scientific community and for companies that develop technologies, work on science and technology policy, or invest in high tech. In this research, the authors demonstrate a novel approach implemented in the iFORA system (developed by the National Research University Higher School of Economics) using ...

Added: September 8, 2023

How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Malik M. S., Imran T., Mona Mamdouh J., PeerJ Computer Science 2023 Vol. 9 Article e1248

Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model such ...

Added: September 4, 2023

Automated defect identification for cell phones using language context, linguistic and smoke-word models

Muhammad Z. Y., Malik M. S., Ignatov D. I., Expert Systems with Applications 2023 Vol. 227 Article 120236

Product defects are a widespread concern for manufacturers when conducting quality and customer relationship management. Prior approaches addressed many electronic products; however, cell phones are still unexplored. Moreover, prior work mainly focused on the lexicon, probabilistic graphic, failure mode, and effect analysis models, but the utilization of word embeddings and language models has not been explored. State-of-the-art contextual word embeddings and language models generate automated features and ...

Added: June 13, 2023

Computational Experiments on Detecting Meaning shift in Jokes

Eugeniia Zakovorotnaia, in: 2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). Ekaterinburg: IEEE, 2022. P. 840–843.

The paper describes an experimental approach to detecting meaning shift, one of the most fundamental characteristics of humor, which is studied by many scientists in different interdisciplinary theoretical methodologies. We measured cosine similarity between setups and punchlines and explained these results through a set of objective criteria such as cosine results limitations, punchline length, ...

Added: May 10, 2023

Acceptability Judgements via Examining the Topology of Attention Maps

Cherniavskii D., Tulchinskii E., Mikhailov V. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022. Ch. 7 P. 88–107.

Added: February 17, 2023