Unsupervised Graph Anomaly Detection Algorithms Implemented in Apache Spark
The graph anomaly detection problem occurs in many application areas and can be solved by spotting outliers in unstructured collections of multi-dimensional data points, which can be obtained by graph analysis algorithms. We implement an algorithm for small-community analysis and an approximate LOF (Local Outlier Factor) algorithm based on Locality-Sensitive Hashing, apply both algorithms to a real-world graph, and evaluate their scalability. We use Apache Spark, one of the most popular Big Data frameworks.
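The LSH-based approximation can be sketched as follows: random-hyperplane hashing groups nearby points into buckets, and LOF is then computed only over bucket candidates rather than all pairs. This is an illustrative single-machine sketch, not the paper's Spark implementation; the function names and the fallback rule for undersized buckets are assumptions of this sketch.

```python
# Sketch: approximate LOF with random-hyperplane LSH limiting the
# neighbour search to hash-bucket candidates (not the paper's code).
import numpy as np
from collections import defaultdict

def lsh_buckets(X, n_planes=2, seed=0):
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, X.shape[1]))
    keys = (X @ planes.T > 0).astype(int)  # sign pattern = bucket key
    buckets = defaultdict(list)
    for i, key in enumerate(map(tuple, keys)):
        buckets[key].append(i)
    return buckets

def approx_lof(X, k=3, n_planes=2):
    n = len(X)
    cands = [[] for _ in range(n)]
    for idx in lsh_buckets(X, n_planes).values():
        for i in idx:
            cands[i] = [j for j in idx if j != i]
    for i in range(n):  # fall back to all points if the bucket is too small
        if len(cands[i]) < k:
            cands[i] = [j for j in range(n) if j != i]

    def knn(i):  # k nearest candidates as (distance, index) pairs
        d = sorted((np.linalg.norm(X[i] - X[j]), j) for j in cands[i])
        return d[:k]

    kdist = {i: knn(i)[-1][0] for i in range(n)}  # k-distance per point

    def lrd(i):  # local reachability density
        nb = knn(i)
        return len(nb) / sum(max(kdist[j], d) for d, j in nb)

    # LOF: mean neighbour density divided by own density
    return np.array([np.mean([lrd(j) for _, j in knn(i)]) / lrd(i)
                     for i in range(n)])
```

On a tight cluster plus one distant point, the distant point receives a much larger score, since its reachability density is far below that of its neighbours.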
Apache Spark is one of the most popular Big Data frameworks. Performance evaluation of Big Data frameworks is a topic of interest due to the increasing number and importance of data analytics applications in the context of HPC and Big Data convergence. In this paper, we present an early performance evaluation of a typical supervised graph anomaly detection problem implemented using the GraphX and MLlib libraries in Apache Spark on a cluster.
The article demonstrates that crimes that come to the attention of the criminal police have varying worth in the eyes of Russian policemen and, consequently, attract unequal efforts. The worth of crimes is closely related to the criteria for evaluation of police performance. The data, derived from 12 in-depth interviews with Russian police officers, nine in-depth interviews with senior students of Moscow University of the Russian Interior Ministry who are undergoing practice within police departments, and online discussions within the police community, show that policemen in Russia make their practical decisions while balancing between multiple orders of worth. The theoretical framework for data interpretation is a symbiosis of valuation theories and the institutional logics approach. Operationalized as a set of cultural rules and expectations defining legitimate grounds for assessing and determining what rational behavior in a given organizational context really is, the concept of institutional logics stresses the interrelations between the value-oriented and material dimensions of social action but also allows one to stress the hierarchy and constant competition between various orders of worth in an organization. Four institutional logics — state, clan, quasi-market, and professional — are empirically identified. Each of them brings its own order of worth to the police organizational environment. Crimes in the eyes of the police always have a price — expressed in either “checkmarks,” points of recognition by the boss or colleagues, or money. The data suggest that, despite the hierarchy between the orders of (crimes’) worth within the police system as a whole, in each case institutional logics and the criteria of worth related to them compete with each other. Depending on the characteristics of the criminal case and the situation in the police department at a given moment, the competition between various orders of worth is resolved by policemen in different ways.
The results of the study shed light on the functioning of police discretion and highlight the dysfunctional side of police reform in Russia.
The Semantic Evaluation (SemEval) series of workshops focuses on the evaluation and comparison of systems that can analyse diverse semantic phenomena in text with the aim of extending the current state of the art in semantic analysis and creating high quality annotated datasets in a range of increasingly challenging problems in natural language semantics. SemEval provides an exciting forum for researchers to propose challenging research problems in semantics and to build systems/techniques to address such research problems. SemEval-2016 is the tenth workshop in the series of International Workshops on Semantic Evaluation Exercises. The first three workshops, SensEval-1 (1998), SensEval-2 (2001), and SensEval-3 (2004), focused on word sense disambiguation, each time growing in the number of languages offered, in the number of tasks, and also in the number of participating teams. In 2007, the workshop was renamed to SemEval, and the subsequent SemEval workshops evolved to include semantic analysis tasks beyond word sense disambiguation. In 2012, SemEval turned into a yearly event. It currently runs every year, but on a two-year cycle, i.e., the tasks for SemEval-2016 were proposed in 2015. SemEval-2016 was co-located with the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’2016) in San Diego, California. 
It included the following 14 shared tasks organized in five tracks:

• Text Similarity and Question Answering Track
  – Task 1: Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation
  – Task 2: Interpretable Semantic Textual Similarity
  – Task 3: Community Question Answering
• Sentiment Analysis Track
  – Task 4: Sentiment Analysis in Twitter
  – Task 5: Aspect-Based Sentiment Analysis
  – Task 6: Detecting Stance in Tweets
  – Task 7: Determining Sentiment Intensity of English and Arabic Phrases
• Semantic Parsing Track
  – Task 8: Meaning Representation Parsing
  – Task 9: Chinese Semantic Dependency Parsing
• Semantic Analysis Track
  – Task 10: Detecting Minimal Semantic Units and their Meanings
  – Task 11: Complex Word Identification
  – Task 12: Clinical TempEval
• Semantic Taxonomy Track
  – Task 13: TExEval-2 – Taxonomy Extraction
  – Task 14: Semantic Taxonomy Enrichment

This volume contains both Task Description papers that describe each of the above tasks and System Description papers that describe the systems that participated in the above tasks. A total of 14 task description papers and 198 system description papers are included in this volume. We are grateful to all task organisers as well as the large number of participants whose enthusiastic participation has made SemEval once again a successful event. We are thankful to the task organisers who also served as area chairs, and to task organisers and participants who reviewed paper submissions. These proceedings have greatly benefited from their detailed and thoughtful feedback. We also thank the NAACL 2016 conference organizers for their support. Finally, we most gratefully acknowledge the support of our sponsor, the ACL Special Interest Group on the Lexicon (SIGLEX).

The SemEval-2016 organizers: Steven Bethard, Daniel Cer, Marine Carpuat, David Jurgens, Preslav Nakov and Torsten Zesch
In our research, we built a data processing pipeline for storing railway KPI data based on open-source Big Data technologies: Apache Hadoop, Kafka, the Kafka HDFS Connector, Spark, Airflow, and PostgreSQL. The methodology we created for data load testing allowed us to iteratively perform load tests with increasing data sizes, evaluate the required cluster software and hardware resources, and, finally, detect bottlenecks in the solution. As a result of the research, we propose an architecture for data processing and storage and give recommendations on data pipeline optimization. In addition, we calculate an approximate sizing of cluster machines for the current dataset volume for the data processing and storage services.
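The iterative load-testing loop described above can be sketched as follows. This is a hypothetical skeleton under the assumption that each round doubles the input size and records throughput; `run_pipeline` is a stub standing in for the real Hadoop/Kafka/Spark pipeline, which is not reproduced here.

```python
# Hypothetical sketch of iterative data load testing: double the input
# size each round and record throughput, so a degradation between
# rounds flags a bottleneck. run_pipeline is a stand-in stub.
import time

def run_pipeline(records):
    # stub replacing the real ingestion/processing pipeline
    return sum(hash(r) % 7 for r in records)

def load_test(start_size=1_000, rounds=4):
    results = []
    size = start_size
    for _ in range(rounds):
        data = [f"kpi-{i}" for i in range(size)]
        t0 = time.perf_counter()
        run_pipeline(data)
        elapsed = time.perf_counter() - t0
        results.append({"size": size, "throughput": size / elapsed})
        size *= 2  # increase data size for the next iteration
    return results
```

Comparing `throughput` across rounds shows whether the system scales linearly or saturates as the dataset grows.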
In many areas, such as social science, politics, or market research, people need to track sentiment and its changes over time. For sentiment analysis in this field, it is more important to correctly estimate the proportions of each sentiment expressed in a set of documents (the quantification task) than to accurately estimate the sentiment of a particular document (classification). Our study aims to analyze the effectiveness of two iterative quantification techniques and to compare them with baseline methods. All the techniques are evaluated on a set of synthesized data and on the SemEval-2016 Task 4 dataset. We have made the quantification methods from this paper available as an open-source Python library. The results of the comparison and possible limitations of the quantification techniques are discussed.
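Two standard quantification baselines against which iterative methods are usually compared are Classify & Count (CC) and Adjusted Classify & Count (ACC). The sketch below illustrates them; the threshold classifier is a stub, and the true-positive and false-positive rates are assumed to come from held-out data (the paper's own methods are not reproduced here).

```python
# Baseline quantification methods: Classify & Count (CC) and
# Adjusted Classify & Count (ACC). The classifier is a stub threshold
# rule; tpr/fpr are assumed to be estimated on held-out data.
def classify(score, threshold=0.5):
    return 1 if score >= threshold else 0

def classify_and_count(scores):
    # CC: prevalence = fraction of documents classified positive
    return sum(classify(s) for s in scores) / len(scores)

def adjusted_classify_and_count(scores, tpr, fpr):
    # ACC inverts the classifier's expected bias:
    #   cc = p * tpr + (1 - p) * fpr  =>  p = (cc - fpr) / (tpr - fpr)
    cc = classify_and_count(scores)
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion
```

ACC corrects CC's systematic error when the classifier's error rates are known, which is exactly the failure mode quantification methods are measured on.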
Because of the lack of data on cash flows, it is impossible to use traditional measures of return, such as IRR and TVPI, for evaluating the performance of private equity funds in emerging markets.
In this study, we propose an approach based on adjusted rates of return for PE funds, which can be implemented without data on the funds' cash flows and net assets. The proposed indicators can be calculated from publicly available data on the funds' portfolio transactions.
The study presents a methodology based on the performance of private equity portfolio transactions, as well as an analysis of empirical data on a sample of 1,957 deals in BRIC countries from 2000 to 2012.
The results of the empirical analysis largely support a number of fundamental characteristics of PE funds previously identified for developed capital markets, such as:
1. Private equity deals in developing countries are riskier assets than traditional instruments.
2. The return on the majority of transactions is below the stock market return; however, the most successful deals significantly outperform the market.
3. The β coefficient of buyout funds is less than one, indicating low exposure to systematic risk.
Some characteristics were confirmed only in part:
1. The investments of venture capital funds have a β coefficient greater than one for the markets of Brazil and India, and less than one for Russia and China.
2. Return on investment is higher for buyout funds than for venture capital funds in Russia and China; in India and Brazil the result is the opposite.
The remaining characteristics differ fundamentally from those identified in developed capital markets:
1. The holding period for private equity fund investments in developing countries is shorter than in developed countries, averaging 3.3 years.
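The β coefficient discussed above is conventionally the slope of a CAPM-style regression of asset returns on market returns. A minimal sketch, with made-up numbers rather than the study's dataset or estimator:

```python
# Illustrative deal-level beta: slope of the OLS regression of asset
# returns on market returns, i.e. cov(asset, market) / var(market).
import numpy as np

def beta(asset_returns, market_returns):
    a = np.asarray(asset_returns, dtype=float)
    m = np.asarray(market_returns, dtype=float)
    return float(np.cov(a, m, ddof=1)[0, 1] / np.var(m, ddof=1))
```

A β below one, as found for buyout funds, means the asset's returns move less than one-for-one with the market.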
A model is considered for organizing cargo transportation between two node stations connected by a railway line that contains a certain number of intermediate stations. The movement of cargo is in one direction. Such a situation may occur, for example, if one of the node stations is located in a region that produces raw materials for a manufacturing industry located in another region, where the other node station is situated. The organization of freight traffic is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, and the rule of distribution of cargo to the final node stations. The process of cargo transportation is governed by a set control rule. For such a model, one must determine the possible modes of cargo transportation and describe their properties. The model is described by a finite-dimensional system of differential equations with nonlocal linear restrictions. The class of solutions satisfying the nonlocal linear restrictions is extremely narrow. This results in the need for a “correct” extension of solutions of the system of differential equations to a class of quasi-solutions whose distinctive feature is gaps at a countable number of points. Using the fourth-order Runge–Kutta method, we were able to numerically construct these quasi-solutions and determine their rate of growth. Note that the main technical difficulty consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of the quasi-solutions and, in particular, of the sizes of the gaps (jumps) on a number of model parameters characterizing the control rule, the cargo transportation technologies, and the intensity of cargo arrival at the node station.
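The fourth-order Runge–Kutta scheme used to construct the quasi-solutions is, generically, the following classical stepper. This is only the generic integrator; the actual system with nonlocal linear restrictions is not reproduced, and the test equation dy/dt = -y is a stand-in.

```python
# Classical fourth-order Runge-Kutta step and fixed-step integrator
# (generic; the paper's system with nonlocal restrictions is omitted).
def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, t0, y0, t1, n):
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = rk4_step(f, t, y, h)
        t += h
    return y
```

For dy/dt = -y with y(0) = 1, integrating to t = 1 reproduces e^-1 to high accuracy, illustrating the method's fourth-order convergence.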
Event logs collected by modern information and technical systems usually contain enough data for automated discovery of process models. A variety of algorithms has been developed for process model discovery, conformance checking, log-to-model alignment, comparison of process models, etc.; nevertheless, quick analysis of ad-hoc selected parts of a log has not yet received a full-fledged implementation. This paper describes an ROLAP-based method of multidimensional event log storage for process mining. The result of the log analysis is visualized as a directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by an ad-hoc selection of criteria and an occurrence probability threshold.
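The core of such a visualization, a directed graph of event successions weighted by occurrence probability, can be sketched from a plain list of traces (the ROLAP storage layer and the paper's implementation are omitted; the probability definition used here, successor count over total outgoing count, is an assumption of this sketch):

```python
# Sketch: build a directly-follows graph from traces, with each edge
# (a, b) weighted by P(next event is b | current event is a).
from collections import Counter

def directly_follows(traces):
    pair_counts = Counter()    # count of each a -> b succession
    source_counts = Counter()  # count of a appearing with a successor
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            pair_counts[(a, b)] += 1
            source_counts[a] += 1
    return {(a, b): c / source_counts[a]
            for (a, b), c in pair_counts.items()}
```

Filtering this dictionary by a probability threshold yields exactly the kind of ranked sublog model the analyst selects ad hoc.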
Existing approaches suggest that IT strategy should be a reflection of business strategy. However, in practice organisations often do not follow business strategy even if it is formally declared. In these conditions, IT strategy can be viewed not as a plan, but as an organisational shared view on the role of information systems. This approach reflects only a top-down perspective on IT strategy, so it can be supplemented by a strategic behaviour pattern (i.e., a more or less standard response to changes, formed as a result of previous experience) to implement a bottom-up approach. Two components that can help to establish an effective reaction to new IT initiatives are proposed here: a model of IT-related decision making, and an efficiency measurement metric to estimate the maturity of business processes and the corresponding IT. The usage of the proposed tools is demonstrated in practical cases.