Near-Duplicate Detection for Online-Shops Owners: An FCA-Based Approach

D. I. Ignatov; Chubis Y.; A. V. Konstantinov

Lecture Notes in Computer Science. 2013. Vol. 7814. P. 722–725.

Ignatov D. I., Chubis Y., Konstantinov A. V.

We propose a prototype of a near-duplicate detection system for web-shop owners. Such online businesses typically buy descriptions of their goods from so-called copywriters. A copywriter may occasionally cheat and supply the owner with almost identical descriptions for different items. In this paper we demonstrate how FCA can be used for fast clustering and for revealing such duplicates in datasets from a real online perfume shop.

Research target: Computer Science

Priority areas: IT and mathematics; business informatics

Language: English

Keywords: e-commerce; Formal Concept Analysis (FCA); clustering; near duplicate; near-duplicate detection; duplicate document search

Computer-aided system for assessing and selecting effective masters' learning trajectory in variability of external factors considering the university industrial partners' opinion

Vishnekov A., Ivanova E., Zhursunova N., Информатика и образование 2025 Vol. 40 No. 6 P. 39–48

The modern education system development is characterized by many uncertain, dynamically changing factors. The purpose and originality of the study presented in the article is to develop an automated system for building an effective educational trajectory in the conditions of external factors’ uncertainty. The developed system is a tool for assessment and dynamic adjustment of ...

Added: January 16, 2026

Multi-aspect evaluation of tokenizer adaptation methods for large language models in Russian

Andriushchenko G. D., Godunova M., Ivanov V. et al., Doklady Mathematics 2025 Vol. 527 P. 320–331

Large language models (LLMs) pretrained on English-centered corpora have biases and perform sub-optimally on other natural languages. Adaptation of LLMs vocabulary provides a resource-efficient way to improve the quality of a pretrained model. Previously proposed adaptation techniques focus on performance (accuracy) and size metrics (fertility), ignoring other aspects in comparison, such as inference latency, compute ...

Added: January 15, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

SynEL: A synthetic benchmark for entity linking

Karpov I., Kirillovich A., Goncharova E. et al., PLOS ONE 2026 Vol. 1 No. 1 P. 1–18

Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...

Added: January 15, 2026

On syntactic concept lattice models for the Lambek calculus and infinitary action logic

Stepan L. Kuznetsov, Journal of Logic and Computation 2026 Vol. 36 No. 1 Article exaf078

The linguistic applications of the Lambek calculus suggest its semantics over algebras of formal languages. A straightforward approach to construct such semantics indeed yields a brilliant completeness theorem (Pentus 1995, Ann. Pure Appl. Logic, 75, 179–213). However, extending the calculus with extra operations ruins completeness. In order to mitigate this issue, Wurm (2017, J. Logic Lang. Inf., ...

Added: January 14, 2026

Method of a voice source acoustic analysis in real time

Savchenko V., Savchenko L., Measurement Techniques 2025 Vol. 68 P. 453–463

The problem of non-invasive analysis of the vocal function of the speech apparatus based on the speaker’s speech signal is addressed. A new method of acoustic analysis of a pulse-type voice source based on a two-stage measurement procedure has been developed. The first stage of measurements provides for filtering of the voice excitation signal of the vocal tract ...

Added: January 13, 2026

KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Volume 2

Association for Computing Machinery (ACM), 2025.

Added: January 12, 2026

Synchronization system for quantum key distribution devices

Рудавин Н. В., Ящук В. Ю., Феимов А. А. et al., Журнал технической физики 2026 No. 2 P. 351–366

In commercial quantum key distribution (QKD) devices, high-precision synchronization between the reference-frequency generators of the transmitter and the receiver is key to ensuring their operation. An implementation of a system that corrects the frequency difference between the generators of a QKD device is proposed. Described in detail are the optical layout of the synchronization system, a two-stage frequency correction method, and a noise-robust method for automatically determining the moment at which reception and transmission of quantum ...

Added: January 12, 2026

FPGA design in Quartus

Romanova I., Amerikanov A., Romanov A., Moscow: DMK Press, 2025.

Field-programmable gate arrays (FPGAs) form the basis of modern digital systems and are widely used at all stages of their deployment, from experimental research to the development and construction of specific equipment. FPGAs are widely used in audio and video signal processing systems, network traffic processing systems, computing accelerators, and so on. Therefore, training specialists who can effectively apply FPGAs in all ...

Added: January 8, 2026

Adaptation of Error Correction Procedures to the Time-Bin Quantum Key Distribution Protocol Implementation

Vladimir I. Morozov, Mikhail S. Elezov, Oleg O. Evsutin et al., IEEE Access 2026 Vol. 14 P. 343–354

Error correction is a crucial stage in quantum key distribution (QKD) protocols — a promising field of modern cryptography where the secrecy of the shared key information is guaranteed by the laws of quantum mechanics. Currently, there are many effective approaches to error correction in QKD. However, most of them, due to their generic nature, ...

Added: January 6, 2026

Low-rank matrix and tensor approximations for compression of machine-learning interatomic potentials

Vorotnikov I., Romashov F., Rybin N. et al., Journal of Chemical Physics 2025 Vol. 163 No. 24

Machine-learning interatomic potentials (MLIPs) have become a mainstay in computationally guided materials science, surpassing traditional force fields due to their flexible functional form and superior accuracy in reproducing physical properties of materials. This flexibility is achieved through mathematically rigorous basis sets that describe interatomic interactions within a local atomic environment. The number of parameters in ...

Added: January 4, 2026

Pseudo-Boolean Polynomial Method for Interpretable Dimensionality Reduction: A Paradigm Shift from Abstract to Meaningful Feature Extraction

Chikake T. M., Goldengorin B. I., Pardalos P. M., Computer Optics 2025 Vol. 49 No. 6 P. 1191–1201

We present a general-purpose, training-free framework for dimensionality reduction and clustering based on per-sample pseudo-Boolean polynomials (PBP). The method constructs compact, interpretable features without model fitting and is evaluated under a standardized protocol that compares PBP to PCA, t-SNE, and UMAP using identical inputs and metrics: clustering alignment (V-measure, Adjusted Rand Index), cluster geometry (Silhouette coefficient, ...

Added: January 2, 2026

IT crisisology: a methodology for the sustainable development of complex sociotechnical systems

Zykov S. V., Информационно-измерительные и управляющие системы 2025 No. 5 P. 110–118

Tasks related to analyzing models and methods that can support the development of complex sociotechnical systems under crisis conditions are growing in importance. The study focuses on anti-crisis management of the development of such systems, aiming to ensure the resilience of their development processes and subsequent evolution through comprehensive consideration of the structural and behavioral features of their design. To enable anti-crisis ...

Added: December 30, 2025

Improving the efficiency of stream data processing in an intelligent educational system

Zykov S. V., Ермаков С. Р., Информационно-измерительные и управляющие системы 2025 No. 5 P. 41–54

As intelligent tutoring and educational systems develop, they need to process streaming data under strict quality-of-service constraints such as memory consumption, prediction accuracy, response time under dynamically changing data, and others. The problem is that traditional data processing methods often cannot deliver the required quality with limited resources. Purpose. To develop ...

Added: December 30, 2025

29th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2025)

Elsevier, 2025.

Added: December 30, 2025

IT Crisisology Patterns and Practices: Smart Agility for Digital Future

Zykov S. V., Springer, 2025.

This book focusses on real-world practitioner’s guidance in crisis management of digital product development. This includes monitoring, predicting, preventing and agile responding to critical situations by systematically applying resilient patterns and practices. This book introduces a thoroughly integrated toolbox of patterns and practices for sustainable crisis management, each individual component of which was carefully selected ...

Added: December 30, 2025

Community detection on simplicial complexes

Ермолаев Е. С., Applied Network Science 2025 Vol. 10 Article 30

Recent advances in complex systems have highlighted the utility of simplicial complexes for modeling higher-order interactions, particularly in biological and physical networks. This study presents enhanced Simplex2Vec, an adaptation of the Simplex2Vec algorithm, to facilitate community detection within such structures. We compare enhanced Simplex2Vec’s efficacy against the Leiden algorithm and Spectral clustering using 7 distinct ...

Added: December 30, 2025

Parallel Processing and Applied Mathematics. 15th International Conference, PPAM 2024, Ostrava, Czech Republic, September 8–11, 2024, Revised Selected Papers, Part I

Springer, 2025.

This book constitutes the refereed proceedings of the 15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024, held in Ostrava, Czech Republic, during September 8–11, 2024. The 75 full papers included in this book were carefully reviewed and selected from 134 submissions. The papers are organized in the following topical sections: Part I : Numerical ...

Added: December 26, 2025

Generating and Debugging Java Code using LLMs based on Associative Recurrent Memory

Василевский В. И., Alexandrov D., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 5 P. 173–182

Automatic code generation by large language models (LLMs) has achieved significant success, yet it still faces challenges when dealing with complex and large codebases, especially in languages like Java. The limitations of LLM context windows and the complexity of debugging generated code are key obstacles. This paper presents an approach aimed at improving Java code generation and debugging. ...

Added: December 26, 2025

Development and integration of an AI assistant into a learning management system

Караваева Е. А., Василевский В. И., Ланин Г. М. et al., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 4 P. 175–190

The ongoing digitalization of education requires new ways of presenting information and attention retention mechanisms. The aim of the presented work is to propose a solution for implementing a large language model, which will interactively generate prompts of different types, within an e-learning course on programming. The main approaches are the analysis of existing relatively ...

Added: December 25, 2025

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: k original packets are encoded into n coded packets, and the message is reconstructed after the first k successful deliveries, effectively shifting latency from the maximum packet delay to the k-th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Recovery degree constrained equiconcept/pseudo-equiconcept reduction in symmetric formal contexts

Junyu B., Fei H., Huilin F. et al., International Journal of Approximate Reasoning 2025 Vol. 187 Article 109541

In Formal Concept Analysis (FCA), concept reduction serves as an important means of simplification. The application scenarios of concept reduction cover various aspects such as data mining, knowledge discovery, strategic decision-making, and rule learning. For symmetric formal contexts, a specialized class of concept reduction exists that can fully recover all knowledge. However, most existing concept ...

Added: December 1, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025