Frequent Itemset Mining for Clustering Near Duplicate Web Documents

D. I. Ignatov; S. Kuznetsov

Lecture Notes in Artificial Intelligence. 2009. Vol. 5662. P. 185–200.

A vast number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes with large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared with other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
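The idea in the abstract can be illustrated with a minimal sketch. It assumes that each document is represented by a set of word shingles (the abstract does not say how attributes are obtained, so this is an assumption), and it enumerates closed attribute sets by brute-force intersection rather than by any of the algorithms the paper compares. The names `shingles` and `frequent_closed_itemsets` are hypothetical and chosen only for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed attribute model)."""
    words = text.lower().split()
    return frozenset(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def frequent_closed_itemsets(docs, min_support):
    """Brute-force sketch: closed attribute sets are intersections of document sets.

    docs: dict mapping a document id to a frozenset of attributes.
    Returns a dict mapping each non-empty closed attribute set (intent) to the
    frozenset of document ids containing it (extent), keeping only extents
    with at least min_support documents.
    """
    intents = set(docs.values())
    frontier = set(intents)
    # Close the family of attribute sets under pairwise intersection.
    while frontier:
        new = set()
        for a in frontier:
            for b in intents:
                c = a & b
                if c and c not in intents:
                    new.add(c)
        intents |= new
        frontier = new
    result = {}
    for intent in intents:
        if not intent:
            continue
        extent = frozenset(d for d, attrs in docs.items() if intent <= attrs)
        if len(extent) >= min_support:
            result[intent] = extent
    return result


# Toy collection: "a" and "b" are near duplicates, "c" is unrelated.
docs = {
    "a": shingles("the quick brown fox jumps over the lazy dog"),
    "b": shingles("the quick brown fox jumps over the lazy cat"),
    "c": shingles("frequent itemset mining finds closed sets"),
}
clusters = frequent_closed_itemsets(docs, min_support=2)
```

In this toy run, `clusters` holds a single closed set of six shared shingles whose extent is the pair of near-duplicate documents `a` and `b`. The brute-force closure is exponential in the worst case and is only meant to show the relationship between closed attribute sets and document clusters on very small inputs.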

Multimodal graph, surface, and language-based model for protein protein interaction prediction

Arteaga Moreano B. D., Poptsova M., Scientific Reports 2026 No. 16 Article 4772

Accurate prediction of protein-protein interactions (PPIs) is fundamental to understanding biological processes and disease mechanisms. While deep learning offers a powerful alternative to costly experimental methods, existing approaches often overlook critical protein-surface information and rely on simplistic feature fusion techniques, thereby limiting performance. To address this, we introduce GSMFormer-PPI, a novel multimodal framework that integrates ...

Added: February 4, 2026

Algorithmic complexity of theories with Kleene iteration

Kuznetsov S., Успехи математических наук 2026 Vol. 81 No. 1 P. 137–204

Kleene iteration (the Kleene star) is one of the most interesting algebraic operations encountered in theoretical computer science. The study of structures with this operation, namely Kleene algebras and their extensions, begins with the classical notion of regular expressions defining formal languages. Subsequently, so-called action algebras (V. Pratt, 1991; D. Kozen, 1994), or Kleene algebras with divisions, were introduced. In these structures the Kleene star is combined with divisions compatible with the partial order (such ...

Added: February 4, 2026

SMMR: Sampling-Based MMR Reranking for Faster, More Diverse, and Balanced Recommendations and Retrieval

Ananyeva M., Liakhnovich K., Lashinin O. et al., Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval 2025 P. 2754–2758

Relevance and diversity are critical objectives in modern information retrieval (IR), particularly in recommender systems. Achieving a balance between relevance (exploitation) and diversity (exploration) optimizes user satisfaction and business goals such as catalog coverage and novelty. While existing post-processing reranking methods address this trade-off, they usually rely on greedy strategies, leading to suboptimal outcomes for ...

Added: February 3, 2026

Natural Language Processing and Information Systems : 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4-6, 2025 : proceedings. Part I

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

Measuring Chemical LLM robustness to molecular representations: a SMILES variation-based framework

Tutubalina E., Храбров К., Ганеева В. et al., Journal of Cheminformatics 2025 No. 17 Article 164

The recent integration of natural language processing into chemistry has advanced drug discovery. Molecule representations in language models (LMs) are crucial to enhance chemical understanding. We explored the ability of models to match the same chemical structures despite their different representations. Recognizing the same substance in different representations is an important component of emulating the ...

Added: February 3, 2026

A Clustering Model for Stocks that Considers Hidden Dynamics and Price Trajectory

Sizykh N., Sizykh D., Morychev G., IEEE Access 2025 Vol. 13 P. 213194–213210

One of the main tools for analyzing large volumes of financial data is the use of clustering methods and models, which allow the identification of various patterns. This study examines the problem of clustering time series that reflect the behavior of prices, yields, modes, trends, and a number of related stock indicators. The relevance and ...

Added: February 3, 2026

Limit theorems for random polytopes generated by heavy-tailed distributions

Simarova E., Запорожец Д. Н., Записки научных семинаров ПОМИ РАН 2025 Vol. 544 P. 130–153

The paper studies asymptotic properties of random polytopes generated by convex hulls of independent identically distributed random vectors with a regularly varying (heavy-tailed) distribution. We investigate the convergence of functionals of these random polytopes, including intrinsic volumes, the U-max statistics they generate, and the f-vector, to the corresponding functionals of Poisson polytopes. The results obtained generalize known facts for particular distributions to a general class of distributions with ...

Added: February 2, 2026

Solution to Hart–van Mill’s problem 61

Polyakov N. L., Saveliev D. I., Russian Mathematical Surveys 2026 Vol. 81 No. 1 P. 205–206

We solve Problem 61 from Hart and van Mill’s list on whether every finite partial order is embeddable in the Rudin–Keisler order on (types of) ultrafilters over $\omega$. ...

Added: January 31, 2026

A speech signal transformation method for improving speech intelligibility

Savchenko L., Савченко В. В., Радиотехника и электроника 2025 Vol. 70 No. 8 P. 753–760

The problem of improving speech intelligibility in voice communication systems is considered. The acute issue of speaker recognition when applying known methods for solving this problem is highlighted. To overcome the specified problem, a new method for transforming the speech signal based on an autoregressive model of the vocal tract and the principle of frequency-selective ...

Added: January 29, 2026

Specification Tests for Jump-Diffusion Models Based on the Characteristic Function

Belomestny D., Grobler G. L., Meintanis S. G. et al., International Statistical Review 2026 P. 1–31

Goodness-of-fit tests are suggested for several popular jump-diffusion processes. The suggested test statistics utilise the marginal characteristic function of the model and its L2-type discrepancy from an empirical counterpart. Model parameters are estimated either by minimising the aforementioned L2-type discrepancy or by maximum likelihood. A hybrid estimation method that uses moment estimation is also proposed ...

Added: January 29, 2026

An Analysis of Sequential Patterns in Datasets for Evaluation of Sequential Recommendations

Klenitskiy A., Володкевич А. А., Pembek A. et al., ACM Transactions on Recommender Systems 2026

Sequential recommender systems are an important and in-demand area of research. These systems aim to use the order of interactions in a user’s history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to use datasets that exhibit a sequential structure ...

Added: January 28, 2026

Sub-Riemannian geodesics on the Heisenberg 3D nil-manifold

Glutsyuk A., Sachkov Y., Nonlinearity 2025 Vol. 38 Article 115013

We study the projection of the left-invariant sub-Riemannian structure on the 3D Heisenberg group G to the Heisenberg 3D nil-manifold M — the compact homogeneous space of G by the discrete Heisenberg group. First we describe dynamical properties of the geodesic flow for M: periodic and dense orbits, a dynamical characterization of the normal Hamiltonian ...

Added: January 27, 2026

Autoregressive generation strategies for Top-K sequential recommendations

Klenitskiy A., Гусак Д. И., Володкевич А. А. et al., User Modelling and User-Adapted Interaction 2025 No. 35 Article 13

The goal of modern sequential recommender systems is often formulated in terms of next-item prediction. In this paper, we explore the applicability of transformer-based generative models for the Top-K sequential recommendation task, where the goal is to predict items that a user is likely to interact with in the “near future.” This goal aligns with ...

Added: January 26, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

On finding formal power-logarithmic expansions of solutions to q-difference equations

Gaianov N., Parusnikova A., / Cornell University. Series math "arxiv.org". 2025.

An algebraic q-difference equation is considered. A sufficient condition for the existence of a formal power-logarithmic expansion of a solution to such an equation in the neighborhood of zero is proposed. An example of applying this sufficient condition for constructing a formal expansion of a solution to a certain q-difference analogue of the fifth Painlevé equation ...

Added: December 25, 2025

Flexible Stock Market Algorithm

Rubchinskiy A., Chubarova D., Technology and Investment 2025 Vol. 16 No. 4 P. 211–240

The article considers one of the most famous examples of socio-economic systems characterized by significant uncertainty: the S&P 500 stock market, where shares of the 500 largest US companies are traded. A flexible algorithm for daily trading has been developed. It is based on known fixed data about the cost of shares in previous days as well as on ...

Added: December 19, 2025

Ideal of the variety of flexes of plane cubics

Popov V., / Series arXiv "math". 2025. No. 2502.01539.

We prove that the variety of flexes of algebraic curves of degree 3 in the projective plane is an ideal theoretic complete intersection in the product of a two-dimensional and a nine-dimensional projective spaces. ...

Added: December 16, 2025

Random walks on rank one symmetric spaces of noncompact type

Gnetov F., Konakov V., / Series arXiv "math". 2025. No. 2512.04667.

We establish a central limit theorem, a local limit theorem, and a law of large numbers for a natural random walk on a symmetric space M of non-compact type and rank one. This class of spaces, which includes the complex and quaternionic hyperbolic spaces and the Cayley hyperbolic plane, generalizes the real hyperbolic space $H^n$. Our approach introduces ...

Added: December 5, 2025

Cascades of Lorenz attractors in the Shimizu-Morioka model

Kazakov A., Koryakin V., Safonov K. et al., / Series arXiv "math". 2025.

The Lorenz attractor is the first example of a robustly chaotic non-hyperbolic attractor. Each orbit of such an attractor has a positive top Lyapunov exponent, and this property persists under small perturbations despite possible bifurcations of the attractor. In this paper, we study the boundary of the Lorenz attractor existence region in the Shimizu-Morioka model. ...

Added: December 4, 2025

An asymptotic version of the parametrix method for Markov chains converging to diffusions

Bitter I., Konakov V., / Cornell University. Series arXiv "math". 2025. No. 2505.24548.

The paper generalizes the local limit theorem on the convergence of inhomogeneous Markov chains to a diffusion limit to the case where the corresponding coefficients of the processes satisfy weak regularity conditions and coincide only asymptotically. In particular, the drift coefficients we consider may be unbounded with at most linear growth, and the estimates reflect the transport of the terminal state by the unbounded trend through ...

Added: December 3, 2025

Stabilization of direct images for curves

Bogomolov F. A., Schrandt S., / Series arXiv "math". 2025.

We discuss phenomena of stabilization for direct images of line bundles over projective curves mapping onto the projective line, for maps of sufficiently big degree. ...

Added: December 1, 2025

Upper Bounds on the Torsion Index of Half-Spin Groups

Deviatov R., Baek S., / Series arXiv "math". 2025.

The torsion index of split simple groups has been extensively studied, notably by Totaro, who calculated the torsion indexes of the spin groups and $E_{8}$ in [5] and [6], respectively. The aim of this paper is to provide upper bounds for the torsion index of half-spin groups, the only remaining case in the calculation of ...

Added: December 1, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Birational transformations of threefold Q-conic bundles

Prokhorov Y., / Series arXiv "math". 2025.

A $\mathbf{Q}$-conic bundle is a contraction $f: X\to Z$ of a three-dimensional algebraic variety $X$ to a surface $Z$ such that the variety $X$ has only terminal $\mathbf{Q}$-factorial singularities, the anticanonical divisor $-K_X$ is $f$-ample, and $\rho(X/Z)=1$. We provide an algorithm to transform a $\mathbf{Q}$-conic bundle to its standard form. ...

Added: December 1, 2025