Higher School of Economics
National Research University
Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures

Gosgens M., Tikhonov A., Prokhorenkova L.

Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. We demonstrate that this problem is crucial: there are many disagreements among the indices, these disagreements do affect which algorithms are preferred in applications, and this can lead to degraded performance in real-world systems. We propose a theoretical framework to tackle this problem: we develop a list of desirable properties and conduct an extensive theoretical analysis to verify which indices satisfy them. This allows for making an informed choice: given a particular application, one can first select properties that are desirable for the task and then identify indices satisfying these. Our work unifies and considerably extends existing attempts at analyzing cluster similarity indices: we introduce new properties, formalize existing ones, and mathematically prove or disprove each property for an extensive list of validation indices. This broader and more rigorous approach leads to recommendations that considerably differ from how validation indices are currently being chosen by practitioners. Some of the most popular indices are even shown to be dominated by previously overlooked ones.
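The abstract's first claim, that similarity indices disagree about which clustering is better, can be reproduced on a tiny example. The sketch below is not taken from the paper: it uses two classic pair-counting indices (the Rand index and the Jaccard index), and the helper names `pair_counts`, `rand_index`, `jaccard_index` and the toy labellings are hypothetical, chosen only to illustrate the disagreement.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every unordered item pair by co-membership in two partitions.

    Returns (n11, n10, n01, n00): n11 = together in both, n10 = together
    only in labels_a, n01 = together only in labels_b, n00 = apart in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same items")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    # Fraction of all pairs on which the two partitions agree.
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(labels_a, labels_b):
    # Agreement on "together" pairs only; pairs apart in both are ignored.
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    together = n11 + n10 + n01
    # Assumption: two all-singleton partitions are treated as identical.
    return n11 / together if together else 1.0


# Hypothetical toy data: a reference partition and two candidate clusterings.
reference = [0, 0, 1, 1, 2, 2]
candidate_a = [0, 1, 2, 3, 4, 5]  # every item in its own cluster
candidate_b = [0, 0, 0, 1, 1, 1]  # two large clusters

for name, cand in [("A", candidate_a), ("B", candidate_b)]:
    print(name,
          round(rand_index(reference, cand), 3),
          round(jaccard_index(reference, cand), 3))
```

Run as is, this prints `A 0.8 0.0` and `B 0.667 0.286`. The Rand index prefers candidate A because it rewards the many pairs that are apart in both partitions, whereas the Jaccard index ignores those pairs and prefers B. Which behaviour is desirable depends on the application, and that choice is what a property-based comparison of indices is meant to inform.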

Language: English
Keywords: clustering

In book

Proceedings of the 38th International Conference on Machine Learning (ICML 2021)
Vol. 139. PMLR, 2021.
Similar publications
Flexible Stock Market Algorithm
Rubchinskiy A., Chubarova D., Technology and Investment 2025 Vol. 16 No. 4 P. 211–240
The article considers one of the most famous examples of socio-economic systems characterized by significant uncertainty: the S&P-500 stock market, where shares of the 500 largest US companies are traded. A flexible algorithm for daily trading has been developed. It is based on known fixed data about the cost of shares in previous days as well as on ...
Added: December 19, 2025
Tunnel Clustering Method
F. T. Aleskerov, A. L. Myachin, V. I. Yakuba, Doklady Mathematics 2024 Vol. 110 No. 3 P. 474–479
We propose a novel method for rapid pattern analysis of high-dimensional numerical data, termed tunnel clustering. The main advantages of the method are its relatively low computational complexity, endogenous determination of cluster composition and number, and a high degree of interpretability of final results. We present descriptions of three different variations: one with fixed hyperparameters, ...
Added: March 3, 2025
Using Z-Numbers to Describe a Data Set
Гусейнов О., Degtyarev K. Y., IRETC MTÜ PAHTEI - Proceedings of Azerbaijan High Technical Educational Institutions 2025 Vol. 48 No. 1 P. 360–370
The concept of a Z-number was proposed by Prof. Lotfi Zadeh to describe the partial reliability of information; it is a kind of fusion of fuzziness and probabilistic uncertainty. A Z-number can be presented as a pair of fuzzy numbers Z(A,B) used to describe a value of a random variable X. The first component (A) is a ...
Added: February 20, 2025
Gradient descent clustering with regularization to recover communities in transformed attributed networks
Shalileh S., Social Network Analysis and Mining 2025 Vol. 15212 P. 137–148
Community detection in attributed networks aims to recover clusters in which the within-community nodes are as interconnected and as homogeneous as possible, while the between-communities nodes are as disconnected and as heterogeneous as possible. The current research proposes a straightforward data-driven model with an integrated regularization term to recover communities. For further improvement of the ...
Added: November 30, 2024
An empirical scrutinization of four crisp clustering methods with four distance metrics and one straightforward interpretation rule
T. A. Alvandyan, S. Shalileh, Doklady Mathematics 2024 Vol. 110 No. S1 P. S236–S250
Clustering has always been in great demand by scientific and industrial communities. However, due to the lack of ground truth, interpreting its results can be debatable. The current research provides an empirical benchmark on the efficiency of three popular and one recently proposed crisp clustering methods. To this end, we extensively analyzed these (four) ...
Added: November 30, 2024
Modelling Teachers' Pay under Heterogeneous Socio-Economic Conditions Across Regions
Богданова Т. К., Жукова Л. В., in: 11th International Conference "Multivariate Statistical Analysis, Econometrics and Modelling of Real Processes" named after S. A. Aivazyan. Moscow: CEMI RAS, 2024. P. 41–44.
The paper is devoted to the analysis and forecasting of the average salary of teachers. Using Ward's method on Rosstat data on the socio-demographic characteristics of 84 regions, we obtained a two-cluster solution, which allowed us to identify quite strong differences in the level of wages, GRP per capita, level of ...
Added: October 4, 2024
Clustering with empty clusters
Penikas H. I., Феста Ю. Ю., The Bulletin of the Far Eastern Federal University. Economics and Management 2024 Vol. 2 P. 75–94
Cluster analysis is widely used in various scientific and practical fields related to data analysis. It is an important tool for solving problems in fields such as machine learning, image processing, text recognition, etc. The absence of observations does not always mean the absence of information, so it is assumed that gaps in the data, i.e. the presence of "empty" clusters, also carry information about the object of study, just as real observations do. This study assumes that ...
Added: August 10, 2024
Detecting linguistic variation with geographic sampling
Koile E., Moroz G., Journal of Linguistic Geography 2024 Vol. 12 No. 1 P. 24–31
Geolectal variation is often present in settings where one language is spoken across a vast geographic area. This can be found in phonological, morphosyntactic, and lexical features. For practical reasons, it is not always possible to conduct fieldwork in every single location of interest in order to obtain the full pattern of variation, and a ...
Added: May 6, 2024
Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques
Gromov V., Dang Q. N., in: 10th International Conference, PReMI 2023, Kolkata, India, December 12–15, 2023, Proceedings. Pattern Recognition and Machine Intelligence. LNCS, volume 14301.: Cham: Springer, 2023. Ch. 3 P. 20–27.
Added: November 29, 2023
Temperature-driven transition into vortex clusters in low-kappa intertype superconductors
Backs A., Al-Falou A., Vagov A. et al., Physical Review B: Condensed Matter and Materials Physics 2023 Vol. 107 No. 17 Article 174527
In the vicinity of the type-I/type-II crossover in conventional superconductors, vortices exhibit a nonmonotonic interaction, which leads to exotic vortex matter states. We perform molecular dynamics simulations on a model superconductor in the intertype regime. In a field-cooled approach, we examine the transition of a homogeneous vortex lattice (VL) into a structure consisting of ...
Added: November 2, 2023
2023 Fifth International Conference Neurotechnologies and Neurointerfaces (CNN) 18-20 Sept. 2023
Alshanskaia E., Martynova O., IEEE, 2023.
Cognitive and emotional load in the course of increasing the complexity of tasks leads to the activation of various parts of the autonomic nervous system (ANS) and can be accompanied by an increase in the efficiency of problem solving. An increase in cognitive load under the condition of high motivation is a stress factor and ...
Added: September 24, 2023
A New Software Platform for Modelling Traffic Flows Involving Unmanned Vehicles
Beklaryan A., Vestnik CEMI 2023 Vol. 6 No. 1 Article 5
The article presents a new software platform for modelling traffic flows involving unmanned vehicles, using a number of advanced technological solutions, in particular, the FLAME GPU supercomputer agent modelling framework, intelligent software modules based on fuzzy and hierarchical clustering, genetic optimization algorithms, a subsystem for visualizing the state of agents-vehicles based on OpenGL, etc. As ...
Added: June 4, 2023
Tracing Vortex Clustering in a Superconductor by the Magnetic Flux Distribution
A. Vagov, E. G. Nikonov, The Journal of Physical Chemistry Letters 2023 Vol. 14 No. 15 P. 3743–3748
By investigating spatial configurations of the intermediate mixed state in an intertype superconductor, it is shown that vortex clustering can be characterized by the sample-averaged distribution of the penetrating magnetic field. The clustering is manifested in the two-peak structure of the distribution. The second peak indicates a spot a material occupies in the ...
Added: June 2, 2023
An empirical comparison of connectivity-based distances on a graph and their computational scalability
Miasnikof P., Shestopaloff A., Pitsoulis L. et al., Journal of Complex Networks 2022 Vol. 10 No. 1 Article cnac003
In this study, we compare distance measures with respect to their ability to capture vertex community structure and the scalability of their computation. Our goal is to find a distance measure which can be used in an aggregate pairwise minimization clustering scheme. The minimization should lead to subsets of vertices with high induced subgraph density. ...
Added: November 21, 2022
Noise Clustering as a Method for Assessing the Function of Permanent Vascular Access in Hemodialysis Patients
Кравцов П. Ф., Николаев Е. Н., Мазайшвили К. В. et al., Vestnik SurGU. Medicine 2022 Vol. 51 No. 1 P. 25–30
The study aims to develop an algorithm for assessing spectrographic features of arteriovenous fistula dysfunction in hemodialysis patients. Materials and methods: forty-four patients with a native radiocephalic fistula formed in the distal third of the forearm participated in the study. The noise of the arteriovenous fistula was recorded in all patients using an electronic stethoscope. A total of 653 spectrograms were analyzed with the ...
Added: November 14, 2022
Distinguishing Chaotic and Regular Time Series to Identify the State of an Arteriovenous Fistula
Gromov V., Мазайшвили К. В., Заикин П. В. et al., Vestnik Kibernetiki 2022 Vol. 45 No. 1 P. 72–82
The prevalence of chronic kidney disease is growing every year and is already comparable to such socially significant diseases as hypertension and diabetes mellitus, as well as obesity and metabolic syndrome [1,2]. The standard solution for hemodialysis patients is to create a permanent vascular access in the form of an arteriovenous fistula. However, its use ...
Added: November 14, 2022
Directions of Support for Small Enterprises in the Industrial Sector of the Economy: A Regional Aspect
Arkhipova M., Cherviakova A. A., Drukerovskij Vestnik 2022 No. 2 P. 86–109
The innovation activity of small industrial enterprises varies substantially across Russian regions, which makes it relevant to search for distinctive features of innovative entrepreneurship in these regions in order to develop targeted support for small innovative enterprises and to create new jobs in the digital economy. Multidimensional clustering of Russian regions by the indicators of small industrial ...
Added: October 24, 2022