J-means and I-means for minimum sum-of-squares clustering on networks

Nikolaev A.; Mladenovic N.; Todosijevic R.

doi:10.1007/s11590-015-0974-4

Optimization Letters. 2017. Vol. 11. No. 2. P. 359–376.

Given a graph, the edge minimum sum-of-squares clustering problem requires finding p prototypes (cluster centres) that minimize the sum of squared distances from a set of vertices to their nearest prototype, where a prototype can be either a vertex or an interior point of an edge. In this paper we implement a variable neighborhood search (VNS) based heuristic for solving it. We consider three local search procedures: K-means, J-means, and a new I-means heuristic. Experimental results indicate that the implemented VNS-based heuristic produces the best known results in the literature.
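The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include K-means, J-means, I-means, or VNS; it only evaluates the sum-of-squares cost for a given set of prototypes. The function names, the tuple layout `(a, b, length, t)` for a prototype lying at offset `t` from endpoint `a` on edge `(a, b)`, and the toy path graph are all assumptions made for illustration.

```python
from itertools import product


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph with vertices 0..n-1.

    edges is a list of (a, b, length) tuples (hypothetical layout).
    """
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b, length in edges:
        dist[a][b] = min(dist[a][b], length)
        dist[b][a] = min(dist[b][a], length)
    # product() varies k slowest, as Floyd-Warshall requires.
    for k, i, j in product(range(n), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def point_distance(dist, v, prototype):
    """Shortest distance from vertex v to a point on an edge.

    prototype = (a, b, length, t): the point lies on edge (a, b),
    t units away from a, with 0 <= t <= length. A vertex prototype
    is the special case t == 0 or t == length.
    """
    a, b, length, t = prototype
    return min(dist[v][a] + t, dist[v][b] + (length - t))


def sum_of_squares(dist, n, prototypes):
    """Sum over all vertices of the squared distance to the nearest prototype."""
    return sum(
        min(point_distance(dist, v, p) for p in prototypes) ** 2
        for v in range(n)
    )


# Toy example (assumed data): path graph 0 - 1 - 2 with edge lengths 2.
edges = [(0, 1, 2.0), (1, 2, 2.0)]
dist = all_pairs_shortest_paths(3, edges)

# Two prototypes placed on vertices 1 and 2.
vertex_only = sum_of_squares(dist, 3, [(0, 1, 2.0, 2.0), (1, 2, 2.0, 2.0)])
# One prototype at the midpoint of edge (0, 1), one on vertex 2.
with_edge_point = sum_of_squares(dist, 3, [(0, 1, 2.0, 1.0), (1, 2, 2.0, 2.0)])
print(vertex_only, with_edge_point)
```

On this toy graph, moving one prototype from vertex 1 to the midpoint of edge (0, 1) lowers the cost from 4 to 2, which shows why allowing interior points of edges can matter.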

Priority areas: IT and mathematics

Keywords: clustering; minimum sum-of-squares clustering; K-means; J-means; heuristic; variable neighborhood search

Modification of a Greedy Clustering Algorithm

Баранов М. А., Прикладная информатика 2013 No. 2 P. 78–88

Many clustering algorithms have now been developed, using various approaches to the cluster analysis problem. This article proposes a modification of one clustering algorithm whose operation is based on the so-called greedy approach. The essence of the modification is that the decision on whether to add a new document to a cluster is made on the basis of its similarity to ...

Added: May 13, 2013

Big Data Clustering in Cardiology Based on Modeling of Electrical Dynamics of the Heart in the form of Fermi-Pasta-Ulam Auto-Recurrence as a New Tool for the Study of Cardiac Activity

Shmid A., Новопашин М. А., Березин А. А. et al., Clinical Cardiology and Cardiovascular Interventions 2018 No. 1-10004 P. 1–8

The mass application of mobile cardiographs already leads to both explosive quantitative growth of the number of patients available for ECG study, registered daily outside the hospital (Big DATA in cardiology), and to the emergence of new qualitative opportunities for the study of long-term oscillatory processes (weeks, months, years) of the dynamics of the individual ...

Added: November 15, 2018

The evolutionary process of conditional texturing of the network state space in the format of a unified network information resource

Tsukanova O. A., Maltseva S. V., Автоматизация и современные технологии 2013 No. 11 P. 26–29

A conceptual approach is formulated and substantiated for creating a methodology for forming a controlled information state space of a network community by modeling a conditionally textured resource environment, based on iteration of the structure of the network community represented in the format of a unified information resource. ...

Added: November 13, 2013

A one-pass triclustering algorithm

Гнатышак Д. В., Научно-техническая информация. Серия 2: Информационные процессы и системы 2015 No. 2 P. 16–30

With the continuing growth in popularity of the big data field, the question of creating efficient algorithms with low time complexity and the possibility of parallelization is raised ever more actively. The aim of this work was to create an efficient one-pass triclustering algorithm for binary data, suitable for use in the big data field. The result is a one-pass linear online algorithm for OAC-triclustering (object-attribute-condition triclustering). In addition, ...

Added: April 15, 2015

Cloud technologies in the problems of mathematical analysis of cardiological information

Zimina E., Shmid A., Новопашин М. А., Data Science. Information Technology and Nanotechnology 2018, CEUR workshop proceedings 2018 No. 2212 P. 112–118

The article surveys the use of cloud services and technologies. It reviews the mathematical analysis of cardiac information using cloud technology, which provides storage, analysis and forecasting on the basis of the data held. In addition, the authors consider the possibility of integrating cloud technologies with external systems. The massive use of mobile devices for ...

Added: August 27, 2019

A method for recognizing malignant neoplasms in the human stomach on computed tomography images using image processing algorithms

Байдин Г. С., Букреев Д. Д., Биомедицинская радиоэлектроника 2019 Vol. 22 No. 5 P. 5–14

Problem statement. With the growing complexity of diagnosing malignant neoplasms in the human stomach, the storage of examination results in digital format, and the increase in their volume, there is a need to automate the clinical diagnosis of this disease, which can improve the accuracy and reliability of its results. Aim: to develop a method for automated recognition of malignant neoplasms in the human stomach on computed tomography images. Results. A method has been developed for automated ...

Added: September 5, 2021

A two-phase solution scheme using mixtures of algorithms in the "structure–property" problem

Прохоров Е. И., Свитанько И. В., Захаренко А. Л. et al., Pattern Recognition and Image Analysis 2016 Vol. 26 No. 1

The article is devoted to predicting the properties of chemical compounds by mathematical pattern recognition methods. The study is carried out on the example of the activity of inhibitors of a cell division enzyme. An approach based on mixtures of algorithms is used to build the recognition models. The paper considers a two-phase scheme for solving the "structure–property" problem and also describes a local classifier based on the nearest-neighbor method and a method that uses sets of clusterings. ...

Added: August 24, 2016

Clustering: A Data Recovery Approach

Mirkin B., L.: CRC Press, 2012.

One of the goals of the first edition of this book back in 2005 was to present a coherent theory for K-Means partitioning and Ward hierarchical clustering. This theory leads to effective data pre-processing options, clustering algorithms and interpretation aids, as well as to firm relations to other areas of data analysis. The goal of ...

Added: January 31, 2013

Distributed clustering of website user behavior data for recommender systems

Новиков О. В., Образование. Наука. Научные кадры 2013 No. 2-2013 P. 164–167

This article presents a new technique for collaborative filtering based on pre-clustering of website usage data. The key idea involves using clustering methods to define groups of different users. ...

Added: April 6, 2013

Topic models in the task of single-word term extraction

М.А. Нокель, Н.В. Лукашевич, Программная инженерия 2014 No. 3 P. 34–40

The paper describes the results of an experimental study of statistical topic models applied to the task of automatic single-word term extraction. The English part of the Europarl parallel corpus from the socio-political domain and the Russian articles taken from online banking magazines were used as target text collections. The experiments demonstrate that topic information ...

Added: October 1, 2014

Cyclic Anticipation Scheduling in Grid VOs with Stakeholders Preferences

Toporkov V., Yemelyanov D., Toporkova A. S. et al., Lecture Notes in Computer Science 2017 Vol. 10421 P. 372–383

In this work, a job-flow scheduling approach for Grid virtual organizations (VOs) is proposed and studied. Users’ and resource providers’ preferences, VOs internal policies, resources geographical distribution along with local private utilization impose specific requirements for efficient scheduling according to different, usually contradictive, criteria. With increasing resources utilization level the available resources set and corresponding ...

Added: March 13, 2018

On training a speaker verification system on unlabeled data

Ermilov A., Gostev I. M., Математическое моделирование 2015 Vol. 27 No. 7 P. 51–57

In the article we consider a method of labeling speaker data using clustering techniques. Such problems arise when one needs to use speaker data from new channels, for example, mobile devices. These data might then be used to construct a speaker verification system. The article describes a speaker verification task along with some methods ...

Added: December 19, 2014

Dynamics of cluster structures in stock market networks

Kocheturov A. A., Batsyn M. V., Pardalos P. M., Журнал Новой экономической ассоциации 2015 Vol. 4 No. 28 P. 12–30

Over the past fifteen years, network analysis has been actively used as a tool for studying financial markets. This paper presents an analysis of the stock markets of the USA and Sweden based on the network approach. Special cluster structures are computed and studied in networks built from the correlation matrices of stock returns. The cluster structure of the network is identified with ...

Added: January 27, 2016

Efficient facial representations for age, gender and identity recognition in organizing photo albums using multi-output ConvNet

Savchenko A., PeerJ Computer Science 2019 Vol. 5:e197 P. 1–26

This paper is focused on the automatic extraction of persons and their attributes (gender, year of birth) from an album of photos and videos. A two-stage approach is proposed in which, firstly, the convolutional neural network simultaneously predicts age/gender from all photos and additionally extracts facial representations suitable for face identification. Here the MobileNet is modified ...

Added: June 12, 2019

Automating the counting of particles in nanoscale electron microscope images

Байдин Г. С., Титов А. С., Биомедицинская радиоэлектроника 2020 Vol. 23 No. 5 P. 59–71

Problem statement. With the growing complexity of studying chemical compounds in various media and the increasing difficulty of processing experimental results, there is a need to automate this process in order to improve the accuracy and reliability of the results obtained. Aim of the work: to develop a method for automated counting of particles in a substance on electron microscope images. Results. A method has been developed for automated counting of particles in a substance on electron microscope images ...

Added: September 5, 2021

Analysis and interpretation of imaging mass spectrometry data by clustering mass-to-charge images according to their spatial similarity

Alexandrov T., Chernyavsky I., Becker M. et al., Analytical Chemistry 2013 Vol. 85 No. 23 P. 11189–11195

Imaging mass spectrometry (imaging MS) has emerged in the past decade as a label-free, spatially resolved, and multipurpose bioanalytical technique for direct analysis of biological samples from animal tissue, plant tissue, biofilms, and polymer films. Imaging MS has been successfully incorporated into many biomedical pipelines where it is usually applied in the so-called untargeted mode, capturing spatial localization of a multitude of ions ...

Added: November 18, 2013

Fair Scheduling in Grid VOs with Anticipation Heuristic

Toporkov V., Yemelyanov D., Anna Toporkova, Lecture Notes in Computer Science 2018 Vol. 10778 P. 145–155

In this work, a job-flow scheduling approach for Grid virtual organizations (VOs) is proposed and studied. Users’ and resource providers’ preferences, VOs internal policies along with local private utilization impose specific requirements for scheduling according to different, usually contradictive, criteria. We study the problem of a fair job batch scheduling with a relatively limited resources ...

Added: April 16, 2018

Cluster analysis of cardiological data

Зимина Е. Ю., Статистика и Экономика 2018 Vol. 15 No. 2 P. 30–37

The article surveys cluster analysis of medical data, using cardiac data as an example. Among the most effective and commonly used data mining methods applied to large amounts of information (for example, in mathematical economics) are clustering methods: the search for signs of similarity between objects in the study of the subject area ...

Added: May 29, 2018

Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

Sokolova Anastasiia, Kharchevnikova Angelina, Savchenko A., Lecture Notes in Computer Science 2018 Vol. 10716 P. 223–230

In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, the faces are detected in each frame and a video stream is split into sequences of frames with the face region of one person. Second, these sequences (tracks) that contain identical faces are grouped using face verification algorithms and ...

Added: October 24, 2017

Overlapping community detection in networks based on link partitioning and partitioning around medoids

Ponomarenko A., Pitsoulis L., Shamshetdinov M., Plos One 2021 Vol. 16 No. 8 Article e0255717

In this paper, we present a new method for detecting overlapping communities in networks with a predefined number of clusters called LPAM (Link Partitioning Around Medoids). The overlapping communities in the graph are obtained by detecting the disjoint communities in the associated line graph employing link partitioning and partitioning around medoids which are ...

Added: December 9, 2020

Pre-experiments on Annotation of Russian Coreference Corpus

Toldova S., Azerkovich I., Гришина Ю. et al. / NRU HSE. Series WP BRP "Linguistics". 2015.

Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study is aimed at assessing the feasibility of enhancing corpora with information about coreference relations. The annotation procedure includes identification of text segments that are subject to annotation (markables), marking their ...

Added: December 15, 2015

Database on the Bandgap of Inorganic Substances and Materials

Kiselyova N. N., Dudarev V.A., Korzhuev M. A., Inorganic Materials: Applied Research 2016 Vol. 7 No. 1 P. 34–39

A database (DB) on the bandgap of inorganic substances available via the Internet (http://bg.imetdb.ru) was developed for the information service of specialists in the sphere of inorganic chemistry and materials science. The DB is integrated with other information systems on the properties of inorganic substances and materials, which provides the search of a wide range ...

Added: February 23, 2016

On the choice of software tools for cognitive computer visualization

Baibikova T., Domoratsky E., Вестник Московского финансово-юридического университета 2017 No. 1 P. 200–206

Some questions of scientific visualization are under consideration in this paper. This article also discusses the peculiarities of application of cognitive computer graphics, singles out a range of tasks of scientific visualization. The paper gives a brief overview of modern support tools for program visualization, tendencies of their development and their main characteristics. A module ...

Added: June 10, 2017

Measures of uncertainty in market network analysis

Kalyagin V.A., Koldanov A.P., Koldanov P.A. et al., Physica A: Statistical Mechanics and its Applications 2014 Vol. 413 No. 1 P. 59–70

A general approach to measure statistical uncertainty of different filtration techniques for market network analysis is proposed. Two measures of statistical uncertainty are introduced and discussed. One is based on conditional risk for multiple decision statistical procedures and another one is based on average fraction of errors. It is shown that for some important cases ...

Added: July 19, 2014