J-means and I-means for minimum sum-of-squares clustering on networks
Given a graph, the edge minimum sum-of-squares clustering problem requires finding p prototypes (cluster centres) that minimize the sum of squared distances from a set of vertices to their nearest prototype, where a prototype can be either a vertex or an interior point of an edge. In this paper we implement a variable neighborhood search (VNS) based heuristic for solving it. We consider three different local search procedures: K-means, J-means, and a new I-means heuristic. Experimental results indicate that the implemented VNS-based heuristic produces the best known results in the literature.
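As an illustration of the objective only (not of the VNS heuristic or of the K-means, J-means, or I-means procedures), the following minimal sketch evaluates the sum of squares for prototypes that may lie inside an edge. The function names and the prototype encoding `(a, b, length, t)`, meaning the point at distance `t` from endpoint `a` along edge `(a, b)`, are assumptions made for this example.

```python
import math

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph; edges are (u, v, length)."""
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def dist_to_prototype(d, v, proto):
    """Shortest distance from vertex v to a point on edge (a, b).

    A path to an edge point must enter the edge through one of its endpoints.
    """
    a, b, length, t = proto
    return min(d[v][a] + t, d[v][b] + (length - t))

def sum_of_squares(d, vertices, prototypes):
    """Sum of squared distances from each vertex to its nearest prototype."""
    return sum(
        min(dist_to_prototype(d, v, p) for p in prototypes) ** 2
        for v in vertices
    )

# Path graph 0 - 1 - 2 with unit-length edges.
d = all_pairs_shortest_paths(3, [(0, 1, 1.0), (1, 2, 1.0)])
on_vertices = [(0, 1, 1.0, 0.0), (1, 2, 1.0, 1.0)]  # vertex 0 and vertex 2
with_interior = [(0, 1, 1.0, 0.5), (1, 2, 1.0, 1.0)]  # midpoint of (0, 1) and vertex 2
print(sum_of_squares(d, [0, 1, 2], on_vertices))    # 1.0
print(sum_of_squares(d, [0, 1, 2], with_interior))  # 0.5
```

On this toy instance, moving one prototype from a vertex to the middle of an edge halves the objective, which is why allowing interior points changes the problem.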
Companies’ objectives extend beyond mere profitability, to what is generally known as Corporate Social Responsibility (CSR). Empirical research on CSR typically concentrates on a limited number of aspects. We focus on the whole set of CSR activities to identify any structure in that set. In this analysis, we take data on 1850 of the largest international companies from the conventional MSCI database and focus on four major dimensions of CSR: Environment, Social/Stakeholder, Labor, and Governance. To identify any structure hidden in almost constant average values, we apply the popular technique of K-means clustering. When determining the number of clusters, which is especially difficult in the case at hand, we use an equivalent clustering criterion that is complementary to the square-error K-means criterion. Our use of this complementary criterion aims at obtaining clusters that are both large and farthest away from the center. We derive from this a method of extracting anomalous clusters one-by-one with a follow-up removal of small clusters. This method has allowed us to discover a rather impressive process of change from predominantly uniform patterns of CSR activities along the four dimensions in 2007 to predominantly single-focus patterns of CSR activities in 2012. This change may reflect the dynamics of increasingly interweaving and structuring CSR activities into business processes that are likely to be extended into the future.
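The one-by-one extraction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reference point is taken to be the grand mean, each cluster is seeded at the point farthest from it, a point joins the cluster when it is closer to the cluster centroid than to the reference point, and `min_size` is a hypothetical threshold for discarding small clusters.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def anomalous_cluster(points, reference, max_iter=100):
    """Return indices of one cluster that is far from the reference point."""
    seed = max(range(len(points)), key=lambda i: sq_dist(points[i], reference))
    if sq_dist(points[seed], reference) == 0:
        return list(range(len(points)))  # every point coincides with the reference
    centroid = points[seed]
    members = []
    for _ in range(max_iter):
        # Keep the points that are closer to the centroid than to the reference.
        new_members = [
            i for i, p in enumerate(points)
            if sq_dist(p, centroid) < sq_dist(p, reference)
        ]
        if new_members == members:
            break
        members = new_members
        centroid = mean([points[i] for i in members])
    return members

def extract_anomalous_clusters(points, min_size=2):
    """Extract clusters one by one, then drop those smaller than min_size."""
    reference = mean(points)  # grand mean, kept fixed throughout (assumption)
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        subset = [points[i] for i in remaining]
        cluster = [remaining[i] for i in anomalous_cluster(subset, reference)]
        clusters.append(cluster)
        taken = set(cluster)
        remaining = [i for i in remaining if i not in taken]
    return [c for c in clusters if len(c) >= min_size]

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (-20, 5)]
print(extract_anomalous_clusters(points, min_size=2))  # [[3, 4, 5], [0, 1, 2]]
```

In this toy run the isolated point `(-20, 5)` is extracted first as a singleton and then removed by the size filter, leaving two clusters. The number of clusters is therefore a result of the procedure rather than an input.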
Approximation of the gravitational field of an irregular celestial body by the gravitational attraction field of four massive points is studied here in the framework of the K-means method, known from the theory of pattern recognition. Using this approach, simplified models for the gravitational fields of asteroid (1620) Geographos and comet (67P) Churyumov-Gerasimenko are constructed. For asteroid (1620) Geographos, the proposed model is compared with the previously used one, in which this asteroid is represented by four cotangent spheres with co-planar centers.
One can expect that the life expectancy of people in a city or geographical region depends on the health-care infrastructure in that city or region, as well as on the investment devoted to it. In this paper we examine the influence of different kinds of health-care support on life expectancy. Data are collected on all 85 geographical districts in Russia, covering a 15-year period. A symbolic regression model is applied and solved by variable neighborhood programming, a recent and promising automatic programming technique. In other words, an analytic function is sought that expresses the relation between life expectancy and a few selected health-care financial attributes. Some years are used as a training set, and some as a testing set. Interesting results are obtained and analyzed. They suggest that symbolic regression and artificial intelligence techniques might be the right approach to estimating life expectancy.
One of the goals of the first edition of this book back in 2005 was to present a coherent theory for K-Means partitioning and Ward hierarchical clustering. This theory leads to effective data pre-processing options, clustering algorithms and interpretation aids, as well as to firm relations to other areas of data analysis. The goal of this second edition is to consolidate, strengthen and extend this island of understanding in the light of recent developments. Moreover, the material on validation and interpretation of clusters is updated with a system better reflecting the current state of the art and with our recent "lifting in taxonomies" approach. The structure of the book has been streamlined by adding two chapters, "Similarity Clustering" and "Validation and Interpretation", while removing two chapters, "Different Clustering Approaches" and "General Issues". The chapter on the mathematics of the data recovery approach, in a much extended version, almost doubled in size, now concludes the book. Parts of the removed chapters are integrated within the new structure. The change has added a hundred pages and a couple of dozen examples to the text and, in fact, transformed it into a different species of book. In the first edition, the book had a Russian doll structure, with a core and a couple of nested shells around. Now it is a linear structure presentation of the data recovery clustering.
The paper presents a tabu search heuristic for the Fleet Size and Mix Vehicle Routing Problem (FSMVRP) with hard and soft time windows. The objective function minimizes the sum of travel costs, fixed vehicle costs, and penalties for soft time window violations. The algorithm is based on tabu search with several neighborhoods. The main contribution of the paper is an efficient algorithm for a real-life vehicle routing problem. To the best of our knowledge, there are no papers devoted to the FSMVRP with soft time windows, although in real-life problems this is the usual case. We investigate the performance of the proposed heuristic on the classical Solomon instances with additional constraints. We also compare our approach, run without soft time windows and without a heterogeneous fleet of vehicles, with recently published results for the VRP with hard time windows.
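The three-part objective named above can be illustrated with a small cost evaluator. This is only a sketch under assumptions that the abstract does not state: travel time equals travel distance, service times are zero, vehicles wait when they arrive early, the soft-window penalty is linear in lateness, and a hard-window violation makes the route infeasible. All names (`route_cost`, `solution_cost`, `penalty_rate`, the window tuple layout) are hypothetical.

```python
import math

def route_cost(route, dist, windows, fixed_cost, penalty_rate):
    """Cost of one route that starts and ends at depot 0.

    windows[c] = (ready, soft_due, hard_due) for customer c.
    Returns math.inf if any hard time window is violated.
    """
    time = 0.0
    travel = 0.0
    penalty = 0.0
    prev = 0
    for c in route:
        travel += dist[prev][c]
        time += dist[prev][c]  # assumption: travel time equals distance
        ready, soft_due, hard_due = windows[c]
        time = max(time, ready)  # assumption: wait if early
        if time > hard_due:
            return math.inf  # hard window violated, route infeasible
        penalty += penalty_rate * max(0.0, time - soft_due)  # assumed linear
        prev = c
    travel += dist[prev][0]  # return to the depot
    return travel + fixed_cost + penalty

def solution_cost(routes, dist, windows, fixed_costs, penalty_rate):
    """routes is a list of (vehicle_type, route); fixed cost depends on the type."""
    return sum(
        route_cost(r, dist, windows, fixed_costs[vtype], penalty_rate)
        for vtype, r in routes
    )

dist = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]
windows = {1: (0, 3, 10), 2: (0, 8, 10)}
# Travel 4 + 3 + 6 = 13, fixed cost 10, one unit of lateness at customer 1.
print(route_cost([1, 2], dist, windows, fixed_cost=10, penalty_rate=2))  # 25.0
```

A neighborhood move in a tabu search would typically be scored by re-evaluating a function of this kind on the affected routes. Serving both customers with one large vehicle or with two small ones then becomes a trade-off between fixed costs, travel, and lateness penalties.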