### Book

## Models, Algorithms, and Technologies for Network Analysis

This volume contains two types of papers—a selection of contributions from the “Second International Conference in Network Analysis” held in Nizhny Novgorod on May 7–9, 2012, and papers submitted to an "open call for papers" reflecting the activities of LATNA at the Higher School for Economics.

This volume contains many new results in modeling and powerful algorithmic solutions applied to problems in

- vehicle routing

- single machine scheduling

- modern financial markets

- cell formation in group technology

- brain activities of left- and right-handers

- speeding up algorithms for the maximum clique problem

- analysis and applications of different measures in clustering

The broad range of applications that can be described and analyzed by means of a network brings together researchers, practitioners, and other scientific communities from numerous fields such as Operations Research, Computer Science, Bioinformatics, Medicine, Transportation, Energy, Social Sciences, and more. The contributions not only come from different fields, but also cover a broad range of topics relevant to the theory and practice of network analysis. Researchers, students, and engineers from various disciplines will benefit from the state-of-the-art in models, algorithms, technologies, and techniques including new research directions and open questions.

The paper presents the analysis of the network model referred to as market graph of the BRIC countries stock markets. We construct the stock market graph as follows: each vertex represents a stock, and the vertices are adjacent if the price correlation coefficient between them over a certain period of time is greater than or equal to specified threshold. The market graphs are constructed for different time periods to understand the dynamics of their characteristics such as correlation distribution histogram, mean value and standard deviation, size and structure of the maximum cliques. Our results show that we can split the BRIC countries into two groups. Brazil, Russia and India constitute the first group, China constitutes the second group.

In this chapter, we introduce a new heuristic for Cell Formation Problem in its most general formulation with grouping efficiency as an objective function. Suggested approach applies an improvement procedure to obtain solutions with high grouping efficiency. This procedure is repeated until efficiency can be increased for randomly generated configurations of cells. We consider our preliminary results for 10 popular benchmark instances taken from the literature. Also source instances with the solutions we got can be found in the Appendix.

The preemptive single machine scheduling problem of minimizing the total weighted completion time with equal processing times and arbitrary release dates is one of the four single machine scheduling problems with an open computational complexity status. In this paper we present lower and upper bounds for the exact solution of this problem based on the assignment problem. We also investigate properties of these bounds and worst-case behavior.

In this chapter, we present our enhancements of one of the most efficient exact algorithms for the maximum clique problem—MCS algorithm by Tomita, Sutani, Higashi, Takahashi and Wakatsuki (in Proceedings of WALCOM’10, 2010, pp. 191–203). Our enhancements include: applying ILS heuristic by Andrade, Resende and Werneck (in Heuristics 18:525–547, 2012) to find a high-quality initial solution, fast detection of clique vertices in a set of candidates, better initial coloring, and avoiding dynamic memory allocation. A good initial solution considerably reduces the search tree size due to early pruning of branches related to small cliques. Fast detecting of clique vertices is based on coloring. Whenever a set of candidates contains a vertex adjacent to all candidates, we detect it immediately by its color and add it to the current clique avoiding unnecessary branching. Though dynamic memory allocation allows to minimize memory consumption of the program, it increases the total running time. Our computational experiments show that for dense graphs with a moderate number of vertices (like the majority of DIMACS graphs) it is more efficient to store vertices of a set of candidates and their colors on stack rather than in dynamic memory on all levels of recursion. Our algorithm solves p_hat1000-3 benchmark instance which cannot be solved by the original MCS algorithm. We got speedups of 7, 3000, and 13000 times for gen400_p0.9_55, gen400_p0.9_65, and gen400_p0.9_75 instances, correspondingly.

In this paper, we consider the asymmetric capacitated vehicle routing problem (ACVRP). We compare the search tree size and computational time for the bottleneck tolerance-based and cost-based branching rules within a branch-and-bound algorithm on the FTV benchmark instances. Our computational experiments show that the tolerance-based branching rule reduces the search tree size by 45 times and the CPU time by 2.8 times in average.

There exists much prejudice against the within-cluster summary similarity criterion which supposedly leads to collecting all the entities in one cluster. This is not so if the similarity matrix is pre-processed by subtraction of ``noise'', of which two ways, the uniform and modularity, are mentioned in the paper. Another criterion under consideration is the semi-average within-cluster similarity, which manifests more versatile properties. In fact, both types of criteria emerge in relation to the least-squares data approximation approach to clustering, as shown in the paper. A very simple local optimization algorithm, Add-and-Remove($S$), leads to a suboptimal cluster satisfying some tightness conditions. Three versions of an iterative extraction approach are considered, leading to a portrayal of the cluster structure of the data. Of these, probably most promising is what is referred to as the incjunctive clustering approach. Applications are considered to the analysis of semantics, to integrating different knowledge aspects and consensus clustering.

We consider the problem of planning the ISS cosmonaut training with different objectives. A pre-defined set of minimum qualification levels should be distributed between the crew members with minimum training time differences, training expenses or a maximum of the training level with a limitation of the budget. First, a description of the cosmonaut training process is given. The model are considered for the volume planning problem. The objective of the model is to minimize the differences between the total time of the preparation of all crew members. Then two models are considered for the timetabling planning problem. For the volume planning problem, two algorithms are presented. The first one is aheuristic with a complexity of O(n) operations. The second one consists of a heuristic and exact parts, and it is based on the npartition problem approach.

Methods of network analysis are used in this paper for mapping the local academic community of St. Petersburg sociologists. The survey data on relations between individual scholars serve as a guide in reconstruction of the communitys network history as well as a system of independent variables in accounting for differences between its various natural zones. In this manner, the paper explores the points of convergence between Chicago school social ecology and modern social network analysis.

Financial Decision Making Using Computational Intelligence covers all the recent developments in complex financial decision making through computational intelligence approaches. Computational intelligence has evolved rapidly in recent years and it is now one of the most active fields in operations research and computer science. The increasing complexity of financial problems and the enormous volume of financial data often make it difficult to apply traditional modeling and algorithmic procedures. In this context, the field of computational intelligence provides a wide range of useful techniques, including new modeling tools for decision making under risk and uncertainty, data mining techniques for analyzing complex data bases, and powerful algorithms for complex optimization problems.

The article introduces a historical-sociological research project reconstructing intellectual and institutional transformations of post-soviet social sciences in the last 25 years. The projects ambition was to achieve this aim via applying classical community study research strategy and various methods derived from social science history to the case of St. Petersburg sociologists. We identified 622 individuals as St. Petersburg sociologists and traced records of their institutional trajectories, appearance in print, citing behaviour, social networks, political attitudes, sources of income, professional authorities, and attention spaces through 25 years.

This volume contains a selection of contributions from the "First International Conference in Network Analysis," held at the University of Florida, Gainesville, on December 14-16, 2011. The remarkable diversity of fields that take advantage of Network Analysis makes the endeavor of gathering up-to-date material in a single compilation a useful, yet very difficult, task. The purpose of this volume is to overcome this difficulty by collecting the major results found by the participants and combining them in one easily accessible compilation.

Data Correcting Algorithms in Combinatorial Optimization focuses on algorithmic applications of the well known polynomially solvable special cases of computationally intractable problems. The purpose of this text is to design practically efficient algorithms for solving wide classes of combinatorial optimization problems. Researches, students and engineers will benefit from new bounds and branching rules in development efficient branch-and-bound type computational algorithms. This book examines applications for solving the Traveling Salesman Problem and its variations, Maximum Weight Independent Set Problem, Different Classes of Allocation and Cluster Analysis as well as some classes of Scheduling Problems. Data Correcting Algorithms in Combinatorial Optimization introduces the data correcting approach to algorithms which provide an answer to the following questions: how to construct a bound to the original intractable problem and find which element of the corrected instance one should branch such that the total size of search tree will be minimized. The PC time needed for solving intractable problems will be adjusted with the requirements for solving real world problems.

Many efficient exact branch and bound maximum clique solvers use approximate coloring to compute an upper bound on the clique number for every subproblem. This technique reasonably promises tight bounds on average, but never tighter than the chromatic number of the graph.

Li and Quan, 2010, AAAI Conference, p. 128–133 describe a way to compute even tighter bounds by reducing each colored subproblem to maximum satisfiability problem (MaxSAT). Moreover they show empirically that the new bounds obtained may be lower than the chromatic number.

Based on this idea this paper shows an efficient way to compute related “infra-chromatic” upper bounds without an explicit MaxSAT encoding. The reported results show some of the best times for a stand-alone computer over a number of instances from standard benchmarks.

Semantic network reduction is considered in application to visual analytics of relational data. Merging structurally equivalent nodes it is straightforward to construct a reduced semantic network that completely species the initial structure of relations between nodes. This paper presents the analysis of such reduction applied to the communication network from Stanford Large Network Dataset Collection. It is shown how the reduction based on structural equivalence can help in visualization of large semantic networks.

A model for organizing cargo transportation between two node stations connected by a railway line which contains a certain number of intermediate stations is considered. The movement of cargo is in one direction. Such a situation may occur, for example, if one of the node stations is located in a region which produce raw material for manufacturing industry located in another region, and there is another node station. The organization of freight traﬃc is performed by means of a number of technologies. These technologies determine the rules for taking on cargo at the initial node station, the rules of interaction between neighboring stations, as well as the rule of distribution of cargo to the ﬁnal node stations. The process of cargo transportation is followed by the set rule of control. For such a model, one must determine possible modes of cargo transportation and describe their properties. This model is described by a ﬁnite-dimensional system of diﬀerential equations with nonlocal linear restrictions. The class of the solution satisfying nonlocal linear restrictions is extremely narrow. It results in the need for the “correct” extension of solutions of a system of diﬀerential equations to a class of quasi-solutions having the distinctive feature of gaps in a countable number of points. It was possible numerically using the Runge–Kutta method of the fourth order to build these quasi-solutions and determine their rate of growth. Let us note that in the technical plan the main complexity consisted in obtaining quasi-solutions satisfying the nonlocal linear restrictions. Furthermore, we investigated the dependence of quasi-solutions and, in particular, sizes of gaps (jumps) of solutions on a number of parameters of the model characterizing a rule of control, technologies for transportation of cargo and intensity of giving of cargo on a node station.

Event logs collected by modern information and technical systems usually contain enough data for automated process models discovery. A variety of algorithms was developed for process models discovery, conformance checking, log to model alignment, comparison of process models, etc., nevertheless a quick analysis of ad-hoc selected parts of a journal still have not get a full-fledged implementation. This paper describes an ROLAP-based method of multidimensional event logs storage for process mining. The result of the analysis of the journal is visualized as directed graph representing the union of all possible event sequences, ranked by their occurrence probability. Our implementation allows the analyst to discover process models for sublogs defined by ad-hoc selection of criteria and value of occurrence probability

The geographic information system (GIS) is based on the first and only Russian Imperial Census of 1897 and the First All-Union Census of the Soviet Union of 1926. The GIS features vector data (shapefiles) of allprovinces of the two states. For the 1897 census, there is information about linguistic, religious, and social estate groups. The part based on the 1926 census features nationality. Both shapefiles include information on gender, rural and urban population. The GIS allows for producing any necessary maps for individual studies of the period which require the administrative boundaries and demographic information.

It is well-known that the class of sets that can be computed by polynomial size circuits is equal to the class of sets that are polynomial time reducible to a sparse set. It is widely believed, but unfortunately up to now unproven, that there are sets in EXPNP, or even in EXP that are not computable by polynomial size circuits and hence are not reducible to a sparse set. In this paper we study this question in a more restricted setting: what is the computational complexity of sparse sets that are *selfreducible*? It follows from earlier work of Lozano and Torán (in: Mathematical systems theory, 1991) that EXPNP does not have sparse selfreducible hard sets. We define a natural version of selfreduction, tree-selfreducibility, and show that NEXP does not have sparse tree-selfreducible hard sets. We also construct an oracle relative to which all of EXP is reducible to a sparse tree-selfreducible set. These lower bounds are corollaries of more general results about the computational complexity of sparse sets that are selfreducible, and can be interpreted as super-polynomial circuit lower bounds for NEXP.