### Book

## Computational Aspects and Applications in Large-Scale Networks. Springer Proceedings in Mathematics & Statistics

Contributions in this volume focus on computationally efficient algorithms and rigorous mathematical theories for analyzing large-scale networks. Researchers and students in mathematics, economics, statistics, computer science and engineering will find this collection a valuable resource filled with the latest research in network analysis. Computational aspects and applications of large-scale networks in market models, neural networks, social networks, power transmission grids, maximum clique problem, telecommunication networks, and complexity graphs are included with new tools for efficient network analysis of large-scale networks.

This proceeding is a result of the 7th International Conference in Network Analysis, held at the Higher School of Economics, Nizhny Novgorod in June 2017. The conference brought together scientists, engineers, and researchers from academia, industry, and government.

In this article we use the modular decomposition technique for exact solving the weighted maximum clique problem. Our algorithm takes the modular decomposition tree from the paper of Tedder et. al. and finds solution recursively. Also, we propose algorithms to construct graphs with modules. We show some interesting results, comparing our solution with Ostergards algorithm on DIMACS benchmarks and on generated graphs.

Human brain networks show modular organization: cortical regions tend to form densely connected modules with only weak inter-modular connections. However, little is known on whether modular structure of brain networks is reliable in terms of test-retest reproducibility and, most importantly, to what extent these topological modules are anatomically embedded. To address these questions, we use MRI data of the same individuals scanned with an interval of several weeks, reconstruct structural brain networks at multiple scales, partition them into communities and evaluate similarity of partitions (i) stemming from the test-retest data of the same versus different individuals and (ii) implied by network topology versus anatomy-based grouping of neighboring regions. First, our results demonstrate that modular structure of brain networks is well reproducible in test-retest settings. Second, the results provide evidence of the theoretically well-motivated hypothesis that brain regions neighboring in anatomical space also tend to belong to the same topological modules.

In this study, we investigated how scientific collaboration represented by co-authorship is related to citation indicators of a scientist. We use co-authorship network to explore the structure of scientific collaboration. For network construction, the profiles of scientists from various countries and scientific fields in Google Scholar were used. We ran the count data regression model for a sample of more than 30 thousand authors with the first citation after 2007 to analyze the correlation between co-authorship network parameters of scientists and their citation characteristics. We identify that there is a positive correlation between citation of scientist and number of his co-authors, between citation and the author’s closeness centrality, and between scholar’s citation and the average citation of his co-authors. Also, we reveal that h-index and i10-index are correlated significantly with the number of co-authors and average citation of co-authors. Based on these results, we can conclude that scientists who maintain more contacts and more active than others have better bibliometric indicators on an average.

Market network analysis attracts a growing attention last decades. One of the most important problems related with it is the detection of dynamics in market network. In the present paper, the stock market network of stock’s returns is considered. Probability of sign coincidence of stock’s returns is used as the measure of similarity between stocks. Robust (distribution free) multiple testing statistical procedure for testing dynamics of network is proposed. The constructed procedure is applied for German, French, UK, and USA market. It is shown that in most cases where the dynamics is observed it is determined by a small number of hubs in the associated rejection graph.

Invariance properties of statistical procedures for threshold graph identification are considered. An optimal procedure in the class of invariant multiple decision procedures is constructed.

In this paper, we propose to utilize the methods of network analysis to analyze the relationship between various elements that constitute any particular research in social sciences. Four levels that determine a design of the research can be established: ontological and epistemological assumptions that determine what is the reality under the study and how can we obtain the knowledge about it; a general methodological frame that defines the object of the study and a spectrum of research questions we are allowed to pose; and, finally, a list of methods that we might use in order to get answers. All these levels are interrelated, sometimes in very confusing way. We propose to extract a preliminary set of relations between various elements from textbooks on methodology of social and political sciences and to visualize and analyze their relations using network analytic methods.

The paper reviews the problem of age and gender recognition methods for video data using modern deep convolutional neural networks. We present the comparative analysis of classifier fusion algorithms to aggregate decisions for individual frames. We implemented the video-based recognition system with several aggregation methods to improve the age and gender identification accuracy. The experimental comparison of the proposed approach with traditional simple voting using IJB-A, Indian Movies, and Kinect datasets is provided. It is demonstrated that the most accurate decisions are obtained using the geometric mean and mathematical expectation of the outputs at softmax layers of the convolutional neural networks for gender recognition and age prediction, respectively.

In this paper, we propose the approach of structuring information in video surveillance systems by grouping the videos, which contain identical faces. First, the faces are detected in each frame and features of each facial region are extracted at the output of preliminarily trained deep convolution neural networks. Second, the tracks that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. In the experimental study with the YTF dataset, we examined several ways to aggregate features of individual frame in order to obtain descriptor of the whole video track. It was demonstrated that the most accurate and fast algorithm is the matching of normalized average feature vectors.

Graphical models are used in a variety of problems to uncover hidden structures. There is an important number of different identification procedures to recover graphical model from observations. In this paper, undirected Gaussian graphical models are considered. Some Gaussian graphical model identification statistical procedures are compared using different measures, such as Type I and Type II errors, ROC AUC.

The paper presents a tabu search heuristic for the Fleet Size and Mix Vehicle Routing Problem (FSMVRP) with hard and soft time windows. The objective function minimizes the sum of travel costs, fixed vehicle costs, and penalties for soft time window violations. The algorithm is based on the tabu search with several neighborhoods. The main contribution of the paper is the efficient algorithm for a real-life vehicle routing problem. To the best of our knowledge, there are no papers devoted to the FSMVRP problem with soft time windows, while in real-life problems, this is a usual case. We investigate the performance of the proposed heuristic on the classical Solomon instances with additional constraints. We also compare our approach without soft time windows and heterogeneous fleet of vehicles with the recently published results on the VRP problem with hard time windows.

One of the approaches for the nearest neighbor search problem is to build a network which nodes correspond to the given set of indexed objects. In this case the search of the closest object can be thought as a search of a node in a network. A procedure in a network is called decentralized if it uses only local information about visited nodes and its neighbors. Networks, which structure allows efficient performing the nearest neighbor search by a decentralized search procedure started from any node, are of particular interest especially for pure distributed systems. Several algorithms that construct such networks have been proposed in literature. However, the following questions arise: “Are there network models in which decentralized search can be performed faster?”; “What are the optimal networks for the decentralized search?”; “What are their properties?”. In this paper we partially give answers to these questions. We propose a mathematical programming model for the problem of determining an optimal network structure for decentralized nearest neighbour search. We have found the exact solutions for a regular lattice of size 4x4 and heuristic solutions for sizes from 5x5 to 7x7. As a distance function we use L_1, L_2 and L_inf metrics. We hope that our results and the proposed model will initiate study of optimal network structures for decentralized nearest neighbour search.

The study was aimed to analyze advantages of the Deep Learning methods over other baseline machine learning methods using sentiment analysis task in Twitter. All the techniques were evaluated using a set of English tweets with classification on a five-point ordinal scale provided by SemEval-2017 organizers. For the implementation, we used two open source Python libraries. The results and conclusions of the study are discussed.

Market network analysis attracts a growing attention last decade. Important component of the market network is a model of stock returns distribution. Elliptically contoured distributions are popular as probability model of stock returns. The question of adequacy of this model to real market data is open. There are known results that reject such model and at the same time there are results that approve such model. Obtained results are concerned to testing some properties of elliptical model. In the paper another property of elliptical model namely property of symmetry condition of tails of 2-dimentional distribution is considered. Multiple statistical procedure for testing elliptical model for stock returns distribution is proposed. Sign symmetry conditions of tails distribution are chosen as individual hypotheses for multiple testing. Uniformly most powerful tests of Neyman structure are constructed for individual hypotheses testing. Associated stepwise multiple testing procedure is applied for the real market data. To visualize the results a rejection graph is constructed. The main result is that under some conditions tail symmetry hypothesis is not rejected if one remove a few number of hubs from the rejection graph.

One of the major problem of recommendation services is commercial astroturfing. This work is devoted to constructing a model capable of detecting astroturfing based on network analysis. The main idea of the model is projecting a multipartite network to a unipartite and detecting communities in it representing actors with falsified opinions.

Online social networks play major role in the spread of information on a very large scale. One of the major problems is to predict information propagation using social network interactions. The main purpose of this paper is to construct heuristic model of weighted graph based on empirical data that can outperform the existing models. We suggest a new approach of constructing the model of information based on matching specific weights to a given network.

In this paper, we present FPT algorithms for special cases of the shortest vector problem (SVP) and the integer linear programming problem (ILP), when matrices included in the problems’ formulations are near square. The main parameter is the maximal absolute value of rank minors of matrices included in the problem formulation. Additionally, we present FPT algorithms with respect to the same main parameter for the problems, when the matrices have no singular rank sub-matrices.

This essay questions whether digital literary studies can still be meaningfully regarded as part of literary studies. This heretical question is motivated by a praxeological view of a research project for the network analysis of dramatic texts, in particular by reflecting on the project’s underlying ›epistemic thing‹, which in this case consists of specifically-formatted structural data (and not the actual primary texts themselves). What does this corpus of structural data, which was extracted from 465 plays spanning the period from 1730 to 1930, have to do with the ›epistemic things‹ of literary studies? We explore this question by providing insight into our analyses, which describe the structural evolution of the ›plays‹, try to locate ›small world‹ properties in our corpus, and develop new metrics for plot analysis. The results show not only how digital methods can supplement or enrich literary studies; they also raise questions about how digital the field of literary studies already is, since its research objects are increasingly available in digital forms.

Using network approach, we propose a new method of identifying key food exporters based on the long-range (LRIC) and short-range interaction indices (SRIC). These indices allow to detect several groups of economies with direct as well as indirect influence on the routes of different levels in the food network.

Trading processes is a vital part of human life and any unstable situation results in the change of living conditions of individuals. We study the power of each country in terms of produce trade. Trade relations between countries are represented as a network, where vertices are territories and edges are export flows. As flows of products between participants are heterogeneous we consider various groups of substitute goods (cereals, fish, vegetables). We detect key participants affecting food retail with the use of classical centrality measures. We also perform clustering procedure in order to find communities in networks.

A “Network Analysis” section was arranged at the XVIIIth Interna- tional Academic Conference on Economic and Social Development at the Higher School of Economics on 11–12 April 2017. For the third year, this section invited scholars from sociology, political science, management, mathematics, and linguistics who use network analysis in their research projects. During the sessions, speakers discussed the development of mathematical models used in network analysis, studies of collaboration and communication networks, networks’ in- uence on individual attributes, identifcation of latent relationships and regularities, and application of network analysis for the study of concept networks.

The speakers in this section were E. V. Artyukhova (HSE), G. V. Gra- doselskaya (HSE), M. Е. Erofeeva (HSE), D. G. Zaitsev (HSE), S. A. Isaev (Adidas), V. A. Kalyagin (HSE), I. A. Karpov (HSE), A. P. Koldanov (HSE), I. I. Kuznetsov (HSE), S. V. Makrushin (Fi- nancial University), V. D. Matveenko (HSE), A. A. Milekhina (HSE), S. P. Moiseev (HSE), Y. V. Priestley (HSE), A. V. Semenov (HSE), I. B. Smirnov (HSE), D. A. Kharkina (HSE, St. Petersburg), C. F. Fey (Aalto University School of Business), and F. López-Iturriaga (Uni- versity of Valladolid).

An approach to the detection of hidden information (stegocontainers) in the audio data of MP3 files based on neural network modeling is considered. A multilayer perceptron is used as the instrumental model of the neural network. The structural components of the MP3 file are analyzed: fields containing related information (song titles, album, information about the author, lyrics, etc.), and frames, and fragmented sets of encoded audio data. Useful data are highlighted. A procedure is proposed for presenting audio data of any MP3 file as a uniform set of features of a relatively small size. The dimension of the feature set (data set) can be selected from the range [100-520], in accordance with the minimum and maximum frame size, depending on the compression quality of a single audio file when encoded in MP3 format. Modern software packages for encrypting and decrypting stegocontainers into MP3 files are being investigated. Based on selected software implementations, a database of examples (data sets) is formed from pre-processed MP3 files both containing the stegocontainer and without the stegocontainer. The structure of the neural network for steganalysis of MP3 files is determined experimentally, it is trained and tested. The test results of the neural network system allow us to state its high efficiency