## Computer Science

This concise book provides a survival toolkit for efficient, large-scale software development. It discusses a multi-contextual research framework that harnesses human-related factors to improve flexibility, and includes a carefully selected blend of models, methods, practices, and case studies. To investigate mission-critical communication aspects of systems engineering, it also examines diverse, i.e., cross-cultural and multinational, environments.

This book helps students better organize their knowledge base, and presents conceptual frameworks, handy practices and case-based examples of agile development in diverse environments. Together with the authors’ previous books, "Crisis Management for Software Development and Knowledge Transfer" (2016) and "Managing Software Crisis: A Smart Way to Enterprise Agility" (2018), it constitutes a comprehensive reference resource.

This book constitutes the proceedings of the 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019, held in Kazan, Russia, in July 2019.

The 24 full papers and 10 short papers were carefully reviewed and selected from 134 submissions (of which 21 papers were rejected without being reviewed). The papers are organized in topical sections on general topics of data analysis; natural language processing; social network analysis; analysis of images and video; optimization problems on graphs and network structures; and analysis of dynamic behaviour through event data.

This volume contains the refereed proceedings of the 8th International Conference on Analysis of Images, Social Networks, and Texts (AIST 2019). The previous conferences during 2012–2018 attracted a significant number of data scientists – students, researchers, academics, and engineers working on interdisciplinary data analysis of images, texts, and social networks.

Proceedings of the international conference "Neural Information Processing Systems 2019" (NeurIPS 2019).

This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016.

The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life and social sciences; morphological and technological approaches to image analysis.

We propose a novel machine-learning-based approach to detecting bid leakage in first-price sealed-bid auctions. We extract and analyze data on more than 1.4 million Russian procurement auctions held between 2014 and 2018. As bid leakage in any particular auction is tacit, direct classification is impossible. Instead, we reduce the problem of bid leakage detection to Positive-Unlabeled Classification. The key idea is to regard the losing participants as fair and the winners as possibly corrupted. This allows us to estimate the prior probability of bid leakage in the sample, as well as the posterior probability of bid leakage for each specific auction. We find that at least 16% of the auctions are exposed to bid leakage. Bid leakage is more likely in auctions with a higher reserve price, a lower number of bidders and a smaller price fall, and where the winning bid is received in the last hour before the deadline.
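The reduction can be illustrated on synthetic data. The sketch below is not the authors' code: the single feature, the fixed fair-win probability of 0.5, and the planted prior are all invented for illustration. Losers are treated as known fair auctions and winners as an unlabeled mix; a winner-vs-loser classifier is fitted, and the mixture relation P(win | x) = P(leak | x) + 0.5·(1 − P(leak | x)) is inverted to recover a leakage posterior per auction and a prior estimate:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
true_prior = 0.3
leaked = rng.random(n) < true_prior            # hidden ground truth, never observed
# Hypothetical single feature, e.g. standardized "hours before deadline":
# leaked winning bids tend to arrive late (low values)
x = np.where(leaked, rng.normal(-1.0, 1.0, n), rng.normal(1.0, 1.0, n))
# Observed label s: leaked bidders always win; fair bidders win with prob. 0.5
s = np.where(leaked, 1, (rng.random(n) < 0.5)).astype(int)

clf = LogisticRegression().fit(x.reshape(-1, 1), s)
p_win = clf.predict_proba(x.reshape(-1, 1))[:, 1]

# Mixture relation: P(win|x) = P(leak|x) + 0.5 * (1 - P(leak|x));
# inverting it gives the posterior probability of leakage per auction
post = np.clip(2.0 * p_win - 1.0, 0.0, 1.0)
print(round(post.mean(), 2))  # estimate of the leakage prior (planted value 0.3)
```

On this toy sample the average posterior approximately recovers the planted prior; the paper works with far richer auction features and a more careful PU estimator.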

The International Workshop on Enterprise and Organizational Modeling and Simulation (EOMAS) is a forum where researchers and practitioners exchange and mutually enrich their views, approaches, and results in the field of enterprise engineering and enterprise architecture. The most valuable asset of every conference and workshop is its community. The EOMAS community is small, but it consists of founding members and long-term contributors, and every year it attracts new, innovative participants. This year, EOMAS reached its 15th edition and took place in Rome, Italy, during June 3–4, 2019. As is traditional, we can offer a balanced assortment of papers addressing formal foundations of enterprise modeling and simulation, conceptual modeling approaches, higher-level insights and applications bringing novel ideas to traditional approaches, as well as new emerging trends. Out of 24 submitted papers, 12 were accepted for publication as full papers and for oral presentation; each paper was carefully selected, reviewed, and revised. In addition, reflecting the interest in last year’s invited workshop on usability, we invited the experts to present a sequel; a short report can be found in this issue. This year, we also introduced a new outlet, the Master and Doctoral Consortium, which attracted young talent to present their work. The presented work was then discussed, and feedback, advice, and encouragement were given. We were genuinely impressed by the relevance, methodological quality, and results of their work – you may find their contributions on our website https://eomas-workshop.org. We would like to express our sincere thanks to the entire EOMAS community: the authors, the Program Committee, the CAiSE organizers, and the chairs for their enthusiasm and devotion, as well as all participants for their contributions. We look forward to the 16th edition of EOMAS!

The workshop concentrates on an interdisciplinary approach to modelling human behavior that combines data mining with expert knowledge from the behavioral sciences. Data analysis results obtained from the clean data of laboratory experiments will be compared with noisy industrial datasets, e.g. from the web. Insights from the behavioral sciences will help data scientists, and behavioral scientists will find new research inspiration in industrial data science. Market leaders in Big Data, such as Microsoft, Facebook, and Google, have already realized the importance of experimental-economics know-how for their business.

In Experimental Economics, although financial rewards constrain subjects’ preferences in experiments, the exclusive application of analytical game theory is not enough to explain the collected data. This calls for the development and evaluation of more sophisticated models. The more data is used for evaluation, the greater the statistical significance that can be achieved. Since large amounts of behavioral data are required to scan for regularities, along with automated agents to simulate and intervene in human interactions, Machine Learning is the tool of choice for research in Experimental Economics. This workshop aims to bring together researchers from both Data Analysis and Economics in order to achieve mutually beneficial results.

This volume constitutes the refereed proceedings of the 4th International Conference on Digital Transformation and Global Society, DTGS 2019, held in St. Petersburg, Russia, in June 2019.

The 56 revised full papers and 9 short papers presented in the volume were carefully reviewed and selected from 194 submissions. The papers are organized in topical sections on e-polity: governance; e-polity: politics online; e-city: smart cities and urban planning; e-economy: online consumers and solutions; e-society: computational social science; e-society: humanities and education; international workshop on internet psychology; international workshop on computational linguistics.

This edition of Procedia Computer Science represents the proceedings of the 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2019), organised by KES International and held at the Danubius Health Spa Resort, Budapest, during 4–6 September 2019. KES 2019 was the 23rd event in a series of broad-spectrum intelligent systems conferences first held in Adelaide, Australia, in 1997. The main aim of the KES conference series is to provide an internationally respected forum for the dissemination of research results and the discussion of issues relating to the theory, technologies and applications of intelligent engineering and information systems. This truly international conference attracted submissions from a substantial number of researchers and practitioners from all over the world, who submitted their papers to three general tracks, one thematic track and 34 special sessions on specific topics. A large number of submissions were received, and each paper was peer-reviewed by at least two members of the International Program Committee. Of these, 274 high-quality papers were accepted for oral presentation and publication in Procedia Computer Science, and submitted for indexing in the Conference Proceedings Citation Index (CPCI) and Scopus.
The conference chairs would like to express their gratitude to the Keynote Speakers: Prof Dana Barry, Clarkson University, USA, title of talk: 'STEM and ICT Education in Intelligent Environments'; Dr Carlos Toro, ARTC (Advanced Remanufacturing and Technology Centre) - A*Star, Singapore, title of talk: 'Smart Manufacturing coming of age'; Prof Katsutoshi Yada, Kansai University, Japan, title of talk: 'Sensor Marketing and Data Mining'; Prof Cecilia Zanni-Merk, INSA Rouen Normandie / LITIS Laboratory, France, title of talk: 'On the need of an Explainable Artificial Intelligence'; and Prof Sergey Zykov, National Research University Higher School of Economics, Russia, title of talk: 'IT Crisisology: the New Discipline for Managing Software Development in Crisis'. We would like to acknowledge also the Programme Co-Chairs, the General Track Chairs, the International Programme Committee members and reviewers for their valuable efforts in the review process, helping us to guarantee the highest quality possible for the conference. We would also like to thank the organisers and chairs of the special sessions, which make an essential contribution to the success of the conference. Lastly, we would like to thank all the authors, presenters and delegates for their valuable contribution in making this an extraordinary event. KES International hopes and intends that KES 2019 will make a significant contribution to international research collaboration and understanding, an essential task for the promotion of scientific joint work and excellence.

This book constitutes the post-conference proceedings of the 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019, held in Kazan, Russia, in July 2019.

The 27 full and 8 short papers were carefully reviewed and selected from 134 submissions (of which 21 papers were automatically rejected without being reviewed). The papers are organized in topical sections on general topics of data analysis; natural language processing; social network analysis; analysis of images and video; optimization problems on graphs and network structures; and analysis of dynamic behavior through event data.

We study the Maximum Happy Vertices and Maximum Happy Edges problems. The former is a variant of clustering in which some vertices have already been assigned to clusters. The latter gives a natural generalization of Multiway Uncut, the complement of the classical Multiway Cut problem. Due to their fundamental role in theory and practice, clustering and cut problems have always attracted a lot of attention. We establish a new connection between these two classes of problems by providing a reduction between Maximum Happy Vertices and Node Multiway Cut. Moreover, we study structural and distance-to-triviality parameterizations of Maximum Happy Vertices and Maximum Happy Edges. Our results in these directions answer questions explicitly asked in four works: Agrawal ’17, Aravind et al. ’16, Choudhari and Reddy ’18, and Misra and Reddy ’17.
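For readers new to the problem, a tiny brute-force solver makes the definition concrete (illustrative only; the paper studies parameterized algorithms, not this exponential enumeration). A vertex is happy when all of its neighbours share its colour, and the task is to extend a partial colouring so as to maximise the number of happy vertices:

```python
from itertools import product

def max_happy_vertices(n, edges, precolour, k):
    """Best number of happy vertices over all extensions of `precolour`
    to a full k-colouring (exponential search, fine only for tiny graphs)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    free = [v for v in range(n) if v not in precolour]
    best = 0
    for assignment in product(range(k), repeat=len(free)):
        colour = dict(precolour)
        colour.update(zip(free, assignment))
        # a vertex is happy iff every neighbour has its colour
        happy = sum(all(colour[w] == colour[v] for w in adj[v])
                    for v in range(n))
        best = max(best, happy)
    return best

# Path 0-1-2-3 with endpoints precoloured differently: the colour boundary
# must fall on some edge, so at most two vertices can be happy
print(max_happy_vertices(4, [(0, 1), (1, 2), (2, 3)], {0: 0, 3: 1}, 2))  # → 2
```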

The Third Workshop on Computer Modelling in Decision Making (CMDM 2018) was held at Saratov State University (Saratov, Russia) within the VII International Youth Research and Practice Conference ‘Mathematical and Computer Modelling in Economics, Insurance and Risk Management’. The workshop’s main topic is computer and mathematical modelling for decision making in finance, insurance, banking, economic forecasting, investment and financial analysis. Researchers, postgraduate students and academics, as well as financial, banking, insurance and government professionals, participated in the workshop.

ICUMT is a premier annual IEEE international congress providing an open forum for researchers, engineers, network planners and service providers focused on newly emerging algorithms, systems, standards, services, and applications, bringing together leading international players in telecommunications, control systems, automation and robotics. The event is positioned as a major international annual congress for the presentation of original results from fundamental as well as applied research and engineering work.

We study synchronization aspects of parallel discrete event simulation (PDES) algorithms. Our analysis is based on the recently introduced model of virtual time evolution in an optimistic synchronization algorithm. This model connects synchronization aspects with the properties of the profile of the local virtual times. The main parameter of the model is a “growth rate” q = 1/(1 + b), where b is the mean rollback length. We measure the average utilization of events and the desynchronization between logical processes as functions of the parameter q. We find that there is a phase transition between an “active phase”, in which the average utilization of processing time is finite, and an “absorbing state” with zero utilization; the utilization vanishes at a critical point qc ≈ 0.136. The average desynchronization degree (i.e. the variance of local virtual times) grows with the parameter q. We also investigate the influence of sparse distant communications between logical processes and find that they do not drastically change the synchronization properties of the optimistic synchronization algorithm, in sharp contrast with the conservative algorithm [1]. Finally, we compare our results with existing case-study simulations.

This paper provides a comprehensive overview of the gapping dataset for Russian, which consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019), a competition aimed at stimulating the development of NLP tools and methods for the processing of ellipsis. In this paper, we pay special attention to the gapping resolution methods introduced within the shared task, as well as to an alternative test set, which illustrates that our corpus is a diverse and representative subset of Russian-language gapping sufficient for the effective use of machine learning techniques.

This book concentrates on the in-depth explanation of a few methods addressing core issues, rather than the presentation of a multitude of methods popular among scientists. An added value of this edition is that I try to address two features of the brave new world that materialized after the first edition was written in 2010: the emergence of “Data science” and changes in students’ cognitive skills in the process of global digitalization. The birth of Data science gives me more opportunities for delineating the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) as synonymous. There are, however, at least two differences between the two. First comes the difference in perspectives. ML aims to equip computers with methods and rules to see through regularities of the environment – and behave accordingly. DA aims to enhance conceptual understanding. These goals are not inconsistent, indeed, which explains the huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent. Regarding current students’ cognitive habits, I came to the conclusion that they prefer to get immediately into the “thick of it”. Therefore, I have streamlined the presentation of multidimensional methods. These methods are now organized into four chapters, one of which presents correlation learning (Chapter 3). The three other chapters present summarization methods, both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 relates to finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 relates to hierarchical and separative cluster structures.
The encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods used for addressing such practical issues as the analysis of mixed-scale data, data standardization, the number of clusters, cluster interpretation, etc. The obvious bias towards summarization over correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction and, second, by my personal preferences. Categorical summarization, that is, clustering, is considered not just a method of DA but rather a model of classification as a concept in knowledge engineering. Also, in this edition, I have somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go. Chapter 1 presents the author’s view of the DA mainstream, or core, as well as of a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and on classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not yet a well-formed part of knowledge but rather a piece of science-in-the-making.

The materials of the International Scientific–Practical Conference are presented below. The Conference reflects the modern state of innovation in education, science, industry and the social-economic sphere from the standpoint of introducing new information technologies. It will be of interest to a wide range of researchers, teachers, graduate students and professionals in the field of innovation and information technologies.

Topic modeling is a popular technique for clustering large collections of text documents. A variety of different types of regularization are implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by concepts from statistical physics, where the inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. By testing our approach on four models—Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA)—we first show that the minimum of Renyi entropy coincides with the “true” number of topics, as determined in two labelled collections. At the same time, we find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach for topic number optimization, fails to detect such an optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the minimum of entropy away from the optimal topic number, an effect not observed for hyper-parameters in LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models, which calls for further research.
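As a minimal illustration of the quantity the approach is built on (not of the paper's full model-selection procedure), Renyi entropy of order α for a discrete distribution p is H_α(p) = log(Σᵢ pᵢ^α)/(1 − α), reducing to Shannon entropy as α → 1; flatter distributions score higher:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) = log(sum p_i^alpha) / (1 - alpha), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 1.0:                          # Shannon limit as alpha -> 1
        return float(-(p * np.log(p)).sum())
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

uniform = np.ones(8) / 8                      # maximally "flat" topic distribution
peaked = np.array([0.93] + [0.01] * 7)        # one dominant topic
print(renyi_entropy(uniform, 2.0))            # log 8 ≈ 2.079, the maximum for 8 outcomes
print(renyi_entropy(peaked, 2.0))             # much smaller
```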

The aim of this work was to study the influence of different brain rhythms (theta, beta, and gamma ranges, with frequencies from 5 to 80 Hz) on the ultraslow oscillations with frequencies of 0.5 Hz and below, in which high- and low-activity states alternate. Ultraslow oscillations are usually observed within neural activity in the human brain, and in the prefrontal cortex in particular during rest. They are considered to be generated by local cortical circuitry together with pulse-like inputs and neuronal noise. The structure of ultraslow oscillations shows specific statistics, and their characteristics have been connected with cognitive abilities such as working memory performance and capacity. Methods. In the study we used a previously constructed computational model describing the activity of a cortical circuit consisting of populations of pyramidal cells and interneurons. This model was developed to mimic global input impinging on the local prefrontal cortex circuit from other cortical areas or subcortical structures. The model dynamics was studied numerically. Results. We found that a frequency increase differentially lengthens the up states and therefore increases the stability of self-sustained activity with oscillations in the gamma band. Discussion. We argue that such effects would be beneficial to information processing and transfer in cortical networks with hierarchical inhibition.

The term Big Data refers to the extensive collections of digital data generated every second. The produced datasets come in structured, semi-structured, and unstructured formats throughout the world, which are difficult for traditional database management systems to analyze. Recently, big data analytics has emerged as an essential research area due to the popularity of the Internet and the advent of new Web technologies. This growing area of research is multi-disciplinary and attracts researchers from various fields, who are motivated to design, develop, and implement tools, technologies, architectures, and platforms for analyzing these large volumes of data. This paper begins with a brief introduction to big data and related concepts, including the main characteristics of big data, followed by a discussion of the most significant open research challenges and emerging trends. Next, we review a study of big data analytics, the advantages of using big data solutions, and the preliminary assessments required before migrating from traditional solutions. Finally, we present a review of the main recent applications to obtain a broad perspective on big data analytics.

**Motivation**

Imaging mass spectrometry (imaging MS) is a prominent technique for capturing distributions of molecules in tissue sections. Various computational methods for imaging MS rely on quantifying spatial correlations between ion images, referred to as co-localization. However, no comprehensive evaluation of co-localization measures has ever been performed; this leads to arbitrary choices and hinders method development.

**Results**

We present ColocML, a machine learning approach addressing this gap. With the help of 42 imaging MS experts from nine laboratories, we created a gold standard of 2210 pairs of ion images ranked by their co-localization. We evaluated existing co-localization measures and developed novel measures using term frequency–inverse document frequency and deep neural networks. The semi-supervised deep learning Pi model and the cosine score applied after median thresholding performed the best (Spearman 0.797 and 0.794 with expert rankings, respectively). We illustrate these measures by inferring co-localization properties of 10 273 molecules from 3685 public METASPACE datasets.
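One of the two best-performing measures is easy to sketch. The snippet below is an illustrative reimplementation (not ColocML code), assuming "median thresholding" means zeroing each image's pixels at or below its own median before taking the cosine of the flattened images:

```python
import numpy as np

def median_threshold(img):
    """Zero out pixels at or below the image's median (assumed definition)."""
    v = np.asarray(img, dtype=float).ravel().copy()
    v[v <= np.median(v)] = 0.0
    return v

def coloc_cosine(img_a, img_b):
    """Cosine similarity of two ion images after median thresholding."""
    a, b = median_threshold(img_a), median_threshold(img_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

img = np.array([[0.0, 1.0], [5.0, 9.0]])
print(coloc_cosine(img, img))  # ≈ 1.0 for identical images
```

Thresholding at the median discards the low-intensity background pixels, so the cosine compares only where each ion signal is actually present.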

Increasing the number of computational cores is a primary way of achieving the high performance of contemporary supercomputers. However, developing parallel applications capable of harnessing this enormous number of cores is a challenging task. It is very important to understand the principal limitations on the scalability of parallel applications imposed by the algorithm’s structure. The tree search addressed in this paper has an irregular structure, unknown prior to computation. This is why such algorithms are challenging to parallelize, especially on distributed-memory systems. In this paper, we propose a parallel tree search algorithm aimed at distributed-memory parallel computers. For this parallel algorithm, we analyze its scalability and show that it is close to the theoretical maximum.
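A toy version of the basic idea, in Python rather than a distributed-memory setting: split the tree statically at depth 1 and hand each subtree to a worker. Here n-queens stands in for an irregular search tree; CPython threads add no real speed-up, and the paper's algorithm needs dynamic load balancing precisely because such static splitting copes badly with subtrees of unknown, unequal size:

```python
from concurrent.futures import ThreadPoolExecutor

def count_solutions(prefix, n):
    """Count n-queens completions of `prefix` (one column index per placed row).
    The subtree sizes are irregular and unknown before the search starts."""
    row = len(prefix)
    if row == n:
        return 1
    total = 0
    for col in range(n):
        # no shared column and no shared diagonal with earlier queens
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(prefix)):
            total += count_solutions(prefix + [col], n)
    return total

n = 8
# Static splitting at depth 1: one independent subtree per first-row column
with ThreadPoolExecutor() as pool:
    counts = list(pool.map(lambda c: count_solutions([c], n), range(n)))
print(sum(counts))  # 92 solutions for 8 queens
```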

**Introduction**: Sentiment analysis is a complex problem whose solution essentially depends on the context, field of study and amount of text data. An analysis of publications shows that authors often do not use the full range of possible data transformations and their combinations; only a part of the transformations is used, limiting the ways to develop high-quality classification models. **Purpose**: Developing and exploring a generalized approach to building a model, which consists in sequentially passing through the stages of exploratory data analysis, obtaining a basic solution, vectorization, preprocessing, hyperparameter optimization, and modeling. **Results**: Comparative experiments conducted using the generalized approach for classical machine learning and deep learning algorithms, in order to solve the problem of sentiment analysis of short text messages in natural language processing, have demonstrated that the classification quality grows from one stage to the next. For classical algorithms, this increase in quality was insignificant, but for deep learning it was 8% on average at each stage. Additional studies have shown that automatic machine learning with classical classification algorithms is comparable in quality to manual model development; however, it takes much longer. The use of transfer learning has a small but positive effect on classification quality. **Practical relevance:** The proposed sequential approach can significantly improve the quality of models developed for natural language processing problems.
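For the classical-ML case, the staged approach can be sketched in a few lines (the corpus, labels, and parameter grid below are invented for illustration): a baseline pipeline combining vectorization and a linear classifier, followed by hyperparameter optimization over both stages jointly:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus of short messages, labelled 1 = positive, 0 = negative
texts = ["great service", "awful delay", "loved it", "terrible food",
         "really great", "so awful", "it was lovely", "bad and slow"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Baseline: vectorization + classical classifier in one pipeline
pipe = Pipeline([("tfidf", TfidfVectorizer(lowercase=True)),
                 ("clf", LogisticRegression())])

# Hyperparameter optimization stage: search over both pipeline steps
grid = GridSearchCV(pipe,
                    {"tfidf__ngram_range": [(1, 1), (1, 2)],
                     "clf__C": [0.1, 1.0, 10.0]},
                    cv=2)
grid.fit(texts, labels)
print(grid.best_params_, round(grid.best_score_, 2))
```

Each later stage of the paper's approach (preprocessing variants, deep models, transfer learning) slots into the same pattern of pipeline plus search.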

In this paper, we consider the (n,3)-MAXSAT problem, a special case of the Maximum Satisfiability problem with the additional requirement that in the input formula each variable appears at most three times. We improve the previous upper bounds for (n,3)-MAXSAT both in terms of *n* (the number of variables) and in terms of *k* (the number of clauses that we are required to satisfy). Moreover, we prove that satisfying more clauses than the simple all-true assignment does is NP-hard.

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms that has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing finite-time analyses of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $\Omega(1/k)$. A simple numerical experiment is presented to support our theory.
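A schematic instance of the recursion (not the paper's RL setting; the coefficients are invented): with martingale (i.i.d. Gaussian) noise and step sizes β_k ≫ α_k, the fast iterate w tracks A22⁻¹(b2 − A21·θ) while the slow iterate θ drives the pair to the solution of a 2×2 linear system:

```python
import numpy as np

rng = np.random.default_rng(1)
# Coefficients of the coupled linear system (chosen so the iterates converge)
A11, A12, A21, A22 = 2.0, 1.0, 1.0, 3.0
b1, b2 = 1.0, 2.0

theta, w = 0.0, 0.0
for k in range(1, 200_001):
    alpha = 1.0 / k           # slow step size for theta
    beta = 1.0 / k ** 0.66    # fast step size for w (beta >> alpha)
    n1, n2 = rng.normal(scale=0.1, size=2)   # martingale (i.i.d.) noise
    theta += alpha * (b1 - A11 * theta - A12 * w + n1)
    w += beta * (b2 - A21 * theta - A22 * w + n2)

# The limit solves A11*theta + A12*w = b1 and A21*theta + A22*w = b2
sol = np.linalg.solve([[A11, A12], [A21, A22]], [b1, b2])
print(theta, w, sol)  # iterates close to the exact solution (0.2, 0.6)
```

The two-timescale separation shows up in the step sizes: β_k/α_k → ∞, so w equilibrates "instantly" from θ's point of view, which is the structure the paper's finite-time bounds exploit.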