### Book

## Fundamentals of Computation Theory, 22nd International Symposium, FCT 2019, Copenhagen, Denmark, August 12-14, 2019, Proceedings

In this paper we provide two equivalent characterizations of the notion of finite-state dimension introduced by Dai, Lathrop, Lutz and Mayordomo [7]. One of them uses Shannon’s entropy of non-aligned blocks and generalizes old results of Pillai [12] and Niven – Zuckerman [11]. The second characterizes finite-state dimension in terms of superadditive functions that satisfy some calibration condition (in particular, superadditive upper bounds for Kolmogorov complexity). The use of superadditive bounds allows us to prove a general sufficient condition for normality that easily implies old results of Champernowne [5], Besicovitch [1], Copeland and Erdös [6], and also a recent result of Calude, Staiger and Stephan [4].

The volume is dedicated to Boris Mirkin on the occasion of his 70th birthday. In addition to his startling PhD results in abstract automata theory, Mirkin’s ground breaking contributions in various fields of decision making and data analysis have marked the fourth quarter of the 20th century and beyond. Mirkin has done pioneering work in group choice, clustering, data mining and knowledge discovery aimed at finding and describing non-trivial or hidden structures—first of all, clusters, orderings, and hierarchies—in multivariate and/or network data.

This volume contains a collection of papers reflecting recent developments rooted in Mirkin's fundamental contribution to the state-of-the-art in group choice, ordering, clustering, data mining, and knowledge discovery. Researchers, students, and software engineers will benefit from new knowledge discovery techniques and application directions.

A new computer architecture named object-attribute is offered in the article. Computer of the architecture have all necessary properties for Artificial Intelligence: abstraction of data and program, height concurrency, isomorphism of data and program (i.e. possibility of painless changing of program and data structures), training and self-training of computer system, dataflow, integration of data and program, generation of object description from simple description to complex description, implementation of distribute computer system.

We consider an undirected graph $G = (VG, EG)$ with a set $T \subseteq VG$ of terminals, and with nonnegative integer capacities $c(v)$ and costs $a(v)$ of nodes $v\in VG$. A path in $G$ is a \emph{$T$-path} if its ends are distinct terminals. By a \emph{multiflow} we mean a function $F$ assigning to each $T$-path $P$ a nonnegative rational \emph{weight} $F(P)$, and a multiflow is called \emph{feasible} if the sum of weights of $T$-paths through each node $v$ does not exceed $c(v)$. The emph{value} of $F$ is the sum of weights $F(P)$, and the \emph{cost} of $F$ is the sum of $F(P)$ times the cost of $P$ w.r.t. $a$, over all $T$-paths $P$. Generalizing known results on edge-capacitated multiflows, we show that the problem of finding a minimum cost multiflow among the feasible multiflows of maximum possible value admits \emph{half-integer} optimal primal and dual solutions. Moreover, we devise a strongly polynomial algorithm for finding such optimal solutions.

We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern $P$ and a threshold $d$, we want to report (i) all longest extensions of $P$ which occur in at least $d$ documents, (ii) all shortest extensions of $P$ which occur in less than $d$ documents, and (iii) all shortest extensions of $P$ which occur only in $d$ selected documents. For these problems, we propose efficient algorithms based on suffix trees and using advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.

Given a digraph $G = (VG,AG)$, an even factor $M \subseteq AG$ is a set formed by node-disjoint paths and even cycles. Even factors in digraphs were introduced by Geelen and Cunningham and generalize path matchings in undirected graphs. Finding an even factor of maximum cardinality in a general digraph is known to be NP-hard but for the class of odd-cycle symmetric digraphs the problem is polynomially solvable. So far the only combinatorial algorithm known for this task is due to Pap; its running time is $O(n^4)$ (hereinafter $n$ denotes the number of nodes in $G$ and $m$ denotes the number of arcs or edges). In this paper we introduce a novel sparse recovery technique and devise an $O(n^3 \log n)$-time algorithm for finding a maximum cardinality even factor in an odd-cycle symmetric digraph. Our technique also applies to other similar problems, e.g. finding a maximum cardinality square-free simple $b$-matching.

We study a new variant of the string matching problem called {\em cross-document string matching}, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the {\em weighted level ancestor} problem.

The goal of the expert search task is nding knowledgeable persons within the enterprise. In this paper we focus on its distinctions from the other information retrieval tasks. We review the existing approaches and propose a new term weighting scheme which is based on analysis of communication patterns between people. The eectiveness of the proposed approach is evaluated on a collection of e-mails from an organization of approximately 1500 people. Results show that it is possible to take into account communication structure in the process of term weighting, eectively combining communication-based and document-based approaches to expert finding.

This book constitutes the refereed proceedings of the 12th Industrial Conference on Data Mining, ICDM 2012, held in Berlin, Germany in July 2012. The 22 revised full papers presented were carefully reviewed and selected from 97 submissions. The papers are organized in topical sections on data mining in medicine and biology; data mining for energy industry; data mining in traffic and logistic; data mining in telecommunication; data mining in engineering; theory in data mining; theory in data mining: clustering; theory in data mining: association rule mining and decision rule mining.