### Book

## Computability and Complexity

Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how the statistics works and why some statistical models are better than others. After this notion of a "good model" is introduced, a natural question arises: it is possible that for some piece of data there is no good model? If yes, how often these bad ("non-stochastic") data appear "in real life"? Another, more technical motivation comes from algorithmic information theory. In this theory a notion of complexity of a finite object (=amount of information in this object) is introduced; it assigns to every object some number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistic provides a more fine-grained classification: for each finite object some curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve. In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory.

In the Interval Completion problem we are given an *n*-vertex graph *G* and an integer *k*, and the task is to transform *G* by making use of at most *k*edge additions into an interval graph. This is a fundamental graph modification problem with applications in sparse matrix multiplication and molecular biology. The question about fixed-parameter tractability of Interval Completion was asked by Kaplan et al. [FOCS 1994; SIAM J. Comput. 1999] and was answered affirmatively more than a decade later by Villanger et al. [STOC 2007; SIAM J. Comput. 2009], who presented an algorithm with running time *O*(*k*2*k**n*3*m*). We give the first subexponential parameterized algorithm solving Interval Completion in time *k**O*(&sqrt;*k*)*n**O*(1). This adds Interval Completion to a very small list of parameterized graph modification problems solvable in subexponential time.

It is known that the normalized algorithmic information distance is not computable and not semicomputable. We show that for all *𝜀*<1/2, there exist no semicomputable functions that differ from N by at most *𝜀*. Moreover, for any computable function *f* such that |lim*𝑡**𝑓*(*𝑥*,*𝑦*,*𝑡*)−N(*𝑥*,*𝑦*)|≤*𝜀* and for all *n*, there exist strings *x*, *y* of length *n* such that

∑*_𝑡 *|*𝑓*(*𝑥*,*𝑦*,*𝑡*+1)−*𝑓*(*𝑥*,*𝑦*,*𝑡*)| ≥* 𝛺*(log *𝑛*)

This is optimal up to constant factors.

We also show that the maximal number of oscillations of a limit approximation of N is *𝛺*(*𝑛*/log*𝑛*). This strengthens the *𝜔*(1) lower bound from [K. Ambos-Spies, W. Merkle, and S.A. Terwijn, 2019, *Normalized information distance and the oscillation hierarchy*].

Clustering is a well-known and important problem with numerous applications. The graph-based model is one of the typical cluster models. In the graph model generally clusters are defined as cliques. However, such approach might be too restrictive as in some applications, not all objects from the same cluster must be connected. That is why different types of cliques relaxations often considered as clusters. In our work, we consider a problem of partitioning graph into clusters and a problem of isolating cluster of a special type where by cluster we mean highly connected subgraph. Initially, such clusterization was proposed by Hartuv and Shamir. And their HCS clustering algorithm was extensively applied in practice. It was used to cluster cDNA fingerprints, to find complexes in protein-protein interaction data, to group protein sequences hierarchically into superfamily and family clusters, to find families of regulatory RNA structures. The HCS algorithm partitions graph in highly connected subgraphs. However, it is achieved by deletion of not necessarily the minimum number of edges. In our work, we try to minimize the number of edge deletions. We consider problems from the parameterized point of view where the main parameter is a number of allowed edge deletions. The presented algorithms significantly improve previous known running times for the Highly Connected Deletion (improved from \cOs\left(81^k\right) to \cOs\left(3^k\right)), Isolated Highly Connected Subgraph (from \cOs(4^k) to \cOs\left(k^{\cO\left(k^{\sfrac{2}{3}}\right)}\right) ), Seeded Highly Connected Edge Deletion (from \cOs\left(16^{k^{\sfrac{3}{4}}}\right) to \cOs\left(k^{\sqrt{k}}\right)) problems. Furthermore, we present a subexponential algorithm for Highly Connected Deletion problem if the number of clusters is bounded. Overall our work contains three subexponential algorithms which is unusual as very recently there were known very few problems admitting subexponential algorithms.

We study the Maximum Happy Vertices and Maximum Happy Edges problems. The former problem is a variant of clusterization, where some vertices have already been assigned to clusters. The second problem gives a natural generalization of Multiway Uncut, which is the complement of the classical Multiway Cut problem. Due to their fundamental role in theory and practice, clusterization and cut problems has always attracted a lot of attention. We establish a new connection between these two classes of problems by providing a reduction between Maximum Happy Vertices and Node Multiway Cut. Moreover, we study structural and distance to triviality parameterizations of Maximum Happy Vertices and Maximum Happy Edges. Obtained results in these directions answer questions explicitly asked in four works: Agrawal ’17, Aravind et al. ’16, Choudhari and Reddy ’18, Misra and Reddy ’17.

We study the Maximum Happy Vertices and Maximum Happy Edges problems. The former problem is a variant of clusterization, where some vertices have already been assigned to clusters. The second problem gives a natural generalization of Multiway Uncut, which is the complement of the classical Multiway Cut problem. Due to their fundamental role in theory and practice, clusterization and cut problems has always attracted a lot of attention. We establish a new connection between these two classes of problems by providing a reduction between Maximum Happy Vertices and Node Multiway Cut. Moreover, we study structural and distance to triviality parameterizations of Maximum Happy Vertices and Maximum Happy Edges. Obtained results in these directions answer questions explicitly asked in four works: Agrawal ’17, Aravind et al. ’16, Choudhari and Reddy ’18, Misra and Reddy ’17.

In the Shortest Superstring problem we are given a set of strings S=\{s_1, \ldots , s_n\} and integer \ell and the question is to decide whether there is a superstring *s* of length at most \ellcontaining all strings of *S* as substrings. We obtain several parameterized algorithms and complexity results for this problem. In particular, we give an algorithm which in time 2^{\mathcal {O}(k)} {\text {poly}}(n) finds a superstring of length at most \ell containing at least *k* strings of *S*. We complement this by a lower bound showing that such a parameterization does not admit a polynomial kernel up to some complexity assumption. We also obtain several results about “below guaranteed values” parameterization of the problem. We show that parameterization by compression admits a polynomial kernel while parameterization “below matching” is hard.

We study the relative complexity of equivalence relations and preorders from computability theory and complexity theory. Given binary relations R, S, a componentwise reducibility is defined by

R ≤ S ⇔ ∃f ∀x, y [x R y ↔ f (x) S f (y)].

Here, f is taken from a suitable class of effective functions. For us the relations will be on natural numbers, and f must be computable. We show that there is a -complete equivalence relation, but no -complete for k ≥ 2. We show that preorders arising naturally in the above-mentioned areas are -complete. This includes polynomial time m-reducibility on exponential time sets, which is , almost inclusion on r.e. sets, which is , and Turing reducibility on r.e. sets, which is .