### Book chapter

## Information Distance Revisited

We consider the notion of information distance between two objects x and y introduced by Bennett, Gács, Li, Vitanyi, and Zurek [C. H. Bennett et al., 1998] as the minimal length of a program that computes x from y as well as computing y from x, and study different versions of this notion. In the above paper, it was shown that the prefix version of information distance equals max (K(x|y),K(y|x)) up to additive logarithmic terms. It was claimed by Mahmud [Mahmud, 2009] that this equality holds up to additive O(1)-precision. We show that this claim is false, but does hold if the distance is at least logarithmic. This implies that the original definition provides a metric on strings that are at superlogarithmically separated.

### In book

Antistochastic strings are those strings that do not have any reasonable statistical explanation. We establish the follow property of such strings: every antistochastic string *x* is “holographic” in the sense that it can be restored by a short program from any of its part whose length equals the Kolmogorov complexity of *x*. Further we will show how it can be used for list decoding from erasing and prove that Symmetry of Information fails for total conditional complexity.

We present a new structural lemma for deterministic con- text free languages. From the first sight, it looks like a pumping lemma, because it is also based on iteration properties, but it has significant distinctions that makes it much easier to apply. The structural lemma is a combinatorial analogue of KC-DCF-Lemma (based on Kolmogorov complexity), presented by Li and Vit ́anyi in 1995 and corrected by Glier in 2003. The structural lemma allows not only to prove that a language is not a DCFL, but discloses the structure of DCFLs Myhill-Nerode classes.

Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how the statistics works and why some statistical models are better than others. After this notion of a "good model" is introduced, a natural question arises: it is possible that for some piece of data there is no good model? If yes, how often these bad ("non-stochastic") data appear "in real life"? Another, more technical motivation comes from algorithmic information theory. In this theory a notion of complexity of a finite object (=amount of information in this object) is introduced; it assigns to every object some number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistic provides a more fine-grained classification: for each finite object some curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve. In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory.

The paper [Harry Buhrman, Michal Kouck ́, Nikolay Vereshcha- y gin. Randomized Individual Communication Complexity. IEEE Con- ference on Computational Complexity 2008: 321-331] considered com- munication complexity of the following problem. Alice has a bi- nary string x and Bob a binary string y, both of length n, and they want to compute or approximate Kolmogorov complexity C(x|y) of x conditional to y. It is easy to show that deterministic communica- tion complexity of approximating C(x|y) with precision α is at least n − 2α − O(1). The above referenced paper asks what is random- ized communication complexity of this problem and shows that for r- round randomized protocols its communication complexity is at least Ω((n/α)1/r ). In this paper, for some positive ε, we show the lower bound 0.99n for (worst case) communication length of any random- ized protocol that with probability at least 0.01 approximates C(x|y) with precision εn for all input pairs.