We introduce the notion of a strong sufficient statistic for a given data string. We show that strong sufficient statistics have better properties than just sufficient statistics. We prove that there are “strange” data strings, whose minimal strong sufficient statistic have much larger complexity than the minimal sufficient statistic.

Heaps are well-studied fundamental data structures, having myriads of applications, both theoretical and practical. We consider the problem of designing a heap with an “optimal” extract-min operation. Assuming an arbitrary linear ordering of keys, a heap with n elements typically takes O(log n) time to extract the minimum. Extracting all elements faster is impossible as this would violate the Ω (nlog n) bound for comparison-based sorting. It is known, however, that is takes only O(n+ klog k) time to sort just k smallest elements out of n given, which prompts that there might be a faster heap, whose extract-min performance depends on the number of elements extracted so far. In this paper we show that this is indeed the case. We present a version of heap that performs insert in O(1) time and takes only O(log ∗ n+ log k) time to carry out the k-th extraction (where log ∗ denotes the iterated logarithm). All the above bounds are worst-case. © 2018, Springer Science+Business Media, LLC, part of Springer Nature.

We study the following computational problem: for which values of *k*, the majority of *n* bits MAJ*n* can be computed with a depth two formula whose each gate computes a majority function of at most *k* bits? The corresponding computational model is denoted by MAJ*k* ∘ MAJ*k*. We observe that the minimum value of *k* for which there exists a MAJ*k* ∘ MAJ*k* circuit that has high correlation with the majority of *n* bits is equal to Θ(*n*1/2). We then show that for a randomized MAJ*k* ∘ MAJ*k* circuit computing the majority of *n* input bits with high probability for every input, the minimum value of *k* is equal to *n*2/3 + *o*(1). We show a worst case lower bound: if a MAJ*k* ∘ MAJ*k* circuit computes the majority of *n* bits correctly on all inputs, then *k* ≥ *n*13/19 + *o*(1).

Van Lambalgen’s theorem states that a pair (*α*, *β*) of bit sequences is Martin-Löf random if and only if*α* is Martin-Löf random and *β* is Martin-Löf random relative to *α*. In [Information and Computation 209.2 (2011): 183-197, Theorem 3.3], Hayato Takahashi generalized van Lambalgen’s theorem for computable measures *P* on a product of two Cantor spaces; he showed that the equivalence holds for each *β* for which the conditional probability *P*(⋅|*β*) is computable. He asked whether this computability condition is necessary. We give a positive answer by providing a computable measure for which van Lambalgen’s theorem fails. We also present a simple construction of a computable measure for which conditional measure is not computable. Such measures were first constructed by Ackerman et al. ([1]).

The definition of conditional probability in the case of continuous distributions (for almost all conditions) was an important step in the development of mathematical theory of probabilities. Can we define this notion in algorithmic probability theory for individual random conditions? Can we define randomness with respect to the conditional probability distributions? Can van Lambalgen’s theorem (relating randomness of a pair and its elements) be generalized to conditional probabilities? We discuss the developments in this direction. We present almost no new results trying to put known results into perspective and explain their proofs in a more intuitive way. We assume that the reader is familiar with basic notions of measure theory and algorithmic randomness (see, e.g., Shen et al. 2013 or Shen 2015 for a short introduction).

When we represent a decision problem, like CIRCUIT-SAT, as a language over the binary alphabet, we usually do not specify how to encode instances by binary strings. This relies on the empirical observation that the truth of a statement of the form ``CIRCUIT-SAT belongs to a complexity class $C$'' does not depend on the encoding, provided both the encoding and the class $C$ are ``natural''. In this sense most of the Complexity theory is ``encoding invariant''. The notion of a polynomial time computable distribution from Average Case Complexity is one of the exceptions from this rule. It might happen that a distribution over some objects, like circuits, is polynomial time computable in one encoding and is not polynomial time computable in the other encoding.In this paper we suggest an encoding invariant generalization of a notion of a polynomial time computable distribution. The completeness proofs of known distributional problems, like Bounded Halting are simpler for the new class than for polynomial time computable distributions. This paper has no new technical contributions. All the statements are proved using the known techniques.

Algorithmic statistics looks for models of observed data that are good in the following sense: a model is simple (i.e., has small Kolmogorov complexity) and captures all the algorithmically discoverable regularities in the data. However, this idea can not be used in practice as is because Kolmogorov complexity is not computable. In this paper we develop an algorithmic version of algorithmic statistics that uses space-bounded Kolmogorov complexity. We prove a space-bounded version of a basic result from “classical” algorithmic statistics, the connection between optimality and randomness deficiences. The main tool is the Nisan–Wigderson pseudo-random generator. An extended abstract of this paper was presented at the 12th International Computer Science Symposium in Russia (Milovanov 10).

We study the a priori semimeasure of sets of P_-random infinite sequences, where P_ is a family of probability distributions depending on a real parameter _. In the case when for a computable probability distribution P_ an effectively strictly consistent estimator exists, we show that Levin's a priory semimeasure of the set of all P_-random sequences is positive if and only if the parameter _ is a computable real number. We show that the a priory semimeasure of the set I, where I is the set of all P_-random sequences and the union is taken over all algorithmically non-random sequences is positive.

We consider the problem of managing a bounded size First-In-First-Out (FIFO) queue buffer, where each incoming unit-sized packet requires several rounds of processing before it can be transmitted out. Our objective is to maximize the total number of successfully transmitted packets. We consider both push-out (when a policy is permitted to drop already admitted packets) and non-push-out cases. We provide worst-case guarantees for the throughput performance of our algorithms, proving both lower and upper bounds on their competitive ratio against the optimal algorithm, and conduct a comprehensive simulation study that experimentally validates predicted theoretical behavior.

In this paper we study interactive “one-shot” analogues of the classical Slepian–Wolf theorem. Alice receives a value of a random variable X, Bob receives a value of another random variable Y that is jointly distributed with X. Alice’s goal is to transmit X to Bob (with some error probability ε). Instead of one-way transmission we allow them to interact. They may also use shared randomness. We show, that for every natural r Alice can transmit X to Bob using (1+1r)H(X|Y)+r+O(log2(1ε))(1+1r)H(X|Y)+r+O(log2(1ε)) bits on average in 2H(X|Y)r+22H(X|Y)r+2 rounds on average. Setting r=⌈H(X|Y)−−−−−−−√⌉r=⌈H(X|Y)⌉ and using a result of Braverman and Garg (2) we conclude that every one-round protocol π with information complexity I can be compressed to a (many-round) protocol with expected communication about I+2I–√I+2I bits. This improves a result by Braverman and Rao (3), where they had I+5I–√I+5I. Further, we show (by setting r = ⌈H(X|Y)⌉) how to solve this problem (transmitting X) using 2H(X|Y)+O(log2(1ε))2H(X|Y)+O(log2(1ε)) bits and 4 rounds on average. This improves a result of Brody et al. (4), where they had 4H(X|Y)+O(log1/ε)4H(X|Y)+O(log1/ε) bits and 10 rounds on average. In the end of the paper we discuss how many bits Alice and Bob may need to communicate on average besides H(X|Y). The main question is whether the upper bounds mentioned above are tight. We provide an example of (X, Y), such that transmission of X from Alice to Bob with error probability ε requires H(X|Y)+Ω(log2(1ε))H(X|Y)+Ω(log2(1ε)) bits on average.

We study a stability property of probability laws with respect to small violations of algorithmic randomness. Some sufficient condition of stability is presented in terms of Schnorr tests of algorithmic randomness. Most probability laws, like the strong law of large numbers, the law of iterated logarithm, and even Birkhoff's pointwise ergodic theorem for ergodic transformations, are stable in this sense. Nevertheless, the phenomenon of instability occurs in ergodic theory. Firstly, the stability property of Birkhoff's ergodic theorem is non-uniform. Moreover, a computable non-ergodic measure-preserving transformation can be constructed such that the ergodic theorem is non-stable.

Consider the following decision problem: for a given monotone Boolean function *f* decide, whether *f* is read-once. For this problem, it is essential how the input function *f* is represented. Elbassioni et al. (J. Comb. Optim. 22(3), 293–304, 2011) proved that this problem is coNP-complete when *f* is given by a depth-4 read-2 monotone Boolean formula. Gurvich (2010) proved that this problem is coNP-complete even when the input is the following expression: *C* ∨ *D**n*, where *D**n* = *x*1*y*1 ∨ … ∨ *x**n**y**n* and *C* is a monotone CNF over the variables *x*1, *y*1, … , *x**n*, *y**n*(note that this expression is a monotone Boolean formula of depth 3; in Gurvich (2010) nothing is said about the readability of *C*, but the proof is valid even if *C* is read-2 and thus the entire formula is read-3). We show that we can test in polynomial-time whether a given expression *C*∨ *D* computes a read-once function, provided that *C* is a read-once monotone CNF and *D* is a read-once monotone DNF and all the variables of *C* occur also in *D* (recall that due to Gurvich, the problem is coNP-complete when *C* is read-2). We also observe that from the so-called Sausage Lemma of Boros et al. (2009) it follows that the problem of recognizing read-once functions is coNP-complete when the input formula is depth-3 read-2.

Let {φp} be an optimal Gödel numbering of the family of computable functions (in Schnorr’s sense), where p ranges over binary strings. Assume that a list of strings L(p) is computable from p and for all p contains a φ-program for φp whose length is at most ε bits larger than the length of the shortest φ-programs for φp. We show that for infinitely many p the list L(p) must have 2|p|−ε−O(1) strings. Here ε is an arbitrary function of p. Short lists with short programs from programs of functions and strings.

Algorithmic statistics is a part of algorithmic information theory (Kolmogorov complexity theory) that studies the following task: given a finite object x (say, a binary string), find an `explanation' for it, i.e., a simple finite set that contains x and where x is a `typical element'. Both notions (`simple' and `typical') are defined in terms of Kolmogorov complexity. It is known that this cannot be achieved for some objects: there are some ``non-stochastic'' objects that do not have good explanations. In this paper we study the properties of maximally non-stochastic objects; we call them ``antistochastic''. In this paper, we demonstrate that the antistochastic strings have the following property: if an antistochastic string x has complexity k, then any k bit of information about x are enough to reconstruct x (with logarithmic advice). In particular, if we erase all but k bits of this antistochastic string, the erased bits can be restored from the remaining ones (with logarithmic advice). As a corollary we get the existence of good list-decoding codes with erasures (or other ways of deleting part of the information). Antistochastic strings can also be used as a source of counterexamples in algorithmic information theory. We show that the symmetry of information property fails for total conditional complexity for antistochastic strings.

Sophistication and logical depth are two measures that express how complicated the structure in a string is. Sophistication is defined as the minimal complexity of a computable function that defines a two-part description for the string that is shortest within some precision; the second can be defined as the minimal computation time of a program that is shortest within some precision. We show that the Busy Beaver function of the sophistication of a string exceeds its logical depth with logarithmically bigger precision, and that logical depth exceeds the Busy Beaver function of sophistication with logarithmically bigger precision. We also show that sophistication is unstable in its precision: constant variations can change its value by a linear term in the length of the string.