Algorithmic Statistics, Prediction and Machine Learning
Algorithmic statistics considers the following problem: given a binary string x (e.g., some experimental data), find a “good” explanation of this data. It uses algorithmic information theory to define formally what a good explanation is. In this paper we extend this framework in two directions.
First, the explanations are not only interesting in themselves but can also be used for prediction: we want to know what kind of data we may reasonably expect in similar situations (repeating the same experiment). We show that a certain hierarchy can be constructed both in terms of algorithmic statistics and using the notion of a priori probability, and that these two approaches turn out to be equivalent (Theorem 5).
Second, a more realistic approach, which goes back to machine learning theory, assumes that we have not a single data string x but some set of “positive examples” x_1, ..., x_l that all belong to some unknown set A, a property that we want to learn. We want this set A to contain all positive examples and to be as small and simple as possible. We show how algorithmic statistics can be extended to cover this situation (Theorem 11).