Entropic Moments and Domains of Attraction on Countable Alphabets
Modern information theory is largely developed in connection with random elements residing in large, complex, and discrete data spaces, or alphabets. Lacking natural metrization and hence moments, the associated probability and statistics theory must rely on information measures in the form of various entropies, for example, Shannon’s entropy, mutual information and Kullback–Leibler divergence, which are functions of an entropic basis in the form of a sequence of entropic moments of varying order. The entropic moments collectively characterize the underlying probability distribution on the alphabet, and hence provide an opportunity to develop statistical procedures for their estimation. As such statistical development becomes an increasingly important line of research in modern data science, the relationship between the underlying distribution and the asymptotic behavior of the entropic moments, as the order increases, becomes a technical issue of fundamental importance. This paper offers a general methodology to capture the relationship between the rates of divergence of the entropic moments and the types of underlying distributions, for a special class of distributions. As an application of the established results, it is demonstrated that the asymptotic normality of Turing’s remarkable formula for missing probabilities holds under distributions with much thinner tails than those previously known.