The Combinatorial Analysis of n-Gram Dictionaries, Coverage and Information Entropy based on the Web Corpus of English
Baltic Journal of Modern Computing. 2021. Vol. 9. No. 3. P. 363–376.
We study n-gram dictionaries and estimate their coverage and entropy based on a web corpus of English. We consider a method for estimating the coverage of empirically generated dictionaries and an approach to addressing the disadvantage of low coverage. Based on the ideas of Kolmogorov’s combinatorial approach, we estimate the n-gram entropy of the English language and use mathematical extrapolation to approximate the marginal entropy. In addition, we approximate the number of all possible legal n-grams in the English language for high-order n-grams.
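The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the toy sentences, and the normalization of the entropy by n are assumptions made for illustration. Coverage is taken here as the share of n-gram occurrences in a held-out text that are present in the dictionary, and the combinatorial entropy of a set of N legal n-grams is taken as log2 N, in the spirit of Kolmogorov's combinatorial definition.

```python
import math

def ngrams(tokens, n):
    """Return the consecutive n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage(dictionary, tokens, n):
    """Share of n-gram occurrences in `tokens` that are found in `dictionary`."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return sum(g in dictionary for g in grams) / len(grams)

def combinatorial_entropy(num_distinct, n):
    """log2 of the dictionary size, divided by n (assumed per-token normalization)."""
    return math.log2(num_distinct) / n

# Toy data (hypothetical): build a bigram dictionary from one sentence,
# then measure its coverage on another.
train = "the cat sat on the mat".split()
test = "the cat sat on the rug".split()

dictionary = set(ngrams(train, 2))
cov = coverage(dictionary, test, 2)
h2 = combinatorial_entropy(len(dictionary), 2)

print(f"distinct bigrams: {len(dictionary)}")
print(f"coverage on held-out text: {cov:.2f}")
print(f"combinatorial entropy per token: {h2:.3f} bits")
```

On real data the dictionary would be built from a large corpus, and an unseen n-gram such as the final bigram of the test sentence is exactly what lowers coverage.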
Priority areas:
computer science and mathematics
Language:
English