Topic Modeling of the Russian Short Story, 1900–1930: The Most Frequent Topics and Their Dynamics
The article describes the results of an experiment on topic modeling of Russian short stories for three successive historical periods of the early 20th century: 1) the beginning of the 20th century until 1913, 2) the war and revolutionary period (1914–1922), and 3) the early Soviet period (1923–1930). Using the Latent Dirichlet Allocation (LDA) algorithm, 9 models were built: one for each of 3 samples of different sizes (100, 500, and 1000 stories) in each of the periods. It turned out that every model contains very frequent “themes” (topics), each of which characterizes with high probability a fairly significant share of the texts in its sample. Moreover, these frequent topics show meaningful dynamics across the time periods, which allows us to consider them thematic and stylistic markers of the analyzed text collections, alongside the more traditional quantitative measures of text analysis. The variety of frequent topics turned out to be higher in the second and third periods, which can be explained by the greater lexical and stylistic diversity of the prose of the “era of change”.
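The notion of a "very frequent" topic can be illustrated with a minimal sketch. It assumes that an LDA model has already produced a document-topic matrix (one probability distribution over topics per story) and counts, for each topic, the share of stories in which that topic reaches a chosen probability. The function name `frequent_topics`, both thresholds, and the toy matrices are illustrative assumptions; the abstract does not state the cut-offs that were actually used.

```python
from typing import Dict, Sequence


def frequent_topics(
    doc_topic: Sequence[Sequence[float]],
    prob_threshold: float = 0.5,   # assumed cut-off for "high probability"
    share_threshold: float = 0.2,  # assumed cut-off for "significant share"
) -> Dict[int, float]:
    """Map topic index -> share of documents in which the topic's
    probability is at least prob_threshold, keeping only topics whose
    share is at least share_threshold."""
    if not doc_topic:
        return {}
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    counts = [0] * n_topics
    for row in doc_topic:
        for k, p in enumerate(row):
            if p >= prob_threshold:
                counts[k] += 1
    shares = {k: c / n_docs for k, c in enumerate(counts)}
    return {k: s for k, s in shares.items() if s >= share_threshold}


# Toy document-topic matrices for two hypothetical periods (made-up numbers,
# not results from the article). Each row is one story, each column one topic.
period_a = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
period_b = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

for name, matrix in (("A", period_a), ("B", period_b)):
    print(name, frequent_topics(matrix))
```

Comparing the dictionaries returned for different periods gives one simple way to track which topics are frequent in each sample and how their number changes over time.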