Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts
The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a text analysis platform that provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be used for diachronic text research that takes into account external factors affecting the observed language shifts. The study was conducted on the corpus of Russian Short Stories of the first third of the 20th century. This corpus aims to collect texts written by as many Russian writers as possible; its developers designed it to serve as a testing ground for various text computation techniques. The results of this preliminary study show the efficacy of TXM for research on language dynamics and confirm a clear chronological trend in the distribution of the texts under study. In particular, the Russian Revolution of 1917 was shown to have brought significant changes to the core vocabulary of prose as well as to the use of punctuation marks. However, no evident opposition was revealed at this level between the wartime and peacetime periods. The methodology presented in this paper may be used both for diachronic studies of literature and for various NLP tasks involving text processing and monitoring over time, with the aim of revealing linguistic, stylistic, and sentiment changes in texts influenced by external factors.