A corpus-based quantitative approach to the study of morphological productivity in diachrony: The case of samo-compounds in Russian
This paper investigates the productivity of the prefixoid samo- (‘self’) in Russian compounds from a diachronic perspective. To test the hypothesis that the productivity of this prefixoid has grown over time, I examine the occurrences of samo-compounds in the Russian National Corpus, dividing the main corpus into four subcorpora, each representing a particular time span: the 18th century, the 19th century, the 20th century, and the period from the beginning of the 21st century to the present day. The approach is quantitative and is based on the measure of “potential productivity” (Baayen & Lieber 1991; Baayen 1992, 1993), which is calculated by dividing the number of hapax legomena with a given affix by the number of tokens with that affix. This measure, however, is sensitive to sample size, which makes it inadequate for comparing differently sized corpora. To overcome this problem, I use parametric statistical models of frequency distributions known as LNRE (Large Number of Rare Events) models (Baayen 2001). These models, which make it possible to extrapolate the expected numbers of types and hapax legomena with a given affix to arbitrary token counts, are implemented in zipfR (Baroni & Evert 2014), an R package for lexical statistics, which is used in this study.
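
As an illustration of the potential productivity measure described above (hapax legomena divided by tokens), the following minimal Python sketch computes it for a list of word tokens. The function name `potential_productivity` and the toy token list are assumptions made for this example only; the data are invented, are not drawn from the Russian National Corpus, and the sketch does not reproduce the zipfR workflow used in the study.

```python
from collections import Counter


def potential_productivity(tokens):
    """Return (hapax_count, token_count, P), where P = hapax_count / token_count.

    `tokens` is a list of word forms that all contain the affix of interest.
    """
    freqs = Counter(tokens)
    token_count = sum(freqs.values())
    if token_count == 0:
        raise ValueError("token list is empty; P is undefined")
    # A hapax legomenon is a type that occurs exactly once in the sample.
    hapax_count = sum(1 for count in freqs.values() if count == 1)
    return hapax_count, token_count, hapax_count / token_count


# Invented toy sample (illustrative only, not corpus data).
toy_tokens = [
    "samolet", "samolet", "samolet",
    "samovar", "samovar",
    "samokritika",
    "samoobman",
]

n_hapax, n_tokens, p = potential_productivity(toy_tokens)
print(f"hapaxes={n_hapax}, tokens={n_tokens}, P={p:.4f}")
```

Because the value of P changes as the sample grows, figures computed this way for subcorpora of different sizes are not directly comparable, which is the motivation for the LNRE extrapolation mentioned above.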