Internet data in the study of language change: an analysis of alternations in Russian comparatives and a program for working with such data
The Internet is a unique source of non-standard forms and thus offers a novel opportunity to analyze the fine-grained dynamics of language change. We used this opportunity to study the decay of historical consonant alternations in Russian. In standard Russian, these alternations occur in some verb forms and in comparatives (e.g. suhoj ‘dry’ vs. sushe ‘drier’; ljubit’ ‘to love’ vs. ljublju ‘I love’), as well as before certain derivational suffixes. Verb forms were recently studied by Slioussar and Kholodilova (2013); we examined comparatives. Two groups of adjectives were selected: those that have normative comparatives with alternations, and those that do not, although native speakers still try to produce such forms. In the first group, some adjectives, such as ubogij ‘poky’, have up to 30% of their comparatives without alternations, but, unlike for verbs, no significant correlation with the frequency of the adjective or with its other characteristics was found. The second group consisted primarily of compound adjectives ending in -gij, -kij, or -hij. Here, the most important factor is whether the second part of the compound is used as an independent adjective: if it is not (as in dlinnorukij ‘long-armed’), most comparatives lack alternations. Searching for forms on the Internet posed many problems: the counts reported by search engines are extremely inaccurate, only the first thousand results are shown, the results cannot be downloaded in a convenient format, and they contain many typos and other irrelevant data. We present Lingui-Pingui, a program that we developed to solve these and some other problems.
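To make the counting problem concrete, here is a minimal Python sketch of one way to estimate the share of comparatives without alternations from text snippets that have already been downloaded. It is an illustration only and not the Lingui-Pingui implementation, whose internals are not described here. The function names `count_variants` and `share_without_alternation`, the transliterated forms `ubozhe` and `ubogee`, and the toy snippets are all assumptions made for the example, not data from the study.

```python
import re
from collections import Counter


def count_variants(snippets, variants):
    """Count whole-word occurrences of each variant in deduplicated snippets."""
    # Normalize case and whitespace, then drop verbatim duplicates,
    # which downloaded search results often contain.
    unique = list(dict.fromkeys(s.strip().lower() for s in snippets))
    counts = Counter()
    for form in variants:
        # Word boundaries keep 'ubozhe' from matching inside 'ubozhestvo'.
        pattern = re.compile(r"\b" + re.escape(form) + r"\b")
        counts[form] = sum(len(pattern.findall(s)) for s in unique)
    return counts


def share_without_alternation(counts, alternating, non_alternating):
    """Return the fraction of non-alternating forms, or None if no forms were found."""
    total = counts[alternating] + counts[non_alternating]
    return counts[non_alternating] / total if total else None


# Toy, invented snippets (hypothetical data, for illustration only).
snippets = [
    "eta komnata ubozhe prezhnej",
    "Eta komnata ubozhe prezhnej",  # duplicate after normalization
    "nichego ubogee ja ne videl",
    "ubozhestvo kakoe-to",          # a longer word; must not be counted
    "stalo eshhe ubozhe",
]

counts = count_variants(snippets, ["ubozhe", "ubogee"])
share = share_without_alternation(counts, "ubozhe", "ubogee")
```

Deduplicating before counting and matching whole words only are two simple guards against the inflated counts and irrelevant hits mentioned above. Real data would also need handling of typos and of Cyrillic spelling variants, which this sketch does not attempt.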