The Taming of the Polysemy: Automated Word Sense Frequency Estimation for Lexicographic Purposes
Although word sense frequency information is important both for the theoretical study of polysemy and for practical lexicography, sense frequency distribution remains a neglected area in linguistics, probably because sense frequency is not easy to estimate. In this paper we address automated word sense frequency estimation for Russian nouns. We developed and tested an automated system based on semantic context vectors, supplied with contexts and collocations from the Active Dictionary of Russian, a full-fledged production dictionary that reflects contemporary Russian. The study was performed on the RuTenTen11 web corpus. The system reaches a frequency estimation error of 11% without any additional labeled data. We also compared the automatically obtained sense frequencies with the sense ordering in different dictionaries for several words. The method can be applied to any language with a sufficiently large corpus and a good dictionary that provides examples for each sense. The results may enrich language learning resources and help lexicographers order the senses of a word by frequency where needed.
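To make the general idea concrete, the following is a minimal sketch of sense frequency estimation with context vectors. It is not the system described in this paper. It assumes that each sense is represented by the average vector of the words in its dictionary examples, that each corpus context is assigned to the sense with the highest cosine similarity, and that sense frequency is the share of contexts assigned to each sense. The toy two-dimensional vectors, the English word "bank", and all function names are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy word vectors; a real system would use vectors
# trained on a large corpus.
WORD_VECTORS = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "shore": [0.8, 0.2],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
    "account": [0.2, 0.8],
}


def context_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_sense_frequencies(sense_examples, corpus_contexts):
    """Assign each corpus context to its nearest sense and return shares."""
    # One vector per sense, built from all words in its dictionary examples.
    sense_vectors = {}
    for sense, examples in sense_examples.items():
        vec = context_vector([w for example in examples for w in example])
        if vec is not None:
            sense_vectors[sense] = vec

    counts = Counter()
    for context in corpus_contexts:
        vec = context_vector(context)
        if vec is None:
            continue  # skip contexts with no known words
        best = max(sense_vectors, key=lambda s: cosine(vec, sense_vectors[s]))
        counts[best] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {sense: counts[sense] / total for sense in sense_vectors}


# Toy data: dictionary examples per sense and unlabeled corpus contexts.
sense_examples = {
    "bank (river)": [["river", "shore"]],
    "bank (finance)": [["money", "loan"]],
}
corpus_contexts = [
    ["water", "shore"],
    ["loan", "account"],
    ["money", "account"],
    ["account", "loan", "money"],
]
freqs = estimate_sense_frequencies(sense_examples, corpus_contexts)
print(freqs)
```

In this toy run one of the four contexts falls to the river sense and three to the financial sense. No labeled corpus data is involved: the only supervision comes from the dictionary examples, which is the property the abstract emphasizes.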