Distributed Representations of Rare Russian Words That Take into Account the Vectors of Same-Root Words
This paper proposes algorithms for automatic morphemic analysis of words, together with methods for building distributed word representations that use information about morphemic composition indirectly, by averaging the vectors of same-root words. Morphemic analysis models for Russian are evaluated on samples of common and rare words. Several methods are proposed for obtaining distributed representations of rare words from the word2vec representations of their same-root words. In our experiments on the task of estimating the semantic similarity of word pairs, the proposed methods yield results that are comparable to or better than those of the fastText model.
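The averaging idea can be illustrated with a minimal sketch. It assumes that a morphemic analyzer has already assigned a root to every word and that vectors for the common words are available (for example, from a trained word2vec model). The names `root_of`, `vectors`, `average_same_root_vector`, and `cosine`, as well as the toy two-dimensional vectors, are hypothetical and serve only as an illustration; the methods proposed in the paper may select or weight same-root words differently.

```python
from math import sqrt

# Hypothetical output of a morphemic analyzer: word -> root.
root_of = {
    "лес": "лес",
    "лесной": "лес",
    "перелесок": "лес",   # rare word: has a root but no trained vector
    "море": "мор",
}

# Toy stand-ins for word2vec vectors of the common words (assumed values).
vectors = {
    "лес": [1.0, 0.0],
    "лесной": [0.0, 1.0],
    "море": [-1.0, 0.0],
}


def average_same_root_vector(word, root_of, vectors):
    """Return the mean vector of in-vocabulary words that share the root of
    `word`, or None if the root is unknown or no such words have vectors."""
    root = root_of.get(word)
    if root is None:
        return None
    relatives = [
        w for w, r in root_of.items()
        if r == root and w != word and w in vectors
    ]
    if not relatives:
        return None
    dim = len(vectors[relatives[0]])
    return [
        sum(vectors[w][i] for w in relatives) / len(relatives)
        for i in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Build a vector for the rare word from its same-root neighbours.
rare_vector = average_same_root_vector("перелесок", root_of, vectors)
similarity_to_forest = cosine(rare_vector, vectors["лес"])
similarity_to_sea = cosine(rare_vector, vectors["море"])
```

In this toy setting the rare word receives a vector that lies between those of its same-root words, so it ends up closer to "лес" than to "море". The semantic similarity task mentioned above compares such cosine scores with human judgments.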