Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions
This paper evaluates and compares vector-space semantic models built with the Word2Vec word embedding algorithm and with a count-based, association-oriented algorithm by measuring the association strength between Russian nouns and adjectives. A dataset of nouns and their associated adjectives serves as the test set for a pseudodisambiguation task. The models are trained on corpora of Russian fiction. A measure of lexical association anomaly is applied that evaluates the similarity between the initial noun and the resulting attributive phrase. Association strength results are reported for models with different parameter values, and the best-performing parameter combinations are proposed. The test items that contribute to the error rate are manually annotated, and the model errors are categorized by their linguistic nature and compositionality features.
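
For illustration only, a pseudodisambiguation evaluation of this general kind can be sketched as follows. The sketch assumes that each test item pairs a noun with an attested adjective and a confounder adjective, and that association strength is approximated by plain cosine similarity between word vectors; it does not reproduce the lexical association anomaly measure or the phrase composition used in the paper. The function names, the toy vectors, and the toy triples are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pseudodisambiguation_error_rate(vectors, triples):
    """Share of (noun, attested adjective, confounder adjective) triples
    for which the attested adjective is NOT scored strictly higher."""
    errors = 0
    for noun, attested, confounder in triples:
        score_attested = cosine(vectors[noun], vectors[attested])
        score_confounder = cosine(vectors[noun], vectors[confounder])
        if score_attested <= score_confounder:
            errors += 1
    return errors / len(triples)


# Hypothetical toy vectors; a real run would load them from a trained model.
toy_vectors = {
    "чай": [0.9, 0.1, 0.0],         # "tea"
    "горячий": [0.8, 0.2, 0.1],     # "hot"
    "стол": [0.1, 0.2, 0.9],        # "table"
    "деревянный": [0.0, 0.3, 0.9],  # "wooden"
}

# Hypothetical test items: (noun, attested adjective, confounder adjective).
toy_triples = [
    ("чай", "горячий", "деревянный"),
    ("стол", "деревянный", "горячий"),
]

error_rate = pseudodisambiguation_error_rate(toy_vectors, toy_triples)
```

Under these assumptions, the triples counted as errors are the kind of test items that would then be passed on for manual annotation.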