Deceptive Words and Where to Find Them
Educational texts for children serve two distinct purposes: readers must understand them and, at the same time, learn new words from them. It therefore seems important and useful to be able to detect automatically which words may be unfamiliar to children of different ages. A particularly challenging task is to identify words that readers perceive as familiar and understandable but in fact misunderstand. We propose a survey-based metric, called word deceptiveness, calculated as the product of the number of respondents who mark a word as familiar and the number of respondents who determine its meaning incorrectly. In a series of experiments, we discovered several deceptive words in Russian and identified several hypothetical mechanisms by which such words emerge. In general, these mechanisms involve closeness to other, more familiar linguistic units: words, morphemes, and word-formation models. In future work, we plan to learn to identify deceptive words on the basis of various linguistic factors.
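A minimal sketch of how such a survey-based score could be computed and used to rank words. It is an illustration under stated assumptions, not the authors' implementation. The `Response` record, the `deceptiveness` and `rank_words` helpers, and the toy words `word_a`, `word_b`, `word_c` are all hypothetical. We assume that every respondent both rates a word's familiarity and attempts to give its meaning. We also assume that the second factor counts respondents who get the meaning wrong, since a deceptive word is one that seems familiar but is misunderstood.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Response:
    """One respondent's answer about one word (hypothetical survey record)."""
    familiar: bool  # respondent marked the word as familiar
    correct: bool   # respondent determined the word's meaning correctly


def deceptiveness(responses: Iterable[Response]) -> int:
    """Product of the familiar count and the incorrect-meaning count (assumed reading)."""
    responses = list(responses)
    familiar = sum(r.familiar for r in responses)
    incorrect = sum(not r.correct for r in responses)
    return familiar * incorrect


def rank_words(survey: dict[str, list[Response]]) -> list[tuple[str, int]]:
    """Sort words from most to least deceptive; ties keep their input order."""
    scores = {word: deceptiveness(rs) for word, rs in survey.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, five respondents per word (invented for illustration only).
survey = {
    # all familiar, 4 of 5 wrong: looks known but is misunderstood
    "word_a": [Response(True, False)] * 4 + [Response(True, True)],
    # all familiar, all correct: genuinely known
    "word_b": [Response(True, True)] * 5,
    # nobody finds it familiar: honestly unknown, not deceptive
    "word_c": [Response(False, False)] * 5,
}

print(rank_words(survey))  # [('word_a', 20), ('word_b', 0), ('word_c', 0)]
```

The product is zero both for a well-understood word (`word_b`) and for a word that respondents openly admit not knowing (`word_c`), so only words that are familiar and misunderstood at the same time score high. Raw counts are comparable only when every word is surveyed by the same number of respondents. If group sizes differ, dividing each count by the group size first would be one possible adjustment, though this is an assumption beyond the metric described here.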