Book chapter

Comparing corpora with the χ² measure: characters, words, lemmata, or part-of-speech tags?

pp. 282-286.

This paper discusses which kind of linguistic unit is best suited for comparing corpora using the χ² measure. Using subcorpora of the British National Corpus, I compare the performance of this measure on 1-, 2-, and 3-grams of characters, words, lemmata, and POS tags of different types. The experiment shows that the measure performs best with character 2-grams.
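
To illustrate the kind of comparison the abstract describes, here is a minimal Python sketch of a χ² statistic computed over character 2-grams of two texts. It is not the chapter's own implementation: the function names `char_ngrams` and `chi_square`, the `top_k` cutoff, and the choice to restrict the sum to the most frequent n-grams of the joint corpus are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chi_square(counts_a, counts_b, top_k=500):
    """Chi-square statistic over the top_k most frequent n-grams
    of the joint corpus (top_k is an illustrative assumption)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both corpora must contain at least one n-gram")
    joint = counts_a + counts_b
    stat = 0.0
    for gram, joint_count in joint.most_common(top_k):
        # Expected counts under the null hypothesis that both corpora
        # are samples drawn from the same population.
        exp_a = joint_count * total_a / (total_a + total_b)
        exp_b = joint_count * total_b / (total_a + total_b)
        stat += (counts_a[gram] - exp_a) ** 2 / exp_a
        stat += (counts_b[gram] - exp_b) ** 2 / exp_b
    return stat


if __name__ == "__main__":
    a = char_ngrams("the cat sat on the mat")
    b = char_ngrams("a dog lay on the rug")
    print(chi_square(a, b))
```

A lower value means the two n-gram distributions are more alike; two identical texts yield 0. Swapping `char_ngrams` for a counter over words, lemmata, or POS tags would give the other unit types mentioned in the abstract.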

In book

St. Petersburg: St. Petersburg State University Press (Издательство СПбГУ), 2017.