Book chapter

Corpus Size and the Robustness of Measures of Corpus Distance

P. 578-589.

This paper studies the impact corpus size has on the robustness of various frequency-based measures of corpus distance (or similarity, respectively), such as Euclidean distance, Manhattan distance, Cosine distance, χ², Spearman’s ρ, and Simple-Maths Keyword distance. An experiment performed using the British National Corpus shows that Euclidean distance is least influenced by corpus size and thus is best suited for the purpose of comparing corpora.
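As a rough illustration of what a frequency-based distance measure looks like, the sketch below computes three of the measures named in the abstract (Euclidean, Manhattan, and Cosine distance) over relative word frequencies of two toy corpora. It is not the procedure used in the paper: the function names, the whitespace tokenization, the use of relative rather than raw frequencies, and the two example sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word (assumes tokens is non-empty)."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def euclidean(p, q):
    """Square root of the summed squared frequency differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of the absolute frequency differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """One minus the cosine of the angle between the two frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy corpora; real experiments would use samples of a large corpus.
corpus_a = "the cat sat on the mat".split()
corpus_b = "the dog sat on the log".split()

# Shared vocabulary so both vectors have the same dimensions.
vocabulary = sorted(set(corpus_a) | set(corpus_b))
p = relative_frequencies(corpus_a, vocabulary)
q = relative_frequencies(corpus_b, vocabulary)

print("Euclidean:", euclidean(p, q))
print("Manhattan:", manhattan(p, q))
print("Cosine:   ", cosine_distance(p, q))
```

Each measure is a function of the two frequency vectors alone. The question the paper raises is how stable such values remain when the corpora being compared differ in size.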

In book

Issue 17(24). Moscow: Publishing Center of the Russian State University for the Humanities, 2018.