Language distance: the evolution of an idea
This paper examines the history of language distance studies: the genesis of the concept of measuring language distance, its development over the 19th and 20th centuries, and its rapid adoption as a standard method for different types of language classification from the 1990s to the 2020s.
The paper outlines a short history of approaches to language classification and of the methods that different scholars have used to measure language distance. The analysis comments on the works of R. Rask, F. de Saussure, the Neogrammarians, J. Greenberg, and the Moscow School of Comparative Linguistics. The general overview is split into two parts, the first dedicated to computational dialectology and the second to computational phylogenetic linguistics, both of which currently rely on measuring language distance as a crucial part of their methodology.
The paper discusses the advantages and disadvantages of the approaches surveyed, such as the Levenshtein distance, the perplexity-based method, and Bayesian phylogenetics. It argues that some of these methods are often unfairly criticised when compared with human-made classifications. It proposes possible strategies for enhancing the existing approaches and explores the latest emerging ones. The paper highlights the relatively poor performance of current methods on small, raw historical corpora as a potential direction for future research.