Language distance: the evolution of an idea

Dialectologia. 2025. No. 37.

Afanasev I.

In press

This paper examines the history of language distance studies: the genesis of the concept of measuring language distance, its development over the 19th and 20th centuries, and its rapid adoption as one of the standard methods for various types of language classification from the 1990s to the 2020s.
The paper outlines a short history of approaches to language classification and of the methods that different scholars used to measure language distance. The analysis comments on the works of R. Rask, F. de Saussure, the Neogrammarians, J. Greenberg, and the Moscow School of Comparative Linguistics. The general overview is split into two parts: one dedicated to computational dialectology and the other to computational phylogenetic linguistics, both of which currently use language distance measurement as a crucial part of their methodology.
The paper discusses the advantages and disadvantages of these approaches, such as the Levenshtein distance, the perplexity-based method, and Bayesian phylogenetics. It argues that some of these methods are often unfairly criticised when compared with human-made classifications. It proposes possible strategies for enhancing the existing approaches and explores the latest emerging ones. Finally, it highlights the relatively poor performance of current methods on small raw historical corpora as a potential direction for future research.

Research target: Philology and Linguistics

Language: English
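
The abstract names the Levenshtein distance and the perplexity-based method without spelling them out. The sketch below shows one common formulation of each, offered as an assumption rather than a reconstruction of any specific study: a mean length-normalised edit distance over aligned word lists, and the perplexity of one text under an add-one smoothed character trigram model trained on another. All function names are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def word_list_distance(list_a, list_b):
    """Mean length-normalised edit distance over aligned concept slots; None marks a gap."""
    pairs = [(x, y) for x, y in zip(list_a, list_b) if x and y]
    return sum(levenshtein(x, y) / max(len(x), len(y)) for x, y in pairs) / len(pairs)


def char_ngram_perplexity(train, test, n=3):
    """Perplexity of `test` under an add-one smoothed character n-gram model of `train`."""
    pad = " " * (n - 1)  # padding so the first characters also get a full context
    train, test = pad + train, pad + test
    grams = Counter(train[i:i + n] for i in range(len(train) - n + 1))
    contexts = Counter(train[i:i + n - 1] for i in range(len(train) - n + 1))
    vocab_size = len(set(train) | set(test))
    positions = range(len(test) - n + 1)
    log_prob = sum(
        math.log((grams[test[i:i + n]] + 1) / (contexts[test[i:i + n - 1]] + vocab_size))
        for i in positions
    )
    return math.exp(-log_prob / len(positions))
```

For example, `word_list_distance(["vada", "ruka"], ["voda", "ruka"])` returns 0.125: one substitution in a four-letter word, averaged with an identical pair. A lower perplexity means the test text is more predictable from the training variety; since the measure is asymmetric, one option is to average both directions.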

Person-number asymmetry: agreement of passive miratives in the Kazym dialect of Khanty

Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Vol. 6 No. 1 P. 130–148

The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...

Added: May 14, 2026

Verbs of substance motion in Slavic languages

Fedorov D., Jezikoslovni Zapiski 2026 No. 32(1) P. 23–52

This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...

Added: May 13, 2026

The image of women through the years: a diachronic analysis of the representation of women in Russian agitation advertising

Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Vol. 23 No. 1 P. 241–249

The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...

Added: May 13, 2026

Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.

The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...

Added: May 12, 2026

T. Pratchett's Discworld through the eyes of the Russian-speaking fandom

Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 No. 100 P. 158–173

This is the first attempt to examine fan fiction based on Terry Pratchett's Discworld novels in Russia as an act of productive reception. The analysis shows that fan fiction authors strive above all to convey the style and comic spirit of Pratchett's original series, regardless of the genre and format of the works they create. Fic writers most often turn to such formats, ...

Added: May 10, 2026

Dostoevsky's Universe

Pershkina A., Moscow: Альпина нон-фикшн, 2026.

Philologist Anastasia Pershkina tells how the writer created his world, whom he populated it with, what laws he established, and why this world affects us so vividly. You will also learn who helped Fyodor Mikhailovich with his work, how the writer connected his works with one another, what his contemporaries thought of his texts, and what “dostoevshchina” actually is. ...

Added: May 6, 2026

The hypothesis of dependence of the lexical nature of mixed languages on the patterns of their emergence

Gridneva E., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2026 No. 100 P. 38–52

This study investigates mixed languages, with a specific focus on their lexical characteristics. It proposes and substantiates the hypothesis that the degree of lexical mixing in such languages — reflected in the prevalence of doublets and the distribution of vocabulary between source languages — is linked to the specific pattern of their emergence, rather than ...

Added: May 6, 2026

The arrest of the writer Günter Hofé at the Frankfurt Book Fair in 1963: competing images in the media space of the GDR and the FRG

Керимов Р. Э., Новое прошлое 2026 No. 1 P. 148–162

The arrest of East German writer and publishing director Günter Hofé at the 1963 Frankfurt Book Fair became a unique episode of ideological confrontation between East and West Germany. Hofé is primarily known for his documentary-fiction trilogy about World War II, in which he actively participated as a Wehrmacht soldier. The analysis of the writer’s ...

Added: May 5, 2026

The semantic halo of the sacred in the four-foot amphibrach: mechanisms of cultural memory in the poetry of Olga Sedakova

Максимов И. В., Новый филологический вестник 2025 Vol. 73 No. 2 P. 187–196

The majority of studies on the metrical aspects of Olga Sedakova’s poetry focus on the formal elements of versification, rarely exploring the substantive possibilities of the chosen metres. This paper fills this gap by analyzing the unified narrative of the four-foot amphibrach, tracing its development in Russian poetry from V.A. Zhukovsky to O.A. Sedakova. At ...

Added: May 5, 2026

The Quban Stela (Musée des Beaux Arts Grenoble, Collection égyptienne, inv. 1937, 1969, 3565)

Крол А. А., Кузнецов Д. А., Ladynin I. A., Восток. Афро-азиатские общества: история и современность 2026 Vol. 1 P. 244–261

The publication presents a new translation and commentary of the Quban Stela of Ramesses II (Musée des beaux-arts Grenoble, Collection égyptienne, inv. 1937, 1969, 3565). This monument dates to the beginning of his reign (ca. 1287 BC); it was found near the ruins of the fortress of Baki, close to the Nubian village of Kuban. The composers of the ...

Added: May 5, 2026

King Ramesses and Bactria. On a motif of Late Egyptian historiography

Ladynin I. A., Вестник древней истории 2024 Vol. 84 No. 1 P. 5–26

The article analyses a set of Classical evidence reflecting the Egyptian conquest of Bactria or its attempt (Diod. I. 46–47; Tac. Ann. II. 60. 3; Strabo XVII. 1. 46), a statement of Manetho of Sebennytos on the vast conquests of king Sethos-Ramesses (I) (Manetho. Frg. 50 = Ios. C.Ap. I. 15. § 98–102), and the ...

Added: May 5, 2026

I. Babel's cycle “Velikaya Krinitsa”: temporal structure in the light of modernity

Гендлина В. В., Новый филологический вестник 2025 No. 1 P. 144–154

The article analyses two novellas by Isaac Babel from the early 1930s about collectivisation, “Gapa Guzhva” and “Kolyvushka”. The novellas were meant to form part of a cycle about collectivisation under the common title “Velikaya Krinitsa”, but the planned book about the transformation of the Soviet village was never realised. In both novellas, Babel presents the grand project of collective-farm modernisation as a process that destroys the existing order and the lives of individual ...

Added: May 4, 2026

Samples of the speech of Macedonian settlers in South Banat, Republic of Serbia: the villages of Kačarevo and Glogonj, Pančevo municipality

Muravleva N., in: Исследования по славянской диалектологии. Issue 25. Moscow: Institute of Slavic Studies of the Russian Academy of Sciences, 2025. P. 426–441.

The article publishes narratives in Macedonian recorded during a 2023 expedition (Борисов, Кикило, Немчинов 2024) from informants belonging to the Macedonian minority living in the villages of Kačarevo and Glogonj in the Pančevo municipality, Vojvodina, Republic of Serbia. The dialect texts reflect contact phenomena that arose under the influence of the majority Serbian language, as well as the mixing ...

Added: February 18, 2026

Preterite forms in the idiom of Macedonian settlers in Vojvodina (Serbia)

Muravleva N., Славянский мир в третьем тысячелетии 2025 Vol. 20 No. 3-4 P. 144–172

The article examines the features of the past tense system in Macedonian resettlement dialects of the Autonomous Province of Vojvodina, Serbia, based on a corpus of texts collected during a 2023 linguistic expedition to the villages of Jabuka, Kačarevo, Glogonj, Plandište, and Belgrade. The first section provides a sociolinguistic overview of the formation of the ...

Added: February 18, 2026

Automatic detection of inducements in text: applying computational linguistics methods in the work of an expert linguist

П.Е. Белова, А.К. Сафарян, in: Scientific and Practical Conference with International Participation “National and International Trends and Prospects for the Development of Forensic Expertise”. Collection of papers. Nizhny Novgorod: Publishing House of Lobachevsky State University of Nizhny Novgorod, 2024.

This article describes FindImper, a system for automatically finding and extracting inducements from Russian-language texts, based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis together with a set of rules. The tool is intended to streamline the work of an expert linguist and is available via a website ...

Added: January 30, 2026
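
The FindImper abstract describes rules over verb forms and syntactic relations but does not list them. A minimal sketch of that kind of rule, assuming the text has already been parsed into Universal Dependencies-style tokens by some morphological and syntactic parser: flag verbs tagged Mood=Imp and return each with its syntactic subtree. The `Token` class, `subtree_ids` and `find_imperatives` are hypothetical illustrations, not FindImper's code or API.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    id: int                 # 1-based position in the sentence, as in CoNLL-U
    form: str
    upos: str               # Universal POS tag, e.g. "VERB"
    feats: dict = field(default_factory=dict)  # morphological features, e.g. {"Mood": "Imp"}
    head: int = 0           # id of the syntactic head; 0 marks the root


def subtree_ids(sentence, root_id):
    """Collect the ids of root_id and of every token that depends on it, directly or not."""
    ids = {root_id}
    changed = True
    while changed:
        changed = False
        for tok in sentence:
            if tok.head in ids and tok.id not in ids:
                ids.add(tok.id)
                changed = True
    return ids


def find_imperatives(sentence):
    """Return (verb, clause text) for each imperative verb, keeping surface word order."""
    hits = []
    for tok in sentence:
        if tok.upos == "VERB" and tok.feats.get("Mood") == "Imp":
            ids = subtree_ids(sentence, tok.id)
            hits.append((tok.form, " ".join(t.form for t in sentence if t.id in ids)))
    return hits


# Hand-annotated example: "Прочитайте эту статью" ("Read this article")
sentence = [
    Token(1, "Прочитайте", "VERB", {"Mood": "Imp", "Number": "Plur"}, head=0),
    Token(2, "эту", "DET", head=3),
    Token(3, "статью", "NOUN", head=1),
]
print(find_imperatives(sentence))  # [('Прочитайте', 'Прочитайте эту статью')]
```

Returning the whole subtree rather than the bare verb keeps the object of the inducement ("эту статью") attached, which is the part an expert would typically need to read.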

Dialect differences between East and West based on data from the Dialectological Atlas of the Russian Language: results of multidimensional scaling

Марченко И. А., Ronko R., in: Исследования по славянской диалектологии. Issue 25. Moscow: Institute of Slavic Studies of the Russian Academy of Sciences, 2025. Ch. 5 P. 236–260.

This paper presents a classification of Russian dialects based on data from the Dialectological Atlas of the Russian Language, using the method of multidimensional scaling. The main outcome of the study is a map of the Russian dialectal space, which identifies six zones (three western and three eastern) and corresponding sets of dialectal features. The ...

Added: December 7, 2025

Discursive capabilities of large language models in generating new texts

Mylnikova A., Гасимов А. Р., Научно-техническая информация. Серия 2: Информационные процессы и системы 2025 No. 9 P. 33–38

Based on a study of how large language models (LLMs) function and of the specific characteristics of machine discourse processing, the article demonstrates an experimental method of computational and linguistic analysis for the statistical study and interpretation of the linguistic characteristics of texts. The research material comprises the Brown corpus as well as corpora of texts artificially generated with Claude Sonnet 3.7 and Grok-3. In the processing mechanisms ...

Added: November 19, 2025

A dialectometric approach to the dialect classification of the East Slavic languages based on the collection “Vostochnoslavyanskie izoglossy” (“East Slavic isoglosses”)

Manusov A. V., Кузьмина А. С., Вопросы языкового родства 2024 No. 22/3-4 P. 342–366

The article proposes a new dialectometric approach to the division of East Slavic languages. Our dialectometry is based on the material from the collection of articles “Vostochnoslavyanskie izoglossy” (“East Slavic isoglosses”, 1995–2006), which is a generalization of data from atlases of East Slavic languages (Dialectological atlas of the Russian language, Dialectological atlas of the Belarusian ...

Added: November 13, 2025
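
The abstract does not specify how the isogloss data are turned into a division of dialects. A minimal dialectometric sketch under its own assumptions: each dialect point is a vector of isogloss values (None where the source has no data), the distance between two points is the share of differing values among features attested at both, and points are grouped with UPGMA (average-linkage) clustering. The encoding, the distance and the clustering method are illustrative choices, not necessarily those of the article, and the point names and values in the example are placeholders.

```python
from itertools import combinations


def feature_distance(a, b):
    """Share of differing values among features attested (not None) at both points."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no features attested at both points")
    return sum(x != y for x, y in shared) / len(shared)


def average_link(dist, members_a, members_b):
    """Mean distance over all cross-cluster pairs of points (UPGMA linkage)."""
    pairs = [frozenset((x, y)) for x in members_a for y in members_b]
    return sum(dist[p] for p in pairs) / len(pairs)


def upgma(points):
    """Cluster {name: feature vector} bottom-up; return the tree as nested tuples."""
    dist = {frozenset(pair): feature_distance(points[pair[0]], points[pair[1]])
            for pair in combinations(points, 2)}
    clusters = [(name, [name]) for name in points]  # (subtree, member names)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_link(dist, clusters[ij[0]][1], clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Four hypothetical points, four binary isoglosses (1 = feature present)
points = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 0, 1],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, None],
}
print(upgma(points))  # (('C', 'D'), ('A', 'B'))
```

Only the merge order is kept here; a fuller version would also record merge heights so that the result can be drawn as a dendrogram or projected onto a map.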

Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”

Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223

Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...

Added: October 19, 2025

Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)

[s.n.], 2025.

This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...

Added: October 19, 2025

The application of corpus-based language distance measurement to the diatopic variation study (on the material of the Old Novgorodian birchbark letters)

Afanasev I., Lyashevskaya O., in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025). Tartu: University of Tartu Library, 2025. P. 153–164.

The paper presents a computer-assisted exploration of a set of texts, where qualitative analysis complements the linguistically-aware vector-based language distance measurements, interpreting them through close reading and thus proving or disproving their conclusions. It proposes using a method designed for small raw corpora to explore the individual, chronological, and gender-based differences within an extinct single ...

Added: July 17, 2025
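
The abstract does not describe how its linguistically-aware vectors are built. As a deliberately simpler stand-in, and not the method of the paper, two small raw corpora can be compared as character n-gram frequency vectors using cosine distance. `ngram_profile` and `cosine_distance` are hypothetical names; the trigram size and the `#` word-boundary marker are assumptions.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Count character n-grams per word, padding each word with '#' as a boundary mark."""
    profile = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        profile.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return profile


def cosine_distance(p, q):
    """1 minus the cosine similarity of two sparse frequency vectors."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("empty profile")
    return 1.0 - dot / (norm_p * norm_q)
```

For instance, `cosine_distance(ngram_profile("voda ruka"), ngram_profile("vada ruka"))` is approximately 0.375, since five of the eight trigrams in each profile are shared. Surface n-grams ignore morphology and syntax, which is exactly the gap a linguistically-aware representation is meant to fill.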

Basic vocabulary of Yupik languages: a lexicostatistical analysis

Yuri B. Koryakov, Journal of Language Relationship 2024 Vol. 22 No. 3–4 P. 296–341

This article presents a lexicostatistical classification of Yupik languages included in the Eskaleut family, using 110-word lists as the basis for comparison. The study aims to refine and expand upon previous lexicostatistical work on Yupik languages, focusing on semantic clarifications and contextual considerations in compiling the word lists. The study includes new data from recent ...

Added: March 7, 2025

Thematic annotation of an anthropological corpus: a methodology for classifying miners' narratives

Мазитова Л. Л., Panteleeva L., Вестник Самарского университета. История, педагогика, филология 2024 Vol. 30 No. 4 P. 156–164

The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: 1) developing a thematic classification, 2) introducing conventions for marking narratives in the text, and 3) determining principles for organising the corpus according to the themes of ...

Added: January 18, 2025

Linguistic complexity of texts in the “virtual museum tour” genre (based on the Virtual Visit to the State Hermitage)

Kolmogorova A., Куликова Е. Р., Колмогорова П. А., Текст. Книга. Книгоиздание 2025 No. 38 P. 29–54

The article is devoted to the linguistic features of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze the set of lexical, morphological, syntactic and discursive metrics of the linguistic complexity of these texts in comparison with the same ...

Added: November 8, 2024