Higher School of Economics
National Research University
Language distance: the evolution of an idea

Dialectologia. 2025. No. 37.
Afanasev I.
In press

This paper examines the history of language distance studies: the genesis of the concept of measuring language distance, its development over the 19th and 20th centuries, and its rapid adoption as a standard method for different types of language classification from the 1990s to the 2020s.
The paper outlines a short history of approaches to language classification and of the methods that different scholars have used to measure language distance. The analysis comments on the works of R. Rask, F. de Saussure, the Neogrammarians, J. Greenberg, and the Moscow School of Comparative Linguistics. The general overview is split into two parts, one dedicated to computational dialectology and the other to computational phylogenetic linguistics, both of which currently rely on language distance measurement as a crucial part of their methodology.
The paper discusses the advantages and disadvantages of the listed approaches, such as the Levenshtein distance, the perplexity-based method, and Bayesian phylogenetics. It argues that some of these methods are often unfairly criticised when compared to human-made classifications. It proposes possible strategies for enhancing the existing approaches and explores the latest emerging ones. The paper highlights the relatively poor performance of current methods on small raw historical corpora as a potential direction for future research.

Research target: Philology and Linguistics
Language: English
Keywords: computational linguistics, dialectology, language classification, historical-comparative linguistics, automatic language distance measurement, computational phylogenetic linguistics, computational dialectology, language distance
Similar publications
Person-number asymmetry: agreement of passive miratives in the Kazym dialect of Khanty
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Vol. 6 No. 1 P. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Verbs of substance motion in Slavic languages
Fedorov D., Jezikoslovni Zapiski 2026 No. 32(1) P. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
The image of women through the years: a diachronic analysis of the representation of women in Russian agitational advertising
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Vol. 23 No. 1 P. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
T. Pratchett's Discworld through the eyes of the Russian-speaking fandom
Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 No. 100 P. 158–173
This is the first attempt to examine the features of fan fiction as an act of productive reception that arose in Russia from Terry Pratchett's cycle of Discworld novels. The analysis shows that fan fiction authors strive above all to convey the style and comic spirit of Pratchett's original cycle, regardless of the genre and format of the works they create. Fan fiction writers most often turn to formats such as ...
Added: May 10, 2026
Dostoevsky's Universe
Pershkina A., Moscow: Альпина нон-фикшн, 2026.
Philologist Anastasia Pershkina explains how the writer created his world, whom he populated it with, what laws he established, and why this world affects us so vividly. In addition, you will learn who helped Fyodor Mikhailovich in his work, how the writer linked his works together, what his contemporaries thought of his texts, and what "dostoevshchina" actually is. ...
Added: May 6, 2026
The hypothesis of dependence of the lexical nature of mixed languages on the patterns of their emergence
Gridneva E., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2026 No. 100 P. 38–52
This study investigates mixed languages, with a specific focus on their lexical characteristics. It proposes and substantiates the hypothesis that the degree of lexical mixing in such languages — reflected in the prevalence of doublets and the distribution of vocabulary between source languages — is linked to the specific pattern of their emergence, rather than ...
Added: May 6, 2026
The arrest of the writer Günter Hofé at the Frankfurt Book Fair in 1963: competing images in the media space of the GDR and the FRG
Керимов Р. Э., Новое прошлое 2026 No. 1 P. 148–162
The arrest of East German writer and publishing director Günter Hofé at the 1963 Frankfurt Book Fair became a unique episode of ideological confrontation between East and West Germany. Hofé is primarily known for his documentary-fiction trilogy about World War II, in which he actively participated as a Wehrmacht soldier. The analysis of the writer’s ...
Added: May 5, 2026
The semantic halo of the sacred in the four-foot amphibrach: mechanisms of cultural memory in the poetry of Olga Sedakova
Максимов И. В., Новый филологический вестник 2025 Vol. 73 No. 2 P. 187–196
The majority of studies on the metrical aspects of Olga Sedakova’s poetry focus on the formal elements of versification, rarely exploring the substantive possibilities of the chosen metres. This paper fills this gap by analyzing the unified narrative of the four-foot amphibrach, tracing its development in Russian poetry from V.A. Zhukovsky to O.A. Sedakova. At ...
Added: May 5, 2026
The Quban Stela (Musée des Beaux Arts Grenoble, Collection égyptienne, inv. 1937, 1969, 3565)
Крол А. А., Кузнецов Д. А., Ladynin I. A., Восток. Афро-азиатские общества: история и современность 2026 Vol. 1 P. 244–261
The publication presents a new translation and commentary of the Quban Stela of Ramesses II (Musée des beaux-arts Grenoble, Collection égyptienne, inv. 1937, 1969, 3565). This monument dates to the beginning of his reign (ca. 1287 BC); it was found near the ruins of the fortress of Baki, close to the Nubian village of Kuban. The composers of the ...
Added: May 5, 2026
King Ramesses and Bactria. On a motif of Late Egyptian historiography
Ladynin I. A., Вестник древней истории 2024 Vol. 84 No. 1 P. 5–26
The article analyses a set of Classical evidence reflecting the Egyptian conquest of Bactria or its attempt (Diod. I. 46–47; Tac. Ann. II. 60. 3; Strabo XVII. 1. 46), a statement of Manetho of Sebennytos on the vast conquests of king Sethos-Ramesses (I) (Manetho. Frg. 50 = Ios. C.Ap. I. 15. § 98–102), and the ...
Added: May 5, 2026
I. Babel's cycle "Velikaya Krinitsa": temporal structure in the light of modernity.
Гендлина В. В., Новый филологический вестник 2025 No. 1 P. 144–154
The article analyses two short stories by Isaac Babel from the early 1930s about collectivisation, "Gapa Guzhva" and "Kolyvushka". The stories were meant to become part of a cycle about collectivisation under the general title "Velikaya Krinitsa", but the plan for a book about the transformation of the Soviet countryside was never realised. In both stories, Babel shows the grandiose project of modernising collective farms as a process that destroys the existing order and the life of an individual ...
Added: May 4, 2026
Samples of the dialect of Macedonian settlers in South Banat, Republic of Serbia: the villages of Kacharevo and Glogon, municipality of Pančevo
Muravleva N., in: Исследования по славянской диалектологии. Issue 25. Vol. 25. Moscow: Институт славяноведения РАН, 2025. P. 426–441.
The article publishes narratives in Macedonian recorded during the 2023 expedition (Борисов, Кикило, Немчинов 2024) from informants who belong to the Macedonian minority living in the villages of Kacharevo and Glogon (Serbian: Kačarevo, Glogonj), municipality of Pančevo, Vojvodina, Republic of Serbia. The dialect texts reflect contact phenomena that arose under the influence of the majority Serbian language, as well as a mixing of ...
Added: February 18, 2026
Preterite forms in the idiom of Macedonian settlers in Vojvodina (Serbia)
Muravleva N., Славянский мир в третьем тысячелетии 2025 Vol. 20 No. 3-4 P. 144–172
The article examines the features of the past tense system in Macedonian resettlement dialects of the Autonomous Province of Vojvodina, Serbia, based on a corpus of texts collected during a 2023 linguistic expedition to the villages of Jabuka, Kačarevo, Glogonj, Plandište, and Belgrade. The first section provides a sociolinguistic overview of the formation of the ...
Added: February 18, 2026
Automatic detection of inducements in text: applying computational linguistics methods in the work of a linguistic expert
П.Е. Белова, А.К. Сафарян, in: Научно-практическая конференция с международным участием "Национальные и международные тенденции и перспективы развития судебной экспертизы". Сборник докладов.: Nizhny Novgorod: Изд-во ННГУ им. Н.И. Лобачевского, 2024.
This article describes FindImper, a system for automatically finding and extracting inducements from Russian-language texts, based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis and a set of rules. The tool is aimed at optimising the work of a linguistic expert and is available for use via a website ...
Added: January 30, 2026
Dialectal differences between East and West based on data from the Dialectological Atlas of the Russian Language: results of multidimensional scaling
Марченко И. А., Ronko R., in: Исследования по славянской диалектологии. Issue 25. Vol. 25. Moscow: Институт славяноведения РАН, 2025. Ch. 5 P. 236–260.
This paper presents a classification of Russian dialects based on data from the Dialectological Atlas of the Russian Language, using the method of multidimensional scaling. The main outcome of the study is a map of the Russian dialectal space, which identifies six zones (three western and three eastern) and corresponding sets of dialectal features. The ...
Added: December 7, 2025
Discursive capabilities of large language models in new text generation tasks
Mylnikova A., Гасимов А. Р., Научно-техническая информация. Серия 2: Информационные процессы и системы 2025 No. 9 P. 33–38
Based on a study of how large language models (LLMs) function and of the specific characteristics of machine discourse processing, the paper demonstrates the use of an experimental method of computational and linguistic analysis for the statistical study and interpretation of the linguistic characteristics of texts. The research materials are the Brown corpus and corpora of texts artificially generated with Claude Sonnet 3.7 and Grok-3. In the mechanisms of processing ...
Added: November 19, 2025
A dialectometric approach to the dialect classification of East Slavic languages based on the collection “Vostochnoslavyanskie izoglossy”
Manusov A. V., Кузьмина А. С., Вопросы языкового родства 2024 No. 22/3-4 P. 342–366
The article proposes a new dialectometric approach to the division of East Slavic languages. Our dialectometry is based on the material from the collection of articles “Vostochnoslavyanskie izoglossy” (“East Slavic isoglosses”, 1995–2006), which is a generalization of data from atlases of East Slavic languages (Dialectological atlas of the Russian language, Dialectological atlas of the Belarusian ...
Added: November 13, 2025
Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”
Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223
Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...
Added: October 19, 2025
Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)
[s.n.], 2025.
This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...
Added: October 19, 2025
The application of corpus-based language distance measurement to the diatopic variation study (on the material of the Old Novgorodian birchbark letters)
Afanasev I., Lyashevskaya O., in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025).: Tartu: University of Tartu Library, 2025. P. 153–164.
The paper presents a computer-assisted exploration of a set of texts, where qualitative analysis complements the linguistically-aware vector-based language distance measurements, interpreting them through close reading and thus proving or disproving their conclusions. It proposes using a method designed for small raw corpora to explore the individual, chronological, and gender-based differences within an extinct single ...
Added: July 17, 2025
Basic vocabulary of Yupik languages: a lexicostatistical analysis
Yuri B. Koryakov, Journal of Language Relationship 2024 Vol. 22 No. 3–4 P. 296–341
This article presents a lexicostatistical classification of Yupik languages included in the Eskaleut family, using 110-word lists as the basis for comparison. The study aims to refine and expand upon previous lexicostatistical work on Yupik languages, focusing on semantic clarifications and contextual considerations in compiling the word lists. The study includes new data from recent ...
Added: March 7, 2025
Thematic annotation of an anthropological corpus: a method for classifying miners' narratives
Мазитова Л. Л., Panteleeva L., Вестник Самарского университета. История, педагогика, филология 2024 Vol. 30 No. 4 P. 156–164
The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: 1) development of a thematic classification, 2) introduction of conventions for marking narratives in the text, 3) determination of principles for organizing the corpus according to the themes of ...
Added: January 18, 2025
Linguistic complexity of texts in the "virtual museum tour" genre (based on the Virtual Visit to the State Hermitage Museum)
Kolmogorova A., Куликова Е. Р., Колмогорова П. А., Текст. Книга. Книгоиздание 2025 No. 38 P. 29–54
The article is devoted to the linguistic characterisation of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze the set of lexical, morphological, syntactic and discursive metrics of the linguistic complexity of these texts in comparison with the same ...
Added: November 8, 2024