The Cross-evaluation Crux for Computational Phylogenetic Linguistics

Afanasev I.

The rapid growth of computational phylogenetic linguistics has made cross-evaluation between
existing classification approaches a pressing need, since these approaches are often imperfect,
whether performed by a human or a computer. We present a study of cross-evaluation
approaches for both methods (including an interdisciplinary approach against which to test
the linguistic findings) and data (complementing traditional word lists with linguistic atlases,
surveys, and databases). The research focuses on insufficient cross-evaluation, which
leads to misleading conclusions about methods. We perform a case study of cross-evaluation
misuse in computational phylogenetic research on South American languages
based on Levenshtein distance measurements between Swadesh list items. The conclusion
outlines the prospects of implementing language outgroup comparison, a possible new
cross-evaluation method that combines method cross-evaluation and data cross-evaluation.
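
For readers unfamiliar with the measurement named in the abstract, the following is a minimal sketch of a length-normalized Levenshtein distance averaged over the concepts two word lists share. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the normalization by the longer form, and the dictionary layout (concept mapped to a transcribed form) are all assumptions made here.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer form's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def list_distance(list_a: Dict[str, str], list_b: Dict[str, str]) -> float:
    """Mean normalized distance over the concepts present in both word lists."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the word lists share no concepts")
    total = sum(normalized_distance(list_a[c], list_b[c]) for c in shared)
    return total / len(shared)


if __name__ == "__main__":
    # Invented toy forms for two hypothetical lects; not real Swadesh list data.
    lect_a = {"water": "akva", "fire": "fogo", "stone": "petra"}
    lect_b = {"water": "agua", "fire": "fuego", "stone": "piedra"}
    print(f"{list_distance(lect_a, lect_b):.3f}")
```

A pairwise matrix of such distances is the usual input to a clustering or tree-building step. A score of this kind reflects surface string similarity only, which is one reason its results call for cross-evaluation against other methods and data.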

Language: English

In book

Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2022)

Springer, 2024.

The application of corpus-based language distance measurement to the diatopic variation study (on the material of the Old Novgorodian birchbark letters)

Afanasev I., Lyashevskaya O., in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025). Tartu: University of Tartu Library, 2025. P. 153–164.

The paper presents a computer-assisted exploration of a set of texts, where qualitative analysis complements the linguistically-aware vector-based language distance measurements, interpreting them through close reading and thus proving or disproving their conclusions. It proposes using a method designed for small raw corpora to explore the individual, chronological, and gender-based differences within an extinct single ...

Added: July 17, 2025

Basic vocabulary of Yupik languages: a lexicostatistical analysis

Yuri B. Koryakov, Journal of Language Relationship 2024 Vol. 22 No. 3–4 P. 296–341

This article presents a lexicostatistical classification of Yupik languages included in the Eskaleut family, using 110-word lists as the basis for comparison. The study aims to refine and expand upon previous lexicostatistical work on Yupik languages, focusing on semantic clarifications and contextual considerations in compiling the word lists. The study includes new data from recent ...

Added: March 7, 2025

Afanasev I., Lyashevskaya O., in: Structuring Lexical Data and Digitising Dictionaries: Grammatical Theory, Language Processing and Databases in Historical Linguistics. Boston, Leiden: Brill, 2024. P. 13–35.

Added: January 7, 2025

Cipher, transform, get lost: an anti-transparent system for distance measurement in East Slavic lects

Afanasev I., Journal of Language Relationship 2023 Vol. 21 No. 3–4 P. 159–177

Recent advances in computational historical linguistics have inspired a discussion of newly implemented quantitative methods, mainly of their lack of transparency and the ways to overcome it. This paper aims to demonstrate the advantages of transparency for such tools. The study compares two types of language distance measurement systems used in classification. ...

Added: May 15, 2024

The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group

Afanasev I., in: Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023). Association for Computational Linguistics, 2023. P. 174–186.

The study of low-resourced East Slavic lects is becoming increasingly relevant as they face the prospect of extinction under the pressure of standard Russian while being treated by academia as an inferior part of this lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a perfect example of ...

Added: May 15, 2023

Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

Association for Computational Linguistics, 2023.

These proceedings include the 23 papers presented at the 10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), co-located with the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Both EACL and VarDial were held in Dubrovnik, Croatia, in a hybrid format, allowing participants to attend on-site or ...

Added: May 15, 2023

The Functioning of Native Languages in the Modern World: Questions and Answers

Ханова А. Ф., Славина Л. Р., Kazan: G. Ibragimov Institute of Language, Literature and Art of the Tatarstan Academy of Sciences, 2021.

This popular-science publication presents up-to-date information, in question-and-answer format, on the functioning of native languages in the modern world, the origin and genealogical classification of languages, and the relations between language and society, language and culture, and language and ethnicity. The book is intended for a general readership. ...

Added: January 24, 2023

Lexical Innovations and the Classification of the Uralic Languages

Zhivlov M., in: Hämeenmaalta Jamalille: kirja Tapani Salmiselle. Helda Open Books, 2022. P. 361–375.

The traditional classification of the Uralic languages assumes successive binary branching: first into Samoyedic and Finno-Ugric, then Finno-Ugric into Ugric and Finno-Permic, then Finno-Permic into Permic and Finno-Volgaic, and finally Finno-Volgaic into Volgaic and Finno-Saamic. Granting genetic status to the intermediate nodes of this tree (i.e., positing the corresponding proto-languages) implies that these proto-languages must have ...

Added: April 8, 2022

A New Scheme for Applying Automatic Classification to the Analysis of Socio-Economic Systems

Rubchinskiy A., in: XVII April International Academic Conference on Economic and Social Development: in 4 books. Book 4. Moscow: HSE Publishing House, 2017. P. 517–526.

The aim of this paper is to present the proposed approach to automatic classification (AC) problems and the results obtained within it. The first part describes an algorithm for constructing a family of classifications and for determining from it the complexity of the AC problem under consideration. The second part considers the possibility of applying the proposed approach to the analysis of stock markets. ...

Added: December 14, 2017

FAMILY OF GRAPH DECOMPOSITIONS AND ITS APPLICATIONS TO DATA ANALYSIS

Rubchinskiy A. / Series WP7 "Mathematical methods for decision making in economics, business and politics". 2016. No. WP7/2016/09.

A new decomposition approach to complex systems analysis is suggested. The conventional approach deals with the construction of a single, “the most correct”, decomposition of the considered system. Meanwhile the suggested approach is oriented to the construction of a family of decompositions, whose properties reveal some important meaningful features of the initial system. The expedience ...

Added: October 20, 2017

Divisive-Agglomerative Algorithm and Complexity of Automatic Classification Problems

Rubchinskiy A. / NRU Higher School of Economics. Series WP7 "Mathematical methods for decision making in economics, business and politics". 2015. No. WP7/2015/09.

The paper sets out an algorithm for solving the automatic classification (AC) problem. The AC problem requires finding one or several partitions, starting from a given pattern matrix or dissimilarity/similarity matrix. A three-level scheme of the algorithm is suggested. The output of the procedure ...

Added: October 19, 2017

A Mobile Application for Synchronizing the Use of Audio and Text Files of E-Books

Alexandrov D., Нестеркина А. О., in: University Science for the Region: Proceedings of the XV All-Russian Scientific Conference with International Participation. Vologda: VoGU, 2017. P. 84–86.

The paper is devoted to the creation of the Android application "Read&Listen" for reading electronic books and listening to audiobooks. The program allows users to combine these two processes and automates switching between the two types of books. The search for matches between the transcription of the audiobooks and the e-books is carried out ...

Added: September 27, 2017

Using a Probability Distribution over the Set of Classes in the Task of Classifying Arabic Dialects

Durandin O., Zolotykh N., Хилал Н. Р. et al., Scientific and Technical Journal of Information Technologies, Mechanics and Optics 2017 No. 1(107) P. 110–116

Subject of Research. We propose an approach to solving a machine learning classification problem that uses information about the probability distribution over the set of class labels in the training data. The algorithm is illustrated on a complex natural language processing task: classification of Arabic dialects. Method. Each object in the training set is associated with a probability distribution over ...

Added: February 8, 2017

Intellectualization of Digital Library Services Based on a Self-Learning Content Classification System

Kharlamov A. A., Жонин А. А., Сергиевский Н. А. et al., Vestnik of Moscow State Linguistic University. Linguistics. An Interdisciplinary Approach in Theoretical and Practical Linguistics 2013 No. 1 P. 81–91

The paper reviews trends in the development of digital libraries and their services. It shows that the main direction in the development of adaptive digital library services is the introduction of personalization, which improves the quality of their functions by adapting to the user's interests. An approach to automatic classification based on TextAnalyst, a technology for automatic semantic analysis of texts, is proposed as the basis for a personalization mechanism. The paper describes the implementation of the software ...

Added: November 12, 2016