Studying language evolution in the age of big data

Bhattacharya T.; Retzlaff N.; Blasi D.; Croft W.; Cysouw M.; Hruschka D.; Maddieson I.; Muller L.; Smith E.; Stadler P.; Starostin George; Youn H.

doi:10.1093/jole/lzy004

Journal of Language Evolution. 2018. Vol. 3. No. 2. P. 94–129.

The increasing availability of large digital corpora of cross-linguistic data is revolutionizing many branches of linguistics. Overall, it has triggered a shift of attention from detailed questions about individual features to more global patterns amenable to rigorous, but statistical, analyses. This engenders an approach based on successive approximations where models with simplified assumptions result in frameworks that can then be systematically refined, always keeping explicit the methodological commitments and the assumed prior knowledge. Therefore, they can resolve disputes between competing frameworks quantitatively by separating the support provided by the data from the underlying assumptions. These methods, though, often appear as a ‘black box’ to traditional practitioners. In fact, the switch to a statistical view complicates comparison of the results from these newer methods with traditional understanding, sometimes leading to misinterpretation and overly broad claims. We describe here this evolving methodological shift, attributed to the advent of big, but often incomplete and poorly curated data, emphasizing the underlying similarity of the newer quantitative to the traditional comparative methods and discussing when and to what extent the former have advantages over the latter. In this review, we cover briefly both randomization tests for detecting patterns in a largely model-independent fashion and phylolinguistic methods for a more model-based analysis of these patterns. We foresee a fruitful division of labor between the ability to computationally process large volumes of data and the trained linguistic insight identifying worthy prior commitments and interesting hypotheses in need of comparison.
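As a rough illustration of the randomization tests mentioned in the abstract, the sketch below runs a generic two-group permutation test in Python. It is not the authors' procedure: the function name `permutation_test`, the choice of test statistic (absolute difference in means), and the toy values are all assumptions made for this example.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean_gap(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between value and group label
        # Small tolerance guards against floating-point rounding in the sums
        if mean_gap(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1

    # Add-one correction: counts the observed labeling as one permutation
    return (hits + 1) / (n_permutations + 1)


# Invented toy values standing in for some numeric feature measured in
# two hypothetical groups of languages (not real data).
area_one = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
area_two = [1.0, 1.2, 0.9, 1.1, 0.8, 1.3]
p_value = permutation_test(area_one, area_two)
print(f"estimated p-value: {p_value:.4f}")
```

The test makes no distributional assumption about the feature: the null distribution comes entirely from reshuffling the group labels, which is one sense in which such tests can be called largely model-independent.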

Research target: Philology and Linguistics

Priority areas: humanitarian IT and mathematics

Language: English

Keywords: computational linguistics; historical linguistics

Features of the persuasion strategy in Russian and Chinese political discourse (based on the political talk shows "60 минут" and "这就是中国" ("This Is China"))

Бинштейн М. М., Вестник Томского государственного университета. Филология 2026 No. 99 P. 5–27

The article explores the argumentative nature of political discourse, which, according to the authors, becomes the key to the analysis of the communicative strategy of persuasion. The aim of the research is a comparative analysis of speeches by Russian and Chinese politicians, identifying similarities and differences in the use of rhetorical devices when implementing the persuasion ...

Added: March 19, 2026

English for Professional Purposes: Cognitive Neurobiology

Zakharova A. V., Мищук А. М., Moscow: Флинта, 2025.

The aim of the textbook is to develop English skills and competences of biology students to a level necessary for successful oral and written communication in academic and professional spheres. The textbook materials allow for the improvement of essential language skills that are required for academic and professional communication. The textbook consists of four sections that cover the ...

Added: March 19, 2026

Invisible Machines

Тискин Д. Б., Moscow: Издательская группа URSS, 2026.

This book is an introduction to formal semantics, the branch of linguistics that builds mathematically rigorous models to study how sentences acquire meaning and the capacity to convey information, depending on their structure and on the meanings of their constituent words. Classical theoretical ideas of G. Frege, D. Lewis, D. Kaplan, and others are presented in modern language, and in developing the notation the emphasis is placed on ...

Added: March 17, 2026

Counterfactual narratives and the problem of the receptive event in Yu. Dombrovsky's short story «Ручка, ножка, огуречик…»

Shulyatieva D., Сибирский филологический журнал 2026 No. 1 P. 132–143

The article examines counterfactual event lines in Yu. Dombrovsky's short story «Ручка, ножка, огуречик…» and their role in creating narrative progression and in modeling the reader's experience. They problematize the zone of actual events in the story: the "merely possible" prevails over what "actually happened", preventing the reader from fully establishing what happened to the protagonist; the "merely possible" also displaces the events that did occur from the life ...

Added: March 16, 2026

Database logic in P. Auster's novel «4321» (2017): research possibilities and prospects

Shulyatieva D., Практики & Интерпретации: журнал филологических, образовательных и культурных исследований 2025 Vol. 10 No. 4 P. 20–42

The article considers the possibilities for studying database logic in contemporary prose, using P. Auster's novel «4321» (2017) as an example. Database logic is characteristic of digital culture, so it is unsurprising that in prose it enters into a relationship with the more familiar narrative logic. Database logic, according to L. Manovich, eliminates the beginning and end of a story, creates an equality of things, undermines causal ...

Added: March 16, 2026

Features of subject pronoun use in Ingrian Finnish

Budennaya E., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2024 Vol. 20 No. 2 P. 136–167

The article addresses the issue of use/omission of subject pronouns in Ingrian Finnish. The data comes from prose fiction by V. Valjakka and P. Mutanen, previously not used for any linguistic analysis, and from oral narratives recorded under the supervision of Irma Mullonen. Both fiction and oral narratives come from two different dialectal areas ...

Added: March 16, 2026

Employment discourse in the age of digital communication (based on Russian Telegram channels)

Gritsenko E., Аликина А. В., Вестник Томского государственного университета. Филология 2026 No. 99 P. 47–76

This article analyzes the linguistic and communicative features of employment discourse in Russian Telegram channels, which represent a new dimension of professional communication in the digital age. The study emphasizes Telegram's role as a dynamic, interactive multimodal platform that differs significantly from traditional digital job search platforms such as hh.ru or Superjob.ru. The research material ...

Added: March 13, 2026

Encyclopedia of Slavic Languages and Linguistics Online

Brill, 2025.

Added: March 13, 2026

From Aksum to Lalibäla. The Myth of the “Dark Ages” of Eritrean and Ethiopian History (7th–13th centuries)

Napoli: UniorPress, 2026.

This volume offers the reader a collection of contributions presented in Naples at the first edition of NeMEES (Neapolitan Meetings of Eritrean and Ethiopian Studies), held in March 2023. It proposes a critical reconsideration of the long chronological span between the decline of Aksum and the rise of the Solomonic dynasty (7th–13th centuries), traditionally defined as ...

Added: March 12, 2026

Austrian Literature: History, Poetics, Reception (= Jahrbuch der Österreich-Bibliothek. Vol. 16).

Verlag «PETERBURG. XXI VEK», 2025.

Added: March 11, 2026

Introduction to the Special Issue

Islentyeva Anna, Stefanowitsch A., Zeitschrift für Anglistik und Amerikanistik 2023 Vol. 71 No. 3 P. 215–216

The article introduces the special issue Language and Gender by briefly outlining its contents. ...

Added: March 11, 2026

Features of the evolution of the lexeme "profess" in English

Kapustkina E., XXVII Ежегодная богословская конференция Православного Свято-Тихоновского гуманитарного университета 2017 Vol. 27 P. 248–249

The paper presents the results of an etymological analysis of the key lexeme "profess" and its derivatives. A detailed analysis of dictionary definitions shows that the formation of the modern meaning of this lexeme is closely tied to religion, the Jesuit order, and the Protestant ethic. ...

Added: March 11, 2026

Sociolinguistic markers of the image of the Victorian-era governess (based on the novels of the Brontë sisters)

Kapustkina E., Вестник Самарского университета. История, педагогика, филология 2019 Vol. 25 No. 2 P. 95–99

Drawing on sociolinguistic markers of the governess's portrait, the article examines the changed attitude toward her position in England in the 1830s, which was driven by the rising standard of living of Victorians, in particular members of the industrial and commercial bourgeoisie. The English middle class's well-known aspiration to imitate the aristocracy in everything is reflected in the fact that having a governess for one's children ceases to be ...

Added: March 11, 2026

Ideographic writing in the modern world: problems and prospects

Барский К. М., Воропаев Н. Н., Домашевская Д. М. et al., Издательский дом ВКН, 2025.

This collective monograph is devoted to the study of ideographic writing as an important cultural and linguistic phenomenon. The book traces the origins and transformation of ideas about ideographic writing and analyzes the key stages of scholarly debate and current approaches to understanding the nature of hieroglyphic script. The authors show how the perception of Chinese writing has changed in world scholarship and culture, the main issues at the present stage, and why this ...

Added: March 11, 2026

“Tall, Dark and Tasty”: Masculinity in Food and Beverage Advertising

Islentyeva Anna, Zimmermann E., Zeitschrift für Anglistik und Amerikanistik 2023 Vol. 71 No. 3 P. 265–292

This study aims to analyse the key discursive strategies employed in the representation of masculinity (and femininity) in contemporary food and beverage advertising in a sample of 35 print advertisements launched between 2000 and 2020. Food and beverages constitute utilitarian products, in contrast to hedonic products. This study analyses posters that promote products that fit ...

Added: March 11, 2026

Automatic detection of inducements to action in text: applying computational linguistics methods in the work of a linguistic expert

П.Е. Белова, А.К. Сафарян, In: Scientific and practical conference with international participation "National and International Trends and Prospects for the Development of Forensic Science". Conference proceedings. Nizhny Novgorod: Изд-во ННГУ им. Н.И. Лобачевского, 2024.

This article describes FindImper, a system for automatically finding and extracting inducements to action from Russian-language texts, based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis together with a set of rules. The tool is aimed at streamlining the work of a linguistic expert and is available through the website ...

Added: January 30, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Series cs.SI "Social and Information Networks". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXiv "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025

Discursive capabilities of large language models in generating new texts

Mylnikova A., Гасимов А. Р., Научно-техническая информация. Серия 2: Информационные процессы и системы 2025 No. 9 P. 33–38

Based on a study of how large language models (LLMs) function and of the specific characteristics of machine discourse processing, the article demonstrates the use of an experimental method of computational and linguistic analysis for the statistical study and interpretation of the linguistic characteristics of texts. The research materials are the Brown corpus and corpora of artificially generated texts produced with Claude Sonnet 3.7 and Grok-3. In the processing mechanisms ...

Added: November 19, 2025

An efficient stock market trading algorithm: a retrospective analysis based on S&P-500 data.

Rubchinskiy A., Chubarova D., / Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2025. No. WP7/2025/01.

The article examines one of the most famous examples of socio-economic systems, characterized by significant uncertainty – the S&P-500 stock market, where shares of 500 largest US companies are traded. No assumptions are made about the probabilistic characteristics of the stock market. A flexible algorithm for daily trading has been developed, based on both known fixed data ...

Added: November 9, 2025

Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”

Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223

Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...

Added: October 19, 2025

Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)

[s.n.], 2025.

This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...

Added: October 19, 2025