The Tomsk Dialect Corpus: a comprehensively annotated database of a Siberian Russian dialect from material collected over the last 70 years

Zemicheva S.; Gromov M.; Dubtsova L.; Ugryumova M.; Vasilchenko A.; Zyuz’kova N.

doi:10.1007/s11185-023-09277-w

Russian Linguistics. 2023. Vol. 47. P. 231–252.

The paper offers the first full description of the Tomsk Dialect Corpus, an electronic resource based on recordings of Russian dialect speech from the Tomsk and Kemerovo regions (West Siberia) that have been collected since 1946. The corpus contains 3,350,272 tokens, which makes it the largest electronic collection of dialect speech in Russia. The resource is original in both the uniqueness of the collected materials and their multifaceted annotation. Topic and pragmatic annotations were created manually: topic annotation covers the entire dataset, whereas pragmatic annotation is available for 45,445 speech acts. Grammatical annotation was performed automatically with the PhpMorphy parser, with additional manual correction for some dialect words. Metalinguistic annotation includes the year and place of each recording and the speakers’ age, gender, and educational level. All annotated parameters are searchable. The corpus also includes a lexicographic component, i.e. definitions of dialect lexemes.

Research target: Philology and Linguistics

Language: English

Keywords: speech acts; dialect corpus; topic annotation; pragmatic annotation; Russian dialects of Siberia

Publication based on the results of:

Systematic language convergence in a typological perspective (2023)

THE SEMIOTIC INTENSITY APPROACH: A SCOPING REVIEW OF AMPLIFICATION AND ATTENUATION MECHANISMS IN MULTIMODAL MEDIA DISCOURSE

Yin Z., Terra Linguistica 2026 Vol. 17 No. 2 P. 152–168

Abstract. In the context of global communication, the construction of national images in the media has evolved from passive reporting to active meaning modulation. Using China as a case study, this research introduces the Semiotic Intensity Approach (SIA) to quantify how news media integrate verbal, visual, and layout resources to either amplify or attenuate specific ...

Added: July 8, 2026

The Foreign Censorship Committee as an institution of cultural transfer, or the fate of Italian books and translations from Italian in censorship documents of the 1830s–1850s

Bodrova A. S., Guskov S., Studi Slavistici 2026 Vol. 23 No. 1 P. 197–212

The article investigates foreign censorship as an institution of cultural transfer in the Russian Empire and its impact on the reception of Italian literature between the 1830s and 1850s. Drawing on archival materials, the authors demonstrate that censorship decisions were determined not only by the norms of the Censorship Statute (1828) but also by a ...

Added: July 5, 2026

Gerunds in 17th-century Russian: a transitional period in the history of the formation of their grammatical meaning

Ermolova M., Russian Linguistics 2026 Vol. 50 Article 14

The article analyzes the functioning of gerunds in the Russian language of the 17th century. Based on the analysis of contexts that are absent in modern Russian, it is concluded that in the 17th century the gerund lost the absolute temporal meaning it once had, acquiring a relative meaning depending on the tense of the main predicate, while remaining, at the same ...

Added: July 4, 2026

The semantics of irreversibility in the media discourse of the Federal Republic of Germany: eschatological codes and audience response in times of crisis

Moskvina Z. O., Вестник Российского университета дружбы народов. Серия: Литературоведение, журналистика 2026 Vol. 31 No. 2 P. 398–408

Abstract. This article explores the semantic and cognitive mechanisms governing the functioning of the lexeme “irreversibility” (Unumkehrbarkeit) within contemporary German media discourse covering the crisis in German-Russian relations. The study tests the hypothesis that the use of irreversibility semantics in the mass media serves as a rhetorical strategy intended to reinforce the perception of ongoing ...

Added: July 3, 2026

Men and women are from the same planet: Gender similarities in perspective-taking abilities

Imbault C., Slioussar N., Ivanenko A. et al., The Mental Lexicon 2026 P. 1–23

The study examines emotional responses to words representing a wide range of psychological valence and focuses on gender-related differences. We aimed to find out whether men and women differ in their emotional responses, and whether they can take the perspective of another gender. We used the slider paradigm (Warriner et al., 2017): participants saw a humanoid ...

Added: July 2, 2026

On one path of grammaticalization of past passive participles in Slavic languages (based on Polish and Russian)

Ermolova M., Voprosy Jazykoznanija 2026 No. 4 P. 73–85

The article compares how the past passive participle (PPP) evolved into an indefinite-personal finite past-tense form in Polish and Middle Russian. It considers the types of PPP contexts attested in the history of Russian and the types of Polish PPP uses that are connected with the formation of the indefinite-personal form in -no/-to. The analysis of the material from both languages leads to the conclusion that ...

Added: July 2, 2026

Pindar, Pythian Ode 9.33–43: What is Chiron talking about?

Akhunova O., Индоевропейское языкознание и классическая филология 2026 Vol. 30 No. 1 P. 108–119

There is a scene in Pindar’s Pythian 9 that attracts much scholarly attention, not only because the erotic theme is generally unusual for Pindar, but also because neither the question that Apollo addresses to Chiron nor the answer that Chiron gives him can be unambiguously interpreted. Does Apollo intend to commit open violence against Cyrene, or ...

Added: July 1, 2026

Concepts of searching and finding: principles of colexification in a typological perspective

Reznikova T., Rakhilina E. V., Ryzhova D. et al., Lingua 2026 Vol. 341

The article examines lexification of the semantic domains of searching and finding based on a sample of 25+ languages. First, it discusses the semantic parameters underlying lexical oppositions within each of the domains (e.g., type of the subject and referentiality of the object, for searching; intentionality and animacy of the object, for finding). Second, it ...

Added: July 1, 2026

Language policy in multiethnic countries: Current trends

Bergelson M., Grenoble L., Russian Journal of Linguistics 2026 Vol. 30 No. 2 P. 275–309

This introductory article surveys current theoretical and methodological trends in language policy research in multilingual and multiethnic societies, with particular attention to the post-Soviet space and the Russian Federation. Drawing on structural, critical, ecological, and urban sociolinguistic approaches, the paper traces the evolution of language policy scholarship from early language planning models to contemporary frameworks emphasizing multilingualism, globalization, social inequality, ...

Added: June 30, 2026

LANGUAGE POLICY IN MULTIETHNIC COUNTRIES

-, 2026.

The papers in this thematic volume demonstrate that language policy in the post-Soviet space and elsewhere reveals a fundamental tension that mirrors global shifts: the conflict between state efforts to manage national identity and the organic reality of human communication. While regional nationalization efforts often demonstrate global patterns of securitization, the actual practices of speakers tell a different story. Language policy ...

Added: June 30, 2026

A sold holiday, a stolen chapel, a gambled-away parish: the village holiday as symbolic capital

Moroz A., Антропологический форум 2026 Vol. 69 P. 296–324

Some rather unusual stories have been recorded from time to time in various Russian regions: about one village that sold its holiday to another, about the residents of one village who stole a chapel from another one and transported it to their own village, or how a rural priest gambled away part of his parish ...

Added: June 30, 2026

Philology. Social and National Variation of Language and Literature: proceedings of the 8th International Scientific Congress (7–8 April 2023). Simferopol: Издательский дом КФУ им. В. И. Вернадского, 2023. ISBN: 978-5-605-02308-1

Издательский дом КФУ им. В. И. Вернадского, 2023.

The volume presents articles based on papers given by participants of the 8th International Scientific Congress “Philology. Social and National Variation of Language and Literature”, held in Simferopol on 7–8 April 2023. The publications address current problems in sociolinguistics, sociophonetics and phonostylistics, Indo-European studies, literary studies, linguistics and corpus linguistics, communication studies and pragmalinguistics, language teaching methodology, library services, the dialogue of cultures and ...

Added: June 30, 2026

1st International Scientific and Educational Conference “Peisikov Readings: Problems of Contemporary Academic Oriental Studies”: conference proceedings

Moscow: ИСАА МГУ имени М.В. Ломоносова, 2023.

The publication is a collection of materials from the 1st International Scientific and Educational Conference “Peisikov Readings: Problems of Contemporary Academic Oriental Studies”, held on 21 April 2023 at the Institute of Asian and African Studies (ИСАА) of Lomonosov Moscow State University. The book presents works by staff of the Institute and by invited specialists from a number of leading institutes in Russia and abroad. The electronic version of the collection can be downloaded at http://iranistika.iling-ran.ru/Sbornik/ ...

Added: June 30, 2026

The great empires of Ancient Iran: a new authentic multimedia learning package

Gromova A., Научный вестник Крыма (Russia, ISSN: 2499-9911) 2021 No. 2 (31) P. 1–13

The Iranian ‘Teleschool’, launched in 2020 on the basis of standard schoolbooks published by the Ministry of Education, reflects the common vision of the glorious history of Ancient Iran and promotes the national cultural heritage. The present article aims to describe a comprehensive selection of new learning materials such as original texts and ...

Added: June 30, 2026

Nowruz traditions in Davan, Iran: festive sweets and spring verses

Gromova A., Армянский гуманитарный вестник 2022 No. 8 P. 267–275

The article describes the local customs of celebrating the Iranian New Year in Davan, an ancient village in the province of Fars, Iran, known for its unique landscape and archaic dialect. Some of the traditions that exist here can be attributed to all-Iranian seasonal practices; however, certain culinary traditions and sweets (for example, popcorn rice ...

Added: June 30, 2026

The emergence of the noun: early stages of children’s acquisition of Russian nominal morphology.

Voeikova M. D., Языки славянских культур, 2015.

The book describes the initial stage of Russian children’s acquisition of nouns, adjectives, and numerals. The nominal system is the foundation of a child’s language system: names of persons and objects are known to make up about 90% of the first 100 words of children acquiring Indo-European languages. Incidentally, in languages of a different type (for example, Korean or Chinese), the percentage of nouns in the early vocabulary may ...

Added: June 30, 2026

Mikhail Kuzmin’s literary circle: boundaries – levels – pragmatics

Pakhomova A., Quaestio Rossica 2026 Vol. 14 No. 2 P. 389–405

This paper examines the structural and pragmatic characteristics of the literary circle (Rus. литературный круг), a form of literary cooperation that has rarely been the subject of independent analysis, particularly when compared with other forms of writers’ associations (such as clubs, salons, and groups). The main set of issues associated with the literary circle lies ...

Added: June 30, 2026

Iran and its neighbours

Gromova A., Moscow: КноРус, 2023.

This textbook on language and area studies is intended for students of Oriental studies who are learning Persian within various specializations: regional studies, philology, history and political science, and the economic development of the countries of the Near and Middle East. It introduces the realities of contemporary Iranian life and national news resources and makes extensive use of online materials. The book leaves a certain freedom in the choice of material for classes depending on the level of proficiency ...

Added: June 29, 2026

On the genesis of the prose hymn genre in the literature of the Second Sophistic, in: The Poetics of Communion with God: Mystical Christian Texts from Late Antiquity to the 20th Century

Mezheritskaya S. I., Moscow: Аквилон, 2024.

This study examines and describes the genre of the prose hymn, determines its place in the system of genres of epideictic oratory, and traces its genesis and development in the late antique rhetorical tradition. The two questions, the nature of the genre and its formation, are closely interrelated. Thus, on the one hand, a full characterization of the prose hymn is possible only if it is compared with the poetic hymn, the oldest genre of ancient Greek choral ...

Added: June 29, 2026

Tradition and innovation in ancient Greek oratory of the Roman Empire: History of the problem

Mezheritskaya S. I., Scrinium: Journal of Patrology and Critical Hagiography 2022 Vol. 18 P. 453–468

This article presents a review of research literature on the so-called Second Sophistic (late first – early third centuries CE), which marked the flowering of ancient Greek oratory and had a powerful influence on the beginnings of Christian eloquence. Scholars’ interest in this topic increased in the second half of the 19th century due to insufficient ...

Added: June 29, 2026

Applying large language models to the analysis of the value-laden patriotic discourse of Russian-speaking users

Balakina Y. V., Grigoreva M., Sokolova E. N., Вестник Российского фонда фундаментальных исследований. Гуманитарные и общественные науки 2025 Vol. 123 No. 4 P. 56–69

The article examines the potential of large language models (LLMs) for automated analysis of value-laden and patriotic discourse in Russian-language social media. Using a corpus of posts from VK, Odnoklassniki and Telegram (2023–2025), it investigates the extent to which automatic coding results align with expert annotation based on a specially developed categorical scheme. The codebook ...

Added: November 26, 2025

The illocutionary potential of verbs as media-framing tools in mass media discourse during the COVID-19 pandemic

Radina N., Balakina Y. V., Bannikov K., Вестник Томского государственного университета 2025 No. 510 P. 52–62

The timeliness of the presented research is justified by the significant role of the media during a crisis, since the growth of content consumption increases the ability to influence the audience in order to form certain attitudes and change behavioral patterns. In addition, despite the fact that issues of media framing as one of the ...

Added: May 8, 2025

Towards a typology of echo questions

Simonova T. V., Voprosy Jazykoznanija 2025 No. 3 P. 7–29

Over the past half century, the analysis of questions has played an important role in the development of syntactic theory. Despite this, echo questions have received little attention, and most studies of echo questions have focused on data from a single language. In this paper, I review strategies of forming echo questions of different types in ...

Added: May 7, 2025

An analysis of the problem of scepticism about meaning in terms of the categories of linguistic pragmatics

Smirnov M., Вестник Томского государственного университета. Философия. Социология. Политология 2025 No. 84 P. 23–33

In this work, I show the perspectives opened up for the analysis of the problem of scepticism regarding meaning (‘Kripke’s problem’) by using the categories of linguistic pragmatics (performativity; the distinction between locution, illocution, and perlocution). I provide a critical analysis of the argumentation against scepticism proposed by V. A. Ladov and E. V. Borisov, who appeal to performative ...

Added: November 21, 2024