Higher School of Economics
National Research University
Pragmatic Markers Distribution in Russian Everyday Speech: Frequency Lists and Other Statistics for Discourse Modeling

P. 433–443.
Bogdanova-Beglarian N. V., Sherstinova T., Blinova O. V., Martynenko G. Ya.

Pragmatic markers (PMs) are discourse units (words and multiword expressions) with weakened referential meaning that perform a variety of pragmatic tasks. Common English PMs, for example, include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to data from the ORD corpus of everyday Russian, they can account for up to 6% of all words in the speech of individual speakers. Moreover, in some speech fragments PMs may even outnumber significant units (i.e., standard words). Yet despite their frequency and ubiquity, PMs remain poorly understood. Current NLP and discourse modeling systems lack information on PM distribution and usage, which leads to noticeable shortcomings when these systems process the spontaneous speech of everyday discourse. In this paper we present top frequency lists of PMs for spoken Russian dialogue and monologue in general, as well as for separate sociological groups of informants (by gender and by age). Our current list of Russian PMs contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The quantitative data presented may be used to improve NLP and discourse modeling systems.
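The two core statistics in the abstract, a frequency list of PMs and the share of PMs among all words in a speaker's speech, can be illustrated with a short Python sketch. It is not the authors' procedure: the six-item `PM_INVENTORY` and the helpers `tokenize`, `count_pms`, `pm_share`, and `frequency_list_by_group` are hypothetical, the tokenizer assumes lowercase, punctuation-free transcripts, and the share is assumed to count every word inside a multiword PM.

```python
from collections import Counter

# Hypothetical, illustrative PM inventory (assumption); the paper's list
# of 450 units is not reproduced here.
PM_INVENTORY = ["вот", "ну", "как бы", "это самое", "типа", "так сказать"]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_pms(tokens: list[str], inventory: list[str]) -> Counter:
    """Count PM occurrences using greedy longest match, so a multiword PM
    such as 'как бы' is counted once, not as two separate words."""
    patterns = sorted((pm.split() for pm in inventory), key=len, reverse=True)
    counts: Counter = Counter()
    i = 0
    while i < len(tokens):
        for pat in patterns:
            if tokens[i:i + len(pat)] == pat:
                counts[" ".join(pat)] += 1
                i += len(pat)  # skip past the matched marker
                break
        else:
            i += 1  # no PM starts here; move to the next token
    return counts


def pm_share(tokens: list[str], counts: Counter) -> float:
    """Share of word tokens covered by PMs (assumption: every word of a
    multiword PM counts toward the total)."""
    pm_words = sum(len(pm.split()) * n for pm, n in counts.items())
    return pm_words / len(tokens) if tokens else 0.0


def frequency_list_by_group(utterances):
    """Build a PM frequency list per speaker group from (group, text) pairs,
    e.g. grouping by gender or age band (hypothetical labels)."""
    by_group: dict[str, Counter] = {}
    for group, text in utterances:
        by_group.setdefault(group, Counter()).update(
            count_pms(tokenize(text), PM_INVENTORY)
        )
    return {group: c.most_common() for group, c in by_group.items()}


if __name__ == "__main__":
    tokens = tokenize("ну вот я как бы не знаю ну вот")
    counts = count_pms(tokens, PM_INVENTORY)
    print(counts.most_common())  # [('ну', 2), ('вот', 2), ('как бы', 1)]
    print(round(pm_share(tokens, counts), 3))  # 0.667
```

Matching the longest patterns first keeps a multiword PM such as «как бы» from also being counted as its component words. Plain string matching cannot separate a PM from a homonymous ordinary word (e.g., «вот» used deictically, "here is", rather than as a marker), so in real corpus work such counts would need contextual or manual annotation.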

Language: English
Keywords: statistics, pragmatics, sociolinguistics, NLP, speech corpus, frequency lists, spoken Russian, everyday discourse, pragmatic markers

In book

Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings
Vol. 11658. Switzerland: Springer, 2019.
Similar publications
Paratext on Paratext
Kasatkina A., Sergeev M. L., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2025 Vol. 21.3 P. 13–25
This article introduces a collection of publications selected from the Proceedings of the conference “Circum Text: Para, Meta-, and Other Marginalia” (Institute for Linguistic Studies RAS, St. Petersburg, October 19–21, 2023). It describes the general agenda of paratextual studies and aligns the selected articles with its various aspects. Paratext is a variety of verbal and ...
Added: March 25, 2026
Granular computing-based deep learning for text classification
Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746
Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...
Added: March 12, 2026
T. G. Vinokur's Communicative Concept in the Context of Pragmatic Sociology (the Case of D. Danilov's Play “Seryozha Is Very Stupid”)
Nikishina E., in: The Speaker and the Writer: On the 100th Anniversary of the Birth of Tatyana Grigoryevna Vinokur. Moscow: V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences, 2024. P. 238–258.
The book is dedicated to the memory of a remarkable Russian language scholar, Tatyana Grigoryevna Vinokur (1924–1992). The range of issues addressed in the collected scholarly articles reflects the breadth of Tatyana Grigoryevna's research interests: the history of language, poetics, the language of fiction, stylistics, speech culture, problems of communication studies, and many other topics. ...
Added: March 8, 2026
Youth slang as a social language code: function and formation
Trofimova N., Pesina S., Vinogradova S. et al., Brazilian Journal of Education, Technology and Society - BRAJETS 2025
The article presents numerous functions and special features of slang, which is able to represent a variety of communicative intentions of the speakers. In addition to the desire of communicants to maintain the confidentiality of communication through slang, the latter is used as a means of language economy, as a mechanism for creating new concepts, ...
Added: February 22, 2026
The Language Situation among Macedonians, Czechs and Slovaks in Vojvodina (Based on Materials from the 2023 Expedition)
Borisov S. A., Kikilo N. I., Nemchinov V. A., Slavyanovedenie 2024 No. 6 P. 85–99
The article provides an overview of field research among representatives of the Macedonian, Czech and Slovak minority communities in the South Banat and South Bačka districts of the Autonomous Province of Vojvodina (Serbia), conducted in 2023. The Macedonian minority is made up of descendants of settlers who came to Vojvodina in 1945–1948. The article provides a brief analysis ...
Added: February 18, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Explorations in Applied Ethnolinguistics: Words, Cultures, and Global Perspectives
Palgrave Macmillan, 2025.
This volume contributes to the growing body of cutting-edge research into the Natural Semantic Metalanguage (NSM) approach in linguistics. It explores the broad range of possible applications enabled by the NSM approach, from linguistic studies of semantics and culture to cross-cultural studies, psychology and childhood education. The volume builds on previous studies, bringing a diversity ...
Added: January 28, 2026
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
INCOMA Ltd, 2021.
Added: January 28, 2026
Statistically distinguishable rating scales
Pomazanov M. V., The Journal of Risk Model Validation 2026 Vol. 20 No. 1 P. 1–24
This paper proposes a method of designing a statistically distinguishable rating scale that is not excessive in relation to the existing observation statistics. This allows for more stable validation with a fixed maximum number of violations of the Wald criterion compared with the excess scales usually used by banks. The increased validation robustness will reduce the calibration probability of ...
Added: December 9, 2025
Preposition drop in Russian spoken by Mari and Beserman bilinguals
Yakovleva A., Kosheliuk N., Moroz G., International Journal of Bilingualism 2025 P. 1–19
Aims and Research Questions: In this paper, we present a corpus-based study of preposition drop (p-drop) in the speech of Mari-Russian and Beserman-Russian bilinguals compared to the speech of Russian monolinguals. Based on data from spoken corpora, we demonstrate that the prepositions v ‘in’, k ‘to’, s ‘with’ are omitted in the speech of bilinguals ...
Added: November 26, 2025
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Association for Computational Linguistics, 2025.
Added: November 17, 2025
Monitoring the Development and Application of Artificial Intelligence Technologies: Key Methodological Approaches
Abashkin V., Sakhno M., Abdrakhmanova G., Voprosy Statistiki 2025 Vol. 32 No. 5 P. 7–17
The article presents methodological approaches to organizing and conducting comprehensive statistical monitoring of the development and application of artificial intelligence (AI) technologies. The relevance of this topic is driven by the high significance of AI technologies for the economy and society, and their recognition as one of the leading technologies of the current decade, both globally ...
Added: November 11, 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...
Added: November 6, 2025
Well-Being Research Using Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations
Voevodina E., Journal of Modern Foreign Psychology 2025 Vol. 14 No. 3 P. 172–181
Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...
Added: October 9, 2025
Analysis of Assignments and Problems in Educational Literature on Applied Statistics for Psychologists Based on the Competency-Based Approach
Kolachev N., Novikov I. A., Psychological Studies: Electronic Scientific Journal 2025 Vol. 18 No. 102 Article 4
This study examines the characteristics of instructional tasks and problems in applied statistics for psychology students through the lens of a competency-based approach. To achieve the research objective, a methodological framework was developed comprising four key components: content domain, types of cognitive actions, forms of data representation, and instrumental tools. The sample consisted of statistical ...
Added: September 26, 2025
A Rating Study of Creative Industries
Vapnyarskaya O., Rivchun T., Platonova N. A. et al., Creative Industries 2025 No. 1 (1) / 2025 P. 37–52
This article presents the results of a survey conducted at the regional level, in order to form an integral assessment of the development of creative industries of a given region. The authors also compared the survey findings with ratings of a similar direction. The increasing interest of regional administrations in the development of regional creative ...
Added: September 12, 2025
The immediate and the naive metaphysics
Ivan B. Mikirtumov, Epistemology and Philosophy of Science 2025 Vol. 62 No. 3 P. 126–131
In this article, I discuss Pirmin Stekeler-Weithofer’s ideas about the nature of language and the metaphysical residue that seems to be present in the realm of immediate experience, despite all the criticism and success of positive knowledge. This includes, first and foremost, the ability to perceive objects, facts, and possible worlds which humans have from ...
Added: September 1, 2025
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Tartu: University of Tartu Library, 2025.
The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...
Added: July 17, 2025