Everyday Conversations: a Comparative Study of Expert Transcriptions and ASR Outputs at a Lexical Level

Lecture Notes in Computer Science. 2023. Vol. 14338. P. 43–56.
Sherstinova T., Михайловский Н. Э., Kolobov R.

The study examines the results of automatic speech recognition (ASR) applied to field recordings of everyday Russian speech. Everyday conversations captured in real-life communicative situations are a difficult target for ASR for several reasons: they can involve many speakers, the loudness of the interlocutors' speech signals fluctuates, there is a substantial amount of overlapping speech from two or more speakers, and significant noise interference occurs periodically. The study compares transcripts of these recordings produced by two recognition systems, the NTR Acoustic Model and OpenAI's Whisper, with expert transcriptions of the same recordings. A comparison of the three frequency lists (the expert transcription, the NTR Acoustic Model, and Whisper) shows that each model has its own characteristics at the lexical level. At the same time, both models perform worse on the following groups of words typical of spontaneous, unprepared dialogue: discursive words, pragmatic markers, backchannel responses, interjections, reduced conversational word forms, and hesitations. These findings are intended to support improvements in ASR systems designed to transcribe conversational speech, such as work meetings and everyday dialogues.
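The frequency-list comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the tokenization rule, the function names `frequency_list` and `underrecognized`, and the three toy transcripts are all assumptions made for illustration only. The sketch builds one frequency list per transcript and reports the words that an ASR output contains less often than the expert transcription.

```python
import re
from collections import Counter


def frequency_list(transcript: str) -> Counter:
    """Count lowercase word tokens in a transcript (punctuation is dropped)."""
    return Counter(re.findall(r"\w+", transcript.lower()))


def underrecognized(expert: Counter, asr: Counter) -> list[tuple[str, int, int, float]]:
    """List words that the ASR output contains less often than the expert transcript.

    Each row is (word, expert_count, asr_count, asr_count / expert_count),
    sorted so that the most severely under-recognized words come first.
    """
    rows = []
    for word, expert_count in expert.items():
        asr_count = asr.get(word, 0)
        if asr_count < expert_count:
            rows.append((word, expert_count, asr_count, asr_count / expert_count))
    return sorted(rows, key=lambda row: (row[3], -row[1], row[0]))


# Invented toy transcripts (hypothetical), not data from the study.
EXPERT = "ну да ну вот это самое ага ага угу да"
SYSTEM_A = "да вот это самое да"
SYSTEM_B = "ну да вот это самое ага да"

if __name__ == "__main__":
    expert_counts = frequency_list(EXPERT)
    for name, text in [("system A", SYSTEM_A), ("system B", SYSTEM_B)]:
        print(name)
        for word, expert_count, asr_count, ratio in underrecognized(
            expert_counts, frequency_list(text)
        ):
            print(f"  {word}: expert={expert_count} asr={asr_count} ratio={ratio:.2f}")
```

In this toy data the words that drop out are particles and backchannel responses, which mirrors the kind of word groups the abstract names. A real comparison would probably need normalization that this sketch omits, such as a shared orthography for reduced word forms.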

Research target: Philology and Linguistics
Language: English
Keywords: vocabulary, hesitations, Whisper, interjections, pragmatic markers, word lists, ASR systems, acoustic model, field recordings, everyday conversations, dialogues, Russian language, backchanneling, discursive words, conversational word forms
Publication based on the results of:
Text as Big Data: Modeling Convergent Processes in Language and Speech Using Digital Methods (2023)
Similar publications
A Linguistic Analysis of Perfume Advertising in English-Language and Russian-Language Discourse
Gabrielova E., Шевякова Ю. С., Вестник Удмуртского университета 2026 Vol. 36 No. 2 P. 344–354
In today's globalized world, the effectiveness of sales and the success of products largely rely on well-crafted advertising texts. Influenced by this factor and the growing competition, advertising continuously evolves, incorporating various linguistic, psychological, and cross-cultural techniques. This study focuses on the linguistic and stylistic analysis of perfume advertising texts within English and Russian discourses, ...
Added: May 25, 2026
On the Curse Formula in Wʿzb’s Inscription (RIÉ 192 B, ll. 5–9)
Bulakh M., Aethiopica 2025 Vol. 28 P. 39–52
The article deals with the curse formula belonging to the sixth-century inscription by an Aksumite king Wʿzb (RIÉ 192 B, ll. 5–9). After summarizing the extant interpretations, the author proposes a new reading and interpretation, arguing that the text under scrutiny follows the same pattern and employs the same rhetoric devices as the curse formulas ...
Added: May 23, 2026
Practicamos el Subjuntivo
Bocharov Y., M.: -, 2025.
This textbook is designed for students improving their Spanish proficiency at levels B1-B2. It consists of five topics and a selection of texts to reinforce them. The first topic covers the morphology of the four tenses (present, perfect, imperfect, subjunctive perfect) and exercises on the formation of forms. The remaining topics are devoted to exploring ...
Added: May 23, 2026
The Aesthetics of Audiovisual Journalism: A Textbook. 2nd edition
Novikova A., Бережная М. А., Кирия И. В., КноРус, 2026.
The aesthetics of journalism is substantiated as a necessary component in the professional training of specialists in audiovisual media. The factors and trends of historical and current changes in the aesthetics of journalism are presented, and the aesthetic practices of audiovisual journalism are characterized in terms of their social functioning. Criteria for aesthetic evaluation are ...
Added: May 22, 2026
Juxtapositional vs. possessive-like encoding in Russian specificational constructions
Logvinova N., Russian linguistics 2026 Vol. 50 Article 11
This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOM honor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...
Added: May 18, 2026
FOCUS ON VOCABULARY. The Economics of Tangible and Intangible Assets: A Corpus-Based Dictionary and AI Exercises for English
Gorina O. G., Kucherenko S., Larisa K. et al., St. Petersburg: Asterion, 2026.
This textbook is an integrated teaching and learning resource for English for Specific Purposes (ESP) in the field of economics of tangible and intangible assets. Its design employs (i) modern corpus linguistics methods, including frequency analysis and keyword extraction based on authentic texts reflecting current trends in professional discourse, and (ii) artificial intelligence technologies for ...
Added: May 16, 2026
The Cognitive-Associative Field of the Onyms of St. Petersburg and Vienna
Зелинская Ю. Ю., Когнитивные исследования языка 2025 No. 4(65) P. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Person-Number Asymmetry: Agreement of Passive Miratives in the Kazym Dialect of Khanty
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Vol. 6 No. 1 P. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Verbs of Substance Motion in Slavic Languages
Fedorov D., Jezikoslovni Zapiski 2026 Vol. 32 No. 1 P. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
The Image of Women Through the Years: A Diachronic Analysis of the Representation of Women in Russian Agitational Advertising
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Vol. 23 No. 1 P. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
T. Pratchett's Discworld Through the Eyes of the Russian-Speaking Fandom
Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 No. 100 P. 158–173
This is the first attempt to examine the features of fan fiction as an act of productive reception that arose in Russia around Terry Pratchett's cycle of Discworld novels. The analysis shows that fan fiction authors seek above all to convey the style and comic spirit of Pratchett's original cycle, regardless of the genre and format of the works they create. Fic writers most often turn to such formats as ...
Added: May 10, 2026
Evidence-Based Educational Interventions for Developing and Improving Reading Comprehension in Adolescents
Логвиненко Т. И., Стрельцова А. В., Otstavnov N. et al., Вопросы образования 2025 No. 2 P. 101–141
The aim of this article is to review empirical studies, meta-analyses and systematic reviews on educational interventions for developing and improving reading comprehension in adolescents, including both typically developing readers and those experiencing reading difficulties. We distinguish seven intervention types aimed at improving reading comprehension, each targeting different components as the basis for intervention: decoding and reading ...
Added: December 11, 2025
From Wine to Moonshine: The Topos of Drunkenness in Student Songs
Воробьев В. А., in: Толока: сборник статей к 60-летию А.Б. Мороза.: Moscow: РГГУ, 2025. P. 127–152.
The topic of drunkenness plays a significant role in student songs and is expressed through specific vocabulary, primarily the names of alcoholic beverages. The article examines a group of over 400 occurrences in three corpora (more than 500 texts) in comparison with the social and historical-cultural context of the songs’ existence. The analysis focuses on the ...
Added: October 9, 2025
Comparative Analysis of Encoder-Based NER and Large Language Models for Skill Extraction from Russian Job Vacancies
Matkin N., Smirnov A., Usanin M. et al., in: 12th International Conference, AIST 2024, Bishkek, Kyrgyzstan, October 17–19, 2024, Revised Selected Papers.: Cham: Springer, 2025.
The labor market is undergoing rapid changes, with increasing demands on job seekers and a surge in job openings. Identifying essential skills and competencies from job descriptions is challenging due to varying employer requirements and the omission of key skills. This study addresses these challenges by comparing traditional Named Entity Recognition (NER) methods based on ...
Added: July 26, 2025
Kak že kak že! Russian discourse formula of confirmation as a marker of recognition
Ekaterina Rakhilina, Bychkova P., in: Constructions with lexical repetitions in East Slavic.: De Gruyter Mouton, 2024. P. 197–222.
The chapter presents a case study of the repetition mechanism within the development of discourse formulae, i.e., multi-word formulaic replies similar to yes and no. It closely examines the process of pragmaticalization in the Russian formula Kak že! (‘how part’) and its duplicated counterpart. The diachronic corpus data shows that the formula Kak že! emerged ...
Added: February 13, 2025
INCORPORATING INTERJECTIONS TO FACILITATE CONVERSATIONAL FLOW
Rodomanchenko A., in: Teaching English in Global Contexts, Language, Learners and Learning.: Electronic publication, 2023. P. 199–211.
Have you ever been in a situation where you lost your train of thought because of being asked a question mid-talk or were distracted by a side comment? Probably, like others, you struggled to get back on track. Although such interruptions are part of authentic conversations, they are rarely addressed in English classes. In this ...
Added: February 8, 2024
The Function of Metacommunicative Markers in Russian-Speaking Communication (a Sociolinguistic Aspect)
T.I. Popova, Communication studies 2021 Vol. 8 No. 3 P. 454–464
The article considers the use of metacommunicative pragmatic markers in the gender aspect, taking into account the social roles of the speaker. The research is carried out on the data of ORD corpus Russian Everyday Speech known as “One Speaker’s Day” corpus, based on transcripts of audio recordings obtained under actual conditions. The volume of ...
Added: October 16, 2023
English Lexicology
Киселева С. В., Кононова И. В., Trofimova N., St. Petersburg: ., 2022.
This textbook is intended for students studying in the bachelor's degree program "Linguistics" and preparing for the exam in the discipline "Fundamentals of the theory of the first foreign language". The manual aims to give students an idea of the specifics of the vocabulary of the modern English language, the origin of words, the problems of the meaning of ...
Added: April 9, 2023
COVID-19 as a Linguistic Phenomenon and its Influence on the Development of Modern Regional Terminology
Pesina S., Kiseleva S., Nella A. Trofimova et al., Journal of Pharmaceutical Negative Results 2022 Vol. 13 No. S8 P. 2985–2991
The article is devoted to the study of COVID-19 as a linguistic phenomenon based on the material of the Russian and English languages, as well as the impact of the pandemic on the vocabulary of two languages. The article examines the influence of the course of the coronavirus pandemic on the meaning of neologisms of ...
Added: December 11, 2022
Pragmatic Markers of Russian Everyday Speech: Invariants in Dialogue and Monologue
Bogdanova-Beglarian N., Blinova O. V., Sherstinova T. et al., in: Speech and Computer. 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021. Vol. 12997.: St. Petersburg: Springer, 2021. P. 81–90.
The paper presents the distribution of pragmatic markers (PM) of Russian everyday speech in two types of discourse: dialogical and monologic. PMs are an essential part of any oral discourse, therefore, quantitative data on their distribution are necessary for solving both theoretical and practical tasks related to studies of speech communication, as well as for ...
Added: October 31, 2021
A Grammar of May: An Austroasiatic Language of Vietnam
Babaev K., Samarina I., Brill, 2021.
Not only is May otherwise undescribed in writing, but it is also the only small Vietic language documented and analysed in such detail, and one of few endangered Austroasiatic languages described so thoroughly. May is predominantly monosyllabic, yet retains traces of affixes and consonant clusters that reflect older disyllabic forms. It is tonal, and also manifests ...
Added: June 24, 2021
Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts
Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69–89
The purpose of this paper is to test the methodological tools provided by TXM platform for research on dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis software which provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...
Added: June 24, 2021
Pragmatic markers in the aspect of communicative alignment
Трощенкова Е. В., Blinova O. V., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2020 Vol. 19 No. 3 P. 49–58
The article presents a model of communicative alignment in pragmatic markers (PM) use in Russian everyday dialogical communication. The main objectives are to check whether speakers coordinate their linguistic behavior not just with the use of lexemes or grammar forms or constructions, but also with PMs and how this actually works. We suppose that the ...
Added: November 1, 2020