Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings
This volume contains a collection of submitted papers presented at the conference,
which were thoroughly reviewed by members of the Program Committee consisting of
more than 100 top specialists, as well as an invited paper by Prof. Scharenborg. Each
paper was reviewed, single blind, by two to four committee members (three reviewers
on average) and then discussed by the program chairs. In total, 57 papers were
selected by the Program Committee for presentation at the SPECOM Conference.
A total of 126 submissions were received and evaluated for SPECOM/ICR. The
conference sessions were organized thematically into Audio Signal Processing,
Automatic Speech Recognition, Speaker Recognition, Computational Paralinguistics,
Speech Synthesis, Sign Language and Multimodal Processing, and Speech and
Language Resources. An increasing number of papers used deep neural network-based
approaches across these themes.
Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning that perform a variety of pragmatic tasks. Common PMs in English include “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to results obtained from the ORD corpus of everyday Russian, their share can reach up to 6% of the total number of words in the speech of individual speakers. Moreover, in some speech fragments, PMs may even outnumber significant units (i.e., standard words). However, despite their frequency and ordinariness, PMs are still poorly understood. Current NLP and discourse modeling systems lack information on the distribution and usage of PMs, which leads to noticeable shortcomings when these systems process the spontaneous speech of everyday spoken discourse. In this paper we present top-frequency lists of PMs for Russian dialogic and monologic speech in general, as well as for separate sociological groups of informants (by gender and by age). Our current list of PMs for Russian contains 450 units, which are variants of 50 main structural types. In addition, we consider the most frequent functions of PMs in spoken Russian. The presented quantitative data may be used to improve NLP and discourse modeling systems.
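
As a rough illustration of the kind of quantitative data described above, the following minimal sketch counts marker occurrences in a tokenized utterance and computes their share of all words. It rests on several assumptions that are not taken from the paper: the marker inventory is a hypothetical three-item list built from the English examples, the share is taken to be the number of tokens covered by markers divided by the total number of tokens, and matching is done on surface strings only. Surface matching cannot tell a pragmatic use of "well" from a referential one, so real counts would need annotated data.

```python
from collections import Counter

# Hypothetical marker inventory (assumption): only the English examples
# mentioned in the text, lowercased. A real list would be far larger.
PRAGMATIC_MARKERS = ["well", "you know", "i think"]


def count_markers(tokens, markers=PRAGMATIC_MARKERS):
    """Return (Counter of marker occurrences, number of tokens they cover).

    Multiword markers are matched greedily, longest pattern first, so that
    "you know" is counted once rather than as two separate words.
    """
    patterns = sorted((tuple(m.split()) for m in markers), key=len, reverse=True)
    counts = Counter()
    covered = 0
    i = 0
    while i < len(tokens):
        for pattern in patterns:
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                counts[" ".join(pattern)] += 1
                covered += len(pattern)
                i += len(pattern)
                break
        else:
            # No marker starts at this position; move to the next token.
            i += 1
    return counts, covered


def marker_share(tokens, markers=PRAGMATIC_MARKERS):
    """Share of tokens covered by markers (assumed definition of 'share')."""
    if not tokens:
        return 0.0
    _, covered = count_markers(tokens, markers)
    return covered / len(tokens)


# Toy utterance, invented for illustration only.
tokens = "well i think it is you know quite late".lower().split()
counts, covered = count_markers(tokens)
share = marker_share(tokens)
print(counts.most_common(), f"{share:.2f}")
```

Running the same functions separately over the utterances of each speaker group (for example by gender or by age) would give per-group frequency lists of the sort the paper reports.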