Speech and Computer: 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25–28, 2024, Proceedings, Part I
This article presents the results of a research project that describes the classes and functioning of multiword units in contemporary Russian everyday speech. The concept of a multiword unit covers quite diverse linguistic phenomena, so building a working typology was one of the project's central tasks. Such a typology is needed to annotate corpus material and to obtain statistical characteristics. Eight classes of multiword units are identified: 1) non-phraseologized collocations, 2) phraseologized collocations, 3) occasional collocations, 4) idiom forms, 5) constructions, 6) precedent texts and their elements, 7) multiword pragmatic markers, and 8) speech formulas. The article describes the methods used to annotate these units in the ORD corpus of everyday spoken Russian and presents a quantitative analysis of their functioning within the annotated subcorpus. The resulting data can be used both for theoretical work on the lexical and grammatical description of Russian everyday speech and for numerous applied tasks involving the processing or generation of live spoken Russian.
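
As an illustration only, the eight-class typology and the kind of frequency count the abstract mentions could be sketched as follows. This is a minimal sketch under stated assumptions: the names `MWUClass` and `class_distribution`, the `(unit, class)` pair format, and the placeholder units are all hypothetical and are not taken from the article or from the ORD corpus tooling.

```python
from collections import Counter
from enum import Enum


class MWUClass(Enum):
    """The eight classes of multiword units listed in the abstract."""
    NON_PHRASEOLOGIZED_COLLOCATION = 1
    PHRASEOLOGIZED_COLLOCATION = 2
    OCCASIONAL_COLLOCATION = 3
    IDIOM_FORM = 4
    CONSTRUCTION = 5
    PRECEDENT_TEXT = 6
    PRAGMATIC_MARKER = 7
    SPEECH_FORMULA = 8


def class_distribution(annotations):
    """Return {class: (count, share)} for a list of (unit, class) pairs.

    The pair format is an assumption made for this sketch, not the
    annotation scheme actually used in the project.
    """
    counts = Counter(label for _, label in annotations)
    total = sum(counts.values())
    return {
        cls: (counts[cls], counts[cls] / total if total else 0.0)
        for cls in MWUClass
    }


# Placeholder annotations (not real corpus data).
toy_annotations = [
    ("unit_a", MWUClass.PRAGMATIC_MARKER),
    ("unit_b", MWUClass.PRAGMATIC_MARKER),
    ("unit_c", MWUClass.SPEECH_FORMULA),
    ("unit_d", MWUClass.CONSTRUCTION),
]

distribution = class_distribution(toy_annotations)
for cls, (count, share) in distribution.items():
    print(f"{cls.name}: {count} ({share:.0%})")
```

Classes that never occur in the input are still reported with a zero count, which keeps per-class tables comparable across subcorpora of different sizes.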