De profundis: Problems of deep annotation of the Multimodal Russian Corpus and ways of solving them
The paper focuses on manual gesture annotation in the Multimodal Russian Corpus (MURCO), which was initiated by E.A. Grishina and is being continued by the authors of this paper. A central aim of the annotation process is to provide “the uniformity and commonality of the markup” [Grishina 2010] to the maximum degree possible. To do so, the annotator should carefully study the MURCO data annotated by E.A. Grishina (as, sadly, there is nobody to ask directly) and discover the rules that govern the gesture annotation and that E.A. Grishina herself probably had in mind.
The paper describes three such rules: 1) the choice between the gestures to open one’s eyes wide and to raise one’s brows (both meaning ‘fear’), where we state that the main factor is the distinctness of gesture performance; 2) the choice between the meanings ‘confirmation’ and ‘emphasis’ of the gesture to close one’s eyes; and 3) the choice between the same two meanings of the gesture to nod. In the latter two cases the meaning ‘confirmation’ is preferable if the utterance accompanied by the gesture is an answer to somebody’s remark. The other factor is cohesion: if the utterance accompanied by the gesture conveys the same meaning as the speaker’s previous utterance, the meaning ‘confirmation’ should be preferred.
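The two contextual factors given for rules 2 and 3 can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Utterance`, `is_reply`, `repeats_previous` and `choose_meaning` are hypothetical, and falling back to ‘emphasis’ when neither factor holds is an assumption that the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record of the context of a gesture-accompanied utterance."""
    is_reply: bool          # the utterance answers somebody's remark
    repeats_previous: bool  # it conveys the same meaning as the speaker's previous utterance


def choose_meaning(utterance: Utterance) -> str:
    """Pick a meaning label for a nod or an eye-closing gesture.

    'confirmation' is preferred when either contextual factor holds.
    The 'emphasis' fallback is an assumption made for this sketch.
    """
    if utterance.is_reply or utterance.repeats_previous:
        return "confirmation"
    return "emphasis"


label = choose_meaning(Utterance(is_reply=True, repeats_previous=False))
```

In actual annotation the two flags would themselves be judgments made by the annotator; the sketch only makes the order of preference explicit.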
The conference was organised under the aegis of the Learner Corpus Association and was hosted by Eurac Research Institute for Applied Linguistics. It was themed "Widening the scope of learner corpus research" and brought together researchers and language teachers, software developers and linguists from 23 countries around the world.
Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents the first steps taken by Russian corpus linguistics toward the development of language corpora and corpus-based resources as well as their use in grammatical and lexical analysis.
The first part of the book focuses on the annotation of Russian texts at several levels: lemmas, part of speech and inflectional forms, word formation, lexical-semantic classes, syntactic dependencies, semantic roles, frames, and lexical constructions. We discuss various theoretical principles and practical considerations motivating the corpus markup design, provide details on the creation of lexical resources (electronic dictionaries and databases) and text processing software, and consider complicated cases that present challenges for the annotation of corpora both manually and automatically. In most cases we describe the annotation of the Russian National Corpus (RNC, ruscorpora.ru) and its affiliate project FrameBank (framebank.ru).
Frequency data depend not only on the representativeness and balance of texts in a corpus, but also on the rules and tools used for annotation. The book addresses the development of evaluation standards for Russian NLP resources, namely, morphological taggers and dependency parsers. In addition, the book presents several experiments on automatic annotation and disambiguation: lemmatization of word forms not in the dictionary; word sense disambiguation based on vectors formed by lexical, semantic and grammatical cues of context; and semantic role labeling.
The final chapters of the first part of the book outline two types of frequency dictionaries based on the RNC data: a general-purpose frequency dictionary and a lexico-grammatical one.
The second part of the book presents an analysis of corpus data and includes a number of case studies of Russian grammar and lexical-grammatical interaction using quantitative methods. The key concept underlying our analysis is the behavioral profile (Hanks 1996; Divjak, Gries 2006), which is the frequency distribution of variable elements in a linguistic unit as attested in a corpus. This covers grammatical profiles (the frequency distribution of inflected forms of a word), constructional profiles (the frequency distribution of argument or any other constructions attested for a key predicate), lexical and semantic profiles (the frequency distribution of words and lexical-semantic classes in construction slots or, more generally, in the context of a word), and radial category profiles (the frequency distribution of word senses and word uses across the radial category network of a polysemous unit). We use grammatical, constructional, semantic, and radial category profiling to study tense, aspect and mood specialization of Russian verb forms; to identify singular-oriented and plural-oriented nouns; to investigate factors for prefix choice and prefix variation in natural perfectives (chistovidovye perfectivy); to analyze constraints on the filling of slots in a construction and how this affects the meaning of the construction, taking as an example the Genitive construction of shape and the spatial construction with the preposition poverkh ‘up and over’.
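A grammatical profile in the sense described above, the frequency distribution of the inflected forms of a word, can be illustrated with a minimal sketch. The function name `grammatical_profile` and the toy tag list are assumptions made for illustration; they are not taken from the book or from the RNC tagset.

```python
from collections import Counter


def grammatical_profile(tags):
    """Return the relative frequency of each grammatical tag.

    `tags` is an iterable with one tag per corpus occurrence of a lemma.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Invented toy data: tags of four occurrences of one verb lemma.
profile = grammatical_profile(["past", "past", "nonpast", "imperative"])
```

The other profile types listed above (constructional, lexical, semantic, radial category) follow the same pattern, with the counted element being a construction, a slot filler, or a word sense instead of an inflectional tag.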
The quantitative corpus-based techniques used for the analysis vary from simple descriptive statistics (e.g., absolute frequencies, percentages, measures of central tendency and outliers) to Fisher's exact test and logistic regression. We claim that the vector modeling approaches to quantitative grammatical studies in theoretical linguistics are no less effective than in computational linguistics, where they have become a standard tool.
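As an illustration of one of the techniques mentioned, Fisher's exact test on a 2×2 contingency table can be sketched with the standard library alone. The function name `fisher_exact_two_sided` and the counts in the usage line are invented for the example and do not come from the book; the two-sided p-value is computed here by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables whose
    probability does not exceed that of the observed table.
    """
    row1, col1, col2 = a + b, a + c, b + d
    denom = comb(a + b + c + d, row1)

    def prob(x):
        # Probability of a table whose top-left cell equals x.
        return comb(col1, x) * comb(col2, row1 - x) / denom

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Invented counts, e.g. two word forms (rows) in two contexts (columns).
p = fisher_exact_two_sided(8, 2, 1, 5)  # about 0.035
```

For real studies a tested library routine such as `scipy.stats.fisher_exact` would normally be preferable to a hand-written version.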
Review of the book by Elena A. Grishina "Russian gestures from a linguistic perspective". Moscow, 2017. 744 p.
The present work examines the role of gestures in overcoming lexical access problems in patients with motor aphasia. The study is based on a corpus of narratives by brain-damaged individuals, "Russian CliPS" (Clinical Pear Stories), the videos from which were annotated in the linguistic annotator ELAN, with the gestural layout included in the analysis. The results suggest that the difficulties with lexical access were most often related to the search for nouns and verbs, and that gestures (deictic and rhythmic gestures, beats) facilitated lexical access in patients.
This article provides a brief overview of the Daba software package, created in the course of building corpora for Manding languages. Key software features are motivated by the tasks and problems characteristic of many African languages. The corpus-building model proposed here was initially developed for the Bambara Reference Corpus, which is freely accessible online. The morphological analysis procedure and corpus annotation scheme are discussed in detail. Daba uses a morpheme-based morphological annotation scheme inspired by the interlinear glossed presentation of linguistic examples. A scheme mapping Daba’s morpheme-based morphological information onto traditional word-based corpus annotation is provided. Since Bambara is characterized by a low level of written language standardization, special attention is paid to the issues of representing variability in corpus annotation.
The paper studies the reaction of Italian literary critics to the publication of Boris Pasternak's novel "Doctor Zhivago". It analyses the book ""Doctor Zhivago", Pasternak, 1958, Italy" (published in Russian by "Reka vremen", Moscow, 2012). The papers of Italian writers, critics and historians of literature who reacted immediately to the publication of the novel (A. Moravia, I. Calvino, F. Fortini, C. Cassola, C. Salinari, etc.) are studied and analysed.
The article describes the patterns by which emotional utterances are realized in dialogic and monologic speech. The author pays special attention to the characteristic features of the speech of a speaker experiencing psychological tension and to the compositional and pragmatic peculiarities of dialogic and monologic text.