Error Markup of the Corpus of Russian Student Texts: Tactical Decisions

Created by the School of Linguistics of the Faculty of Humanities at the National
Research University Higher School of Economics, the Corpus of Russian Student Texts
(CoRST) includes texts belonging to such genres as answers to various questions, argumentative
statements, essays, and course papers, which were written either spontaneously
(in the classroom) or as prepared texts (at home) by Bachelor's degree students.
In the process of studying academic writing, students pass through different stages of
understanding how to structure academic texts.
At each stage, the interference of different styles and genres, the heterogeneous nature
of received speech patterns as well as low levels of self-correction lead to inevitable
systemic errors in grammar and grammatical stylistics, semantics and text pragmatics.
The deviations from standard speech reflect both the stadial nature of academic writing
skills and the processes characteristic of speech system dynamics in general; the formation
of new customary (usual) norms on the remains of obsolete (conservative) norms
demonstrates the limits of variability in the usage of words and word forms.
These deviations are marked by a system of tags developed and optimized by the Corpus
team (N. A. Zevakhina, S. S. Dzhakupova, Yu. M. Kuvshinskaya, S. Yu. Puzhaeva,
with active assistance from colleagues and students).
The error markup contains lexical, morphological, and discursive information.
The grammatical section shows that frequent deviations from morphological and
syntactic patterns are connected with the slackening of a number of constructions.
For example, there are such challenges as the broadening of a number of ‘light’ verbs
(units devoid of semantic value and satisfying the syntactic needs of a statement, whose lexical meaning is delegated to a governed word); the choice of case for governed nouns;
comparative and intensifying constructions; and anaphoric usage.
The article considers specific examples marked with the tag “agreement error” (agr).
The rationale for choosing a particular tag for a given speech fragment is also discussed.