Building a learner corpus for Russian

E. V. Rakhilina; A. S. Vyrenkova; E. Mustakimova; Smirnov I.; Ladygina A.

Ch. 10. P. 1–10.

In this paper we describe an open learner corpus of Russian. The Russian Learner Corpus (RLC) is the first corpus with a clear distinction between foreign language learners and heritage speakers. We discuss the structure of the corpus, its development, and the annotation principles. We also describe the RLC platform, which combines online tools for text uploading, processing, error annotation, and corpus search.

Language: English

Keywords: Heritage Russian; learner corpora; Russian as a second language; second language acquisition

Publication based on the results of:

Tendencies of language change in the mirror of corpora (2016)

In book

Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC

Linköping: LiU Electronic Press, 2016.

Non-standard motion metaphors in texts by Kazakh-Russian bilinguals

Rakhilina E. V., Kazkenova A., Russian Language Journal 2025 Vol. 75 No. 1 Article 1492

This paper examines a specific case of fictive motion conceptualization—non-standard motion metaphors identified in texts from the Kazakh subcorpus of the Russian Learner Corpus (http://www.web-corpora.net/RLC). The texts were written by students from several universities in Almaty who are native speakers of Kazakh (L1) and second-language speakers of Russian (L2). Our study focuses on three aspects ...

Added: December 1, 2025

Direct object acquisition in the speech of adult L2 Russian learners

Kseniia K. Kashleva, Krasnoshchekova S., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2025 Vol. 23 No. 2 P. 130–144

This article investigates direct object acquisition by adult learners of Russian as a second language (L2). Students of different proficiency levels (A1-C1) took part in the experimental study, and their data were compared with data produced by L1 speakers. The study found that verb valency, as well as learner proficiency level, significantly impacts this process. ...

Added: February 6, 2025

Distractor Generation for Lexical Questions Using Learner Corpus Data

Nikita Login, Jazykovedny Casopis 2023 Vol. 74 No. 1 P. 345–356

Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In the case of multiple-choice gap-fill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of ...

Added: September 16, 2024

L1 Influence on the Use of the English Present Perfect: A Corpus Analysis of Russian and Spanish Learners’ Essays

Perez-Guerra J., Smirnova E. A., Journal of Language and Education 2024 Vol. 10 No. 1 P. 101–114

Mastering verbal tenses, especially those expressing aspect, in a second language presents a challenge as learners frequently link the semantic nuances of verbal forms in their second language (L2) to the characteristics of the verbal systems in their native languages (L1). This study explores the impact of L1 on the usage of the English Present ...

Added: March 3, 2024

Word-formation complexity: a learner corpus-based study

Lyashevskaya O., Pyzhak J.V., Vinogradova O. I., Russian Journal of Linguistics 2022 Vol. 26 No. 2 P. 471–492

This article explores the word-formation dimension of learner text complexity which indicates how skilful the non-native speakers are in using more and less complex - and varied - derivational constructions. In order to analyse the association between complexity and writing accuracy in word formation as well as interactive effects of task type, text register, and ...

Added: October 5, 2022

Using an Error-Annotated Learner Corpus (REALEC) in DDL Lessons

M. A. Klimova, V. K. Smilga, D. A. Overnikova, in: Proceedings of the International Conference "Corpus Linguistics–2021". Skifiya-Print, 2021. P. 112–121.

Added: October 31, 2021

When a cross-linguistic tendency marries incomplete acquisition: preposition drop in Russian spoken in Daghestan

Panova A., Philippova T., International Journal of Bilingualism 2021 Vol. 25 No. 3 P. 640–667

Aims and Objectives/Purpose/Research Questions: The purpose of the study is to figure out what factors condition the phenomenon of preposition drop (P-drop) in locative, directional and temporal phrases. Specifically, we investigate what kind of phrases allow P-drop in Russian spoken in Highland Daghestan and aim at understanding the rationale for this phenomenon. Design/Methodology/Approach: We conduct a quantitative analysis ...

Added: November 2, 2020

Hedges in Russian EAP writing: A corpus-based study of research papers in management

Smirnova E. A., Strinyuk S. A., Journal of English as a Lingua Franca 2020 Vol. 9 No. 1 P. 81–101

The fact that English has become a lingua franca of academic communication has led to increased attention to teaching English for academic purposes (EAP) in academia. Academic discourse markers, such as hedges, have been an important topic in academic writing research whose prime aim is helping non-Anglophone researchers to present their research findings in ...

Added: October 14, 2020

The rise of a lingua franca: The case of Russian in Dagestan

Dobrushina N., Kultepina O., International Journal of Bilingualism 2021 Vol. 25 No. 1 P. 338–358

Aims and objectives: In Dagestan, Russian is the language of education, urban way of life, and upward social mobility, and the means of communication between speakers of different languages. This is a result of a quick and drastic change. At the end of the 19th century, Russian was spoken by less than 1% of the population. ...

Added: October 14, 2020

Acquisition of aspect in L2: The computation of event completion by Japanese learners of English.

Kaku-MacDonald K., Liceras J., Kazanina N., Applied Psycholinguistics 2020 No. 41(1) P. 185–214

Previous studies on the acquisition of semantics in the aspectual domain have suggested that a difficult case for achieving a targetlike representation in a second language arises when learners need to preempt a first language (L1) option (Gabriele, 2009). This study investigates this issue by focusing on a learning scenario where predicate-level variability exists in ...

Added: September 8, 2020

A quantitative scale for assessing mispronunciations by non-native speakers as a basis for planning and adjusting introductory phonetics courses

Blok E. E., Рема 2019 No. 4 P. 34–52

Based on an empirical verification of L1-L2 contrastive analysis results, the author designed a methodology and a numeric scale for assessing consonant errors typical of Russian native speakers speaking German. The paper describes how the scale can be used for detecting and ranging main discrepancies between Russian and German consonant systems resulting from different phonological ...

Added: February 4, 2020

POS tagger evaluation for the automated text analysis and identification of learner error

Vinogradova O. I., Buzanov A., Generalova S. A. et al., in: The Space of Research Interests: Foreign Languages and Intercultural Communication - Current Vectors of Development and Prospects. Issue 3. Buki Vedi, 2019. Ch. 6. P. 44–49.

Working with learner corpora requires elaborate NLP techniques such as POS annotation. In this article a team of computational linguists presents their experience of choosing a POS tagger for precise and effortless annotation of .txt files with Python3. The Russian Error-Annotated Learner English Corpus (REALEC) is the underlying corpus whose text features the POS tagger has to handle. ...

Added: December 28, 2019

Automated assessment of learner text complexity

Lyashevskaya O., Irina Panteleeva, Olga Vinogradova, Assessing Writing 2021 No. 49 Article 100529

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, and over the past two decades there appeared various tools for the provision of automated instant feedback. The presented paper offers an application that focuses on measuring text complexity, ...

Added: October 20, 2019

The Routledge Handbook of Second Language Research in Classroom Learning: Processing and Processes

ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD, 2019.

The Routledge Handbook of Second Language Research in Classroom Learning is a comprehensive psycholinguistic approach to the issue of instructed language learning that is uniquely theoretical, methodological, empirical, pedagogical, and curricular. Bringing together empirical studies with theoretical underpinnings, this handbook focuses on conceptual replications/extensions of, and new research on, classroom learning or Instructed SLA (ISLA). In ...

Added: October 11, 2019

Inspector: The Tool For Automated Assessment Of Learner Text Complexity

Olga I. Vinogradova, Olga N. Lyashevskaya, Irina M. P. / NRU Higher School of Economics. Series WP BRP 55/LNG/2017. 2019. No. 79.

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, but over the past two decades there appeared various tools ensuring the provision of automated instant feedback. The presented paper offers such a tool that focuses on measuring text ...

Added: October 10, 2019

Narrative Competence of Adult L2 Russian Learners

Kashleva K., Krasnoshchekova S., Journal of Psycholinguistic Research 2019 Vol. 48 No. 3 P. 617–641

Narrative competence is an essential part of language proficiency. Research of narrative competence has both a theoretical and empirical value. Our study aims to assess narrative competence of adult L2 Russian learners and to investigate the relationship between their narrative competence and their language proficiency. For assessment, we used the Multilingual Assessment Instrument for Narratives ...

Added: January 3, 2019

Contrastive analysis as a scientific basis for developing computer-based foreign-language pronunciation trainers

Blok E. E., Вестник Бурятского государственного университета 2015 No. 10(1) P. 97–105

The paper argues that contrastive analysis can be used as a methodological basis for developing the linguistic support of pronunciation training systems aimed at improving learners' foreign-language pronunciation. Contrastive analysis makes it possible to detect all potential negative transfer zones, including the "concealed" ones, which can dramatically increase training system efficiency. The procedure that makes it possible to predict ...

Added: December 18, 2018

Early and late learners decompose inflected nouns, but can they tell which ones are inflected correctly?

Gor K., Chrabaszcz A., Cook S., Journal of Second Language Studies 2018 Vol. 1 No. 1 P. 106–140

An auditory lexical decision task tests morphological decomposition and sensitivity to violations in inflection in late second language learners, early learners (heritage speakers), and native speakers of Russian. Two datasets compared reaction times and error rates to real Russian inflected nouns and nonce nouns. Two parameters of real nouns were manipulated: case (the nominative, or ...

Added: October 9, 2018

A case for agreement: Processing of case-inflected nouns by early and late learners

Gor K., Chrabaszcz A., Cook S., Linguistic Approaches to Bilingualism 2019 Vol. 9 No. 1 P. 6–41

Previous research on Russian nominal inflection reports a processing advantage for the Nominative case, the citation form, in native and highly proficient nonnative speakers (Gor, Chrabaszcz, & Cook, 2017). However, it remains unclear whether this advantage is present only in single-word presentation, or it is a fundamental property of lexical storage and access. Moreover, it ...

Added: October 9, 2018