Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning

A. B. Kutuzov; E. Kuzmenko

P. 87–97.

Kutuzov A. B., Kuzmenko E.

The paper describes a learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated corpus available online, now containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora and is under active development, growing both in size and in the features offered to users. We describe our perspective on the corpus, the data sources, and the tools used in compiling it. The custom-built classification of learner error types is described in detail. The paper also presents a pilot experiment on creating test sets for particular learner problems using corpus data.

Language: English

Keywords: learner corpora; computer-assisted language learning; English as a second language

Publication based on the results of:

Corpus technologies in linguistic and interdisciplinary research (2014)

In book

Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping: Linköping University Electronic Press, 2014.

Distractor Generation for Lexical Questions Using Learner Corpus Data

Nikita Login, Jazykovedny Casopis 2023 Vol. 74 No. 1 P. 345–356

Learner corpora with error annotation can serve as a source of data for automated question generation (QG) for language testing. In the case of multiple-choice gapfill lexical questions, this process involves two steps. The first step is to extract sentences with lexical corrections from the learner corpus. The second step, which is the focus of ...

Added: September 16, 2024

L1 Influence on the Use of the English Present Perfect: A Corpus Analysis of Russian and Spanish Learners’ Essays

Perez-Guerra J., Smirnova E. A., Journal of Language and Education 2024 Vol. 10 No. 1 P. 101–114

Mastering verbal tenses, especially those expressing aspect, in a second language presents a challenge as learners frequently link the semantic nuances of verbal forms in their second language (L2) to the characteristics of the verbal systems in their native languages (L1). This study explores the impact of L1 on the usage of the English Present ...

Added: March 3, 2024

Word-formation complexity: a learner corpus-based study

Lyashevskaya O., Pyzhak J.V., Vinogradova O. I., Russian Journal of Linguistics 2022 Vol. 26 No. 2 P. 471–492

This article explores the word-formation dimension of learner text complexity which indicates how skilful the non-native speakers are in using more and less complex - and varied - derivational constructions. In order to analyse the association between complexity and writing accuracy in word formation as well as interactive effects of task type, text register, and ...

Added: October 5, 2022

Using an Error-Annotated Learner Corpus (REALEC) in DDL Lessons

M. A. Klimova, V. K. Smilga, D. A. Overnikova, in: Труды международной конференции «Корпусная лингвистика–2021». Скифия-принт, 2021. P. 112–121.

Added: October 31, 2021

Hedges in Russian EAP writing: A corpus-based study of research papers in management

Smirnova E. A., Стринюк С. А., Journal of English as a Lingua Franca 2020 Vol. 9 No. 1 P. 81–101

The fact that English has become a lingua franca of academic communication has led to increased attention to teaching English for academic purposes (EAP) in academia. Academic discourse markers, such as hedges, have been an important topic in academic writing research whose prime aim is helping non-Anglophone researchers to present their research findings in ...

Added: October 14, 2020

POS tagger evaluation for the automated text analysis and identification of learner error

Vinogradova O. I., Buzanov A., Генералова С. А. et al., in: ПРОСТРАНСТВО НАУЧНЫХ ИНТЕРЕСОВ: ИНОСТРАННЫЕ ЯЗЫКИ И МЕЖКУЛЬТУРНАЯ КОММУНИКАЦИЯ - СОВРЕМЕННЫЕ ВЕКТОРЫ РАЗВИТИЯ И ПЕРСПЕКТИВЫ. Issue 3. Буки Веди, 2019. Ch. 6 P. 44–49.

Working with learner corpora requires elaborate NLP techniques such as POS annotation. In this article a team of computational linguists presents their experience of choosing a POS tagger for precise and effortless annotation of .txt files with Python 3. The Russian Error-Annotated Learner English Corpus (REALEC) is the underlying corpus whose text features the POS tagger has to handle. ...

Added: December 28, 2019

Automated assessment of learner text complexity

Lyashevskaya O., Irina Panteleeva, Olga Vinogradova, Assessing Writing 2021 No. 49 Article 100529

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, and over the past two decades various tools have appeared for the provision of automated instant feedback. The presented paper offers an application that focuses on measuring text complexity, ...

Added: October 20, 2019

Inspector: The Tool For Automated Assessment Of Learner Text Complexity

Olga I. Vinogradova, Olga N. Lyashevskaya, Irina M. P. / NRU Higher School of Economics. Series WP BRP 55/LNG/2017. 2019. No. 79.

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, and over the past two decades various tools have appeared that provide automated instant feedback. The presented paper offers such a tool that focuses on measuring text ...

Added: October 10, 2019

THE DESIGN OF TESTS WITH MULTIPLE CHOICE QUESTIONS AUTOMATICALLY GENERATED FROM ESSAYS IN A LEARNER CORPUS

Vinogradova O. I., Login Nikita Vjacheslavovich / NRU HSE. Series WP BRP "Linguistics". 2017. No. 60.

Learner corpora have great potential as sources of educational material. If a corpus contains annotations of mistakes in student works, it can be of use for the recognition and analysis of the most common error patterns. The error-annotation system of the learner corpus REALEC makes it possible to automatically generate different types of test questions ...

Added: December 13, 2017

International Conference on MOOCs, language learning and mobility 13 – 14 October 2017, Naples; Italy

[s.n.], 2017.

Added: December 5, 2017

To automated generation of test questions on the basis of error annotations in EFL essays: a time-saving tool?

Olga Vinogradova, in: Learner Corpora and Language Teaching. Vol. 92. John Benjamins Publishing Company, 2019. Ch. 1-2 P. 29–48.

The paper introduces a valuable tool for EFL instructors to select the direction for creating custom-made learning materials, namely, using a learner corpus with errors annotated by experts for the purpose of administering to the target group of learners a custom-made test which has been automatically generated from the sentences with student errors. The paper describes the ...

Added: November 8, 2017

AUTOMATED ASSESSMENT OF LEARNERS' LEXICON USING A LEARNER CORPUS

Vinogradova O. I., ПОЛИЛИНГВИАЛЬНОСТЬ И ТРАНСКУЛЬТУРНЫЕ ПРАКТИКИ 2018 Vol. 15 No. 2018/3 P. 372–380

Access to a learner corpus has been shown to increase the efficiency of L2 acquisition for learners as well as teaching efficiency for EFL instructors. This paper presents a computer tool for a learner corpus designed at the School of Linguistics of the Higher School of Economics for both categories of users. REALEC, Russian ...

Added: November 8, 2017

Approaches to automated English essay evaluation in Russian students’ learner corpus

Lyashevskaya O., Olga Vinogradova, in: 4th Learner Corpus Conference. LCR 2017. Book of Abstracts. Bozen: [s.n.], 2017. P. 200–202.

REALEC (Vinogradova, 2016) is the first open-access collection of English texts (mainly essays) written by students with Russian as their native language who are learning English at university. The project team working with the corpus over the last two years has been developing computational tools to make the use of REALEC ...

Added: November 8, 2017

4th Learner Corpus Conference. LCR 2017. Book of Abstracts

Bozen: [s.n.], 2017.

The conference was organised under the aegis of the Learner Corpus Association and was hosted by Eurac Research Institute for Applied Linguistics. It was themed "Widening the scope of learner corpus research" and brought together researchers and language teachers, software developers and linguists from 23 countries around the world. ...

Added: November 7, 2017

Building a learner corpus for Russian

Rakhilina E. V., Vyrenkova A. S., Mustakimova E. et al., in: Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC. Linköping: LiU Electronic Press, 2016. Ch. 10 P. 1–10.

In this paper we describe an open learner corpus of Russian. The Russian Learner Corpus (RLC) is the first corpus with a clear distinction between foreign language learners and heritage speakers. We discuss the structure of the corpus, its development and the annotation principles. This paper describes the platform of the RLC which combines online tools ...

Added: November 5, 2017

Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC

Linköping: LiU Electronic Press, 2016.

The joint workshop on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) & NLP for Language Acquisition (LA) – shorthand NLP4CALL&LA – is an effort to provide a debate space and collaboration between two closely related areas. Both focus on language acquisition, related resources and technologies, that can support research of the language learning ...

Added: November 5, 2017

Using the Russian Learner Corpus in teaching Russian as a foreign language: verbal aspect

Olshevskaya M., Международный аспирантский вестник. Русский язык за рубежом 2018 No. 1 P. 13–18

Drawing on corpus data, the article considers common errors in the use and formation of verbal aspect from theoretical and typological points of view. It uses data from the Russian Learner Corpus (RLC), which contains texts written by students of Russian as a foreign language. Identified "weaknesses" in the acquisition ...

Added: October 19, 2017

Professional foreign language for students of the Faculty of Physics and Mathematics: a textbook

Kashleva K., М.: Московский государственный областной университет, 2017.

This textbook is intended for students of departments of physics and mathematics. Its aim is to develop the language skills required for professional communication. The textbook can be useful for anyone who is interested in learning English for specific purposes. ...

Added: October 17, 2017

ICT applications and distance courses for Russian language teaching and learning

Romanov Y., Romanova I., in: Information Innovative Technologies: Materials of the International scientific-practical conference. M.: Association of graduates and employees of AFEA named after prof. Zhukovsky, 2017. P. 70–73.

The paper describes the most effective approaches to computer-assisted language learning (CALL), such as ICT applications and Web-based distance learning. When introduced into practical teaching of the Russian language to international students, they considerably enhance the quality of the teaching and learning process. ...

Added: July 31, 2017

eLearning in the English language training: new opportunities and challenges for Russian Universities

I.A. Malinina, Tsvetkova S. E., KAFU Academic Journal 2016 No. 8 P. 138–146

The purpose of the paper is to give an overview of the Internet resources, tools and technologies that can be used in different types of eLearning of the English language. The paper also highlights the problems that are likely to occur when employing the technologies: low level of information culture, technological hurdles, psychological readiness, motivation. The ...

Added: January 16, 2017

A Searching Tool for Russian Error-Annotated Learner English Corpus

Fenogenova A., Kuzmenko E. / NRU HSE. Series WP BRP "Linguistics". 2016.

Learner corpora constitute an effective resource for specialists in the fields of second language acquisition, foreign language teaching and corpus linguistics. They tend to get significant scholarly help from statistical tools of various kinds. However, for a corpus to be useful, it should provide convenient and powerful tools for searching and manipulating data. In this paper ...

Added: December 14, 2016