Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning
The paper describes the learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated, available online corpus, now containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora, dynamically developing and striving to improve both in size and in features offered to users. We describe our perspective on the corpus, data sources and tools used in compiling it. Elaborate self-made classification of learners’ errors types is thoroughly described. The paper also presents a pilot experiment on creating test sets for particular learners’ problems using corpus data.