Russian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning
The paper describes a learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated corpus available online, now containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora; it is under active development and aims to grow both in size and in the features offered to users. We describe our perspective on the corpus and the data sources and tools used in compiling it. The elaborate classification of learners’ error types, developed in-house, is described in detail. The paper also presents a pilot experiment on creating test sets for particular learner problems using corpus data.
This textbook is intended for students of departments of physics and mathematics. Its aim is to develop the language skills required for professional communication. The textbook can be useful for anyone who is interested in learning English for specific purposes.
This article presents an approach to the automatic generation of open cloze exercises based on arbitrary English text. The exercise format is similar to the open cloze test used in Cambridge English certificate exams (FCE, CAE, CPE). The presented method also makes it possible to adjust the difficulty of the resulting exercises to better suit specific proficiency levels. Three experiments were conducted to evaluate the usefulness of the machine-generated exercises, compare them with authentic Cambridge English tests and study the difficulty-setting capabilities. The experiments showed that the generation method used was quite effective. With some customization, the method can be applied to generating similar exercises for other languages.
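As a rough illustration of the open cloze format discussed in this abstract, the sketch below gaps function words in an arbitrary passage. It is not the authors' generation method: the `FUNCTION_WORDS` list, the `make_open_cloze` function, and the use of `min_distance` as a crude difficulty knob (fewer, more widely spaced gaps give an easier exercise) are all assumptions made for the example.

```python
import string

# Hypothetical candidate list: open cloze tests typically target
# function words (articles, prepositions, conjunctions, auxiliaries).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "at", "to", "for", "with",
    "by", "from", "and", "but", "or", "which", "that", "who",
    "is", "are", "was", "were", "has", "have", "had",
}


def make_open_cloze(text, min_distance=4, max_gaps=8):
    """Replace selected function words with numbered gaps.

    Returns (exercise_text, answers). No word bank is produced:
    in an open cloze the learner supplies each missing word.
    """
    words = text.split()
    out, answers = [], []
    last_gap = -min_distance  # lets the very first word become a gap
    for i, word in enumerate(words):
        core = word.strip(string.punctuation)  # keep punctuation outside the gap
        is_candidate = core.lower() in FUNCTION_WORDS
        far_enough = i - last_gap >= min_distance
        if is_candidate and far_enough and len(answers) < max_gaps:
            answers.append(core)
            out.append(word.replace(core, f"({len(answers)}) ______", 1))
            last_gap = i
        else:
            out.append(word)
    return " ".join(out), answers


exercise, answers = make_open_cloze(
    "The cat sat on the mat and looked at the dog in the garden."
)
print(exercise)
print(answers)
```

Raising `min_distance` or lowering `max_gaps` reduces the number of gaps; a realistic system would presumably also weigh which words are harder to restore, which this sketch does not attempt.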
Language exercises are widely used in teaching foreign languages; yet manually creating exercises is labor-intensive and time-consuming. This paper describes a method for automatically generating EFL wordbank cloze exercises. These are generated from arbitrary passages in English, which is an important advantage in terms of learner motivation: the content of the exercises can be tailored to learners’ interests. Another feature of the method is exercise difficulty adjustment. Unlike other systems, our algorithm does not rely on many external linguistic resources and can thus be adapted more easily to other languages. Two experiments were conducted to evaluate the proposed method. The experiments showed that our algorithm performs significantly better than the ‘naïve’ random-sample baseline and that its precision in making gaps is 97%.
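To make the wordbank cloze format concrete, here is a minimal sketch in which gap positions are sampled at random, roughly in the spirit of the ‘naïve’ random-sample baseline mentioned in this abstract. The abstract does not define that baseline precisely, so the function name `make_wordbank_cloze`, its parameters, and the sampling details are assumptions; the proposed algorithm itself is not reproduced here.

```python
import random
import string


def make_wordbank_cloze(text, n_gaps=3, seed=0):
    """Gap randomly chosen words and return them as a shuffled word bank.

    Returns (exercise_text, word_bank, answers), where answers lists
    the removed words in gap order.
    """
    rng = random.Random(seed)  # seeded so the exercise is reproducible
    words = text.split()
    # Only purely alphabetic tokens can be gapped (skips numbers etc.).
    candidates = [
        i for i, w in enumerate(words)
        if w.strip(string.punctuation).isalpha()
    ]
    chosen = sorted(rng.sample(candidates, min(n_gaps, len(candidates))))
    answers = []
    for number, i in enumerate(chosen, start=1):
        core = words[i].strip(string.punctuation)
        answers.append(core)
        words[i] = words[i].replace(core, f"({number}) ______", 1)
    # The bank is shuffled so its order does not give the answers away.
    bank = [a.lower() for a in answers]
    rng.shuffle(bank)
    return " ".join(words), bank, answers


exercise, bank, answers = make_wordbank_cloze(
    "Language exercises are widely used in teaching foreign languages.",
    n_gaps=3,
)
print(exercise)
print("Word bank:", ", ".join(bank))
```

Random sampling can easily gap words that are trivial or impossible to restore, which is presumably why it serves as a baseline rather than as a practical method.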
The project we present, the Russian Learner Translator Corpus (RusLTC), is a multiple learner translator corpus that stores Russian students’ translations from and into English. The project is being developed by a cross-functional team of translator trainers and computational linguists in Russia. Translations are collected from several Russian universities; all translations are made by students majoring in translation, either as part of routine and exam assignments or as submissions for translation contests. As of March 2014, RusLTC contains a total of nearly 1.2 million word tokens, 258 source texts, and 1,795 translations. The paper gives a brief overview of related research and describes the corpus structure and the corpus-building technologies used; it also covers the query tool features and our error annotation solutions. In the final part we summarize RusLTC-based research and its current practical applications, and suggest research prospects and possibilities.
The purpose of the paper is to give an overview of the Internet resources, tools and technologies that can be used in different types of e-learning of the English language. The paper also highlights the problems that are likely to occur when employing these technologies: a low level of information culture, technological hurdles, psychological readiness, and motivation. The research conducted shows students’ growing interest in using ICT and increasing satisfaction with a learning process based on Internet resources.
The workshop series on NLP for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The papers in the proceedings volume from the third NLP4CALL workshop cover three main topic areas: resources for development of ICALL applications (e.g., learner corpora and coursebook corpora), tools and algorithms for the analysis of learner language (e.g., focusing on collocations, reading tasks, cloze items, pronunciation, spelling, level classification of learner production), and the generation of learning materials (e.g., exercise generators).
The paper discusses case (non-)coincidence in elliptical coordinated constructions, which is one of the most widespread types of errors that native Russian speakers make.
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) – NLP4CALL – is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection.
This article presents an approach to the automatic generation of open cloze exercises that are based on real-life English texts. The exercise format is similar to the open cloze test used in Cambridge certificate exams (FCE, CAE, CPE). Two experiments were conducted to evaluate the usefulness of the machine-generated exercises and compare them with authentic Cambridge tests. The experiments showed that the generation method used was quite effective. With some customization, the presented method can be applied to generating similar exercises based on texts written in other languages.