THE DESIGN OF TESTS WITH MULTIPLE CHOICE QUESTIONS AUTOMATICALLY GENERATED FROM ESSAYS IN A LEARNER CORPUS
This paper describes a learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated corpus available online, now containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora, dynamically developing and striving to grow both in size and in the features offered to users. We describe our perspective on the corpus, the data sources and the tools used in compiling it. The elaborate, purpose-built classification of learner error types is described in detail. The paper also presents a pilot experiment on creating test sets for particular learner problems using corpus data.
This article presents an approach to the automatic generation of open cloze exercises from arbitrary English text. The exercise format is similar to the open cloze test used in Cambridge English certificate exams (FCE, CAE, CPE). The presented method also makes it possible to adjust the difficulty of the resulting exercises to better suit specific proficiency levels. Three experiments were conducted to evaluate the usefulness of the machine-generated exercises, to compare them with authentic Cambridge English tests, and to study the difficulty-setting capability. The experiments showed that the generation method was quite effective. With some customization, the method can be applied to generating similar exercises for other languages.
Language exercises are widely used in teaching foreign languages; yet creating exercises manually is labor-intensive and time-consuming. This paper describes a method for automatically generating EFL word-bank cloze exercises. These are generated from arbitrary passages in English, which is an important advantage in terms of learner motivation, since the content of the exercises can be tailored to learners’ interests. Another feature of the method is exercise difficulty adjustment. Unlike other systems, our algorithm does not rely on many external linguistic resources and can thus be more easily adapted to other languages. Two experiments were conducted to evaluate the proposed method. They showed that our algorithm performs significantly better than the ‘naïve’ random-sample baseline and that its gap-creation precision is 97%.
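To make the word-bank cloze format concrete, the following is a minimal illustrative sketch of how such an exercise can be generated from an arbitrary passage. It is not the algorithm described in the abstract: the gap-selection heuristic (content words filtered by a small stopword list), the `n_gaps` difficulty parameter, and all function names are assumptions introduced here for illustration only.

```python
import random
import re

# Small hand-picked stopword list: an assumption for this sketch, standing in
# for whatever gap-selection criteria a real system would use.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "was", "were", "for", "on", "with", "that", "this", "it"}

def make_wordbank_cloze(text, n_gaps=3, seed=0):
    """Turn an arbitrary English passage into a word-bank cloze exercise.

    Returns the gapped text and a shuffled word bank. `n_gaps` acts as a
    crude difficulty knob: more gaps make the exercise harder.
    """
    rng = random.Random(seed)
    # Tokenize into words and standalone punctuation marks.
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z'\s]", text)
    # Candidate gaps: content words longer than 3 characters.
    candidates = [i for i, t in enumerate(tokens)
                  if t.isalpha() and len(t) > 3 and t.lower() not in STOPWORDS]
    gap_indices = sorted(rng.sample(candidates, min(n_gaps, len(candidates))))
    bank = [tokens[i] for i in gap_indices]
    for i in gap_indices:
        tokens[i] = "____"
    # Shuffle the bank so answers are not listed in order of appearance.
    rng.shuffle(bank)
    # Re-join tokens, removing the space before punctuation.
    exercise = re.sub(r"\s+([^\w\s'])", r"\1", " ".join(tokens))
    return exercise, bank
```

A real system would replace the stopword heuristic with linguistically informed gap selection (the abstract reports 97% gap-creation precision for its method), but the input/output shape of the exercise is as shown.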
This paper focuses on referential coherence, which is seen as a crucial attribute of effective academic writing. I report findings from a corpus study of Russian students' use of anaphoric expressions in their research proposals, compared with a reference corpus comprising research articles published in peer-reviewed journals. I hypothesise that learners use anaphora less frequently than professional writers. The results of the analysis confirmed the hypothesis and allowed me to identify particular problems connected with the students' use of anaphoric expressions. It is hoped that the reported findings will challenge EAP teachers and textbook writers to pay closer attention to the markers of referential coherence in academic writing courses for L2 students.
The workshop series on NLP for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The papers in the proceedings volume from the third NLP4CALL workshop cover three main topic areas: resources for development of ICALL applications (e.g., learner corpora and coursebook corpora), tools and algorithms for the analysis of learner language (e.g., focusing on collocations, reading tasks, cloze items, pronunciation, spelling, level classification of learner production), and the generation of learning materials (e.g., exercise generators).
Various issues relating to learner corpus research and its use in teaching are presented. These include the issue of the norm in corpora: whether the norm should necessarily be native, and what problems a native norm may present. Learners who behave differently from native speakers do not necessarily use language incorrectly; as an alternative to a unique native norm, a range of norms is available. Some of these norms may be problematic if they are not selected carefully (depending on the learner corpus, the purpose of the comparison, etc.) and handled cautiously. Different choices of norm may produce different results and thus lead to different conclusions about learners’ usage. The pedagogical implications of such choices are to be examined, with particular emphasis on whether all differences between the learner corpus and the reference corpus should be targeted for teaching intervention. Problems in evaluating agreement between approaches to annotation practices are considered as well.
The paper examines construction blending as an important cause of errors in students’ written texts. The study is conducted within the framework of Construction Grammar [Fillmore and Kay 1992; Goldberg 1995, 2006] and the grammar of errors [Vyrenkova et al. 2014]. It is based on data from the Corpus of Russian Student Texts, supplied with metatextual, morphological and error annotation.
International Conference on MOOCs, language learning and mobility, 13–14 October 2017, Naples, Italy
The article examines the main trends in the study of the Stalinist period and the phenomenon of Stalinism in connection with the mass opening of the archives.