THE ROLE AND APPLICATIONS OF EXPERT ERROR ANNOTATION IN A CORPUS OF ENGLISH LEARNER TEXTS

O. I. Vinogradova

P. 830-840.

The paper presents the rationale for the decisions taken in setting up and further developing a learner corpus of texts written in English by Russian learners of English, the only open-access Russian learner corpus. The focus is on manual expert annotation: after introducing the error categorization applied in annotation, the paper examines the complicated cases that arose in annotation practice and then compares the annotation statistics across the three stages of corpus development. For that purpose, texts annotated by different groups of participants in two experiments were used to identify problematic areas in annotation. The concluding part outlines the main pedagogical applications of the learner corpus in teaching EFL: the opportunities to create automated training exercises as well as placement and progress tests custom-made for specific groups of students.

Language: English

Keywords: corpus research, computational linguistics, learner corpora, error annotation

Publication based on the results of:

Learner corpus REALEC: Lexicological observations (2016)

In book

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, July 1–4, 2016)

Issue 15. Moscow : RGGU Publishing House, 2016

Learner Corpora Researches Review (trends observed in the 8th conference CORPUS LINGUISTICS - 2015)

Vinogradova O. I., Journal of Language and Education 2015

The reviewed trends primarily involve the use of learner corpora in teaching and learning foreign languages; for many authors this implies an EFL context, but with learners of different L1s. The studies under review fall into four main types - use of learner corpora incorporated into teaching methodology, use of academic learner corpora in the ...

Added: October 12, 2015

Semi-automated typical error annotation for learner English essays: Integrating frameworks

Kutuzov A. B., Kuzmenko E., in: Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, Vilnius, 11th May, 2015. Issue 114. Linköping University Electronic Press, 2015. P. 35-41.

This paper proposes the integration of three open-source utilities: the brat web annotation tool, the Freeling suite of linguistic analyzers, and the Aspell spellchecker. We demonstrate how their combination can be used to pre-annotate texts in a learner corpus of English essays with potential errors and ease human annotators’ work. Spellchecker alerts and morphological analyzer tagging probabilities are ...

Added: May 31, 2015

4th Learner Corpus Conference. LCR 2017. Book of Abstracts

Bozen : [s.n.], 2017

The conference was organised under the aegis of the Learner Corpus Association and was hosted by Eurac Research Institute for Applied Linguistics. It was themed "Widening the scope of learner corpus research" and brought together researchers and language teachers, software developers and linguists from 23 countries around the world. ...

Added: November 7, 2017

Approaches to automated English essay evaluation in Russian students’ learner corpus

Lyashevskaya O., Vinogradova O. I., in: 4th Learner Corpus Conference. LCR 2017. Book of Abstracts. Bozen : [s.n.], 2017. P. 200-202.

REALEC (Vinogradova, 2016) is the first open-access collection of English texts (mainly essays) written by university students whose native language is Russian. The project team working with the corpus over the last two years has been developing computational tools to make the use of REALEC ...

Added: November 8, 2017

Punctuation in L2 English: Computational Methods Applied in the Study of L1 Interference

Vinogradova O. I., Viklova A., Smilga V., in: Emerging Writing Research from the Russian Federation. WAC Clearinghouse, University Press, Colorado, 2021. Ch. 9. P. 211-233.

Added: February 4, 2020

MULTI-LEVEL STUDENT ESSAY FEEDBACK IN A LEARNER CORPUS

Vinogradova O. I., Lyashevskaya O., Panteleeva I., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). Moscow, 2017. P. 373-386.

The paper presents the results of using some computer tools and applications for the automated and semi-automated syntactic, lexical, and error analysis of student essays in a learner corpus. The texts in the corpus were written in English by Russian learners of English. The experiment in the research consisted in comparing the ...

Added: May 30, 2017

Regular polysemy: from sense vectors to sense patterns

Lopukhina A., Lopukhin K. A., in: The 26th International Conference on Computational Linguistics (COLING 2016). [s.n.], 2016. P. 19-23.

Regular polysemy has been extensively investigated in lexical semantics, but this phenomenon has been very little studied in distributional semantics. We propose a model for regular polysemy detection that is based on sense vectors and allows working directly with senses in semantic vector space. Our method is able to detect polysemous words that have the ...

Added: December 1, 2016

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, June 17–20, 2020)

Moscow : RGGU Publishing House, 2020

Papers from the Annual International Conference “Dialogue” (2020). Issue 19 ...

Added: June 26, 2020

An Experimental Study of Hybrid Machine Learning Models for Extracting Named Entities

Lei J., Bolshakova E. I., in: Proceedings of Third Workshop "Computational linguistics and language science". Issue 4. Manchester : EasyChair, 2019. P. 50-60.

The paper describes two hybrid neural network models for named entity recognition (NER) in texts, namely Bi-LSTM-CRF and Gated-CNN-CRF, as well as results of experiments with them. ...

Added: November 3, 2019

L1 Influence on the Use of the English Present Perfect: A Corpus Analysis of Russian and Spanish Learners’ Essays

Perez-Guerra J., Smirnova E. A., Journal of Language and Education 2024 Vol. 10 No. 1 P. 101-114

Mastering verbal tenses, especially those expressing aspect, in a second language presents a challenge as learners frequently link the semantic nuances of verbal forms in their second language (L2) to the characteristics of the verbal systems in their native languages (L1). This study explores the impact of L1 on the usage of the English Present ...

Added: March 3, 2024

Corpus Linguistics 2015: Abstract Book

Lancaster : Lancaster University Press, 2015

The main trends and achievements in corpus linguistics are presented in this collection of abstracts of plenaries, papers and posters presented at the 8th international conference Corpus Linguistics - 2015 (Lancaster University, UCREL, July 2015) ...

Added: October 17, 2015

Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2022)

Springer, 2024

Presents select papers from the International Conference on Internet and Modern Society (IMS 2022). Examines Smart Cities, Digital Sustainability, Digital Divides, and Social Media Movements. Discusses cutting edge work on Digital Urbanism and Cyber Psychology ...

Added: December 10, 2022

Data Analytics and Management in Data Intensive Domains. 23rd International Conference, DAMDID/RCDL 2021, Moscow, Russia, October 26–29, 2021, Revised Selected Papers

Springer, 2022

“Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research promoting cooperation and exchange of ideas in the area of data analysis and management in data intensive domains. Approaches to data analysis and management being developed in specific data intensive domains of X-informatics (such as X = astro, bio, chemo, geo, medicine, neuro, physics, ...

Added: August 30, 2021

Digital Transformation and Global Society. Third International Conference, DTGS 2018, St. Petersburg, Russia, 2018, Revised Selected Papers. Part II. Communications in Computer and Information Science 859

Springer, 2018

This two volume set (CCIS 858 and CCIS 859) constitutes the refereed proceedings of the Third International Conference on Digital Transformation and Global Society, DTGS 2018, held in St. Petersburg, Russia, in May/June 2018. The 75 revised full papers and the one short paper presented in the two volumes were carefully reviewed and selected from 222 submissions. ...

Added: November 15, 2018

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 to June 1, 2019)

Moscow : Publishing Center "Russian State University for the Humanities", 2019

The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...

Added: October 16, 2019

The 26th International Conference on Computational Linguistics (COLING 2016)

[s.n.], 2016

Added: December 1, 2016

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, June 17–20, 2020). Supplementary volume

Moscow, 2020

The supplementary volume includes papers from the International Conference on Computational Linguistics and Intellectual Technologies "Dialogue 2020" that were not included in the main collection. The papers represent a broad spectrum of theoretical and applied research in natural language description, the modeling of language processes, and the creation of practically applicable computational linguistic technologies. Intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: July 3, 2020

Corpus of Russian student texts: design and prospects

Zevakhina N., Dzhakupova S., in: Proceedings of the 21st International Conference on Computational Linguistics "Dialogue". Moscow : RGGU Publishing House, 2015.

The Corpus of Russian Student Texts (CoRST) is a computational and research project started in 2013 at the Linguistic Laboratory for Corpora Research Technologies at HSE. It comprises a collection of Russian texts written by students from various Russian universities. Its main research goal is to examine language deviations viewed as markers of language change. ...

Added: May 20, 2015

The bimodal corpus of Russian-Turkic bilinguals' speech (RuTuBic)

Artemenko E., Rezanova Z. I., Temnikova I. G. et al., Computational Linguistics and Intellectual Technologies 2019 Vol. Suppl No. 18 P. 200-210

The paper presents Russian-Turkic Bilingual Corpus (RuTuBiC) design, its basic identifying features: the aim of producing a corpus, the types of texts it contains, metatextual markup and error annotation principles, technological (IT, digital) concepts. The current state and development trends of the corpus are discussed. The corpus started as an integral part of a research project ...

Added: May 4, 2022

Corpus Methods in Pragmatics: The Case of English and Russian Emotions

Apresyan V., Intercultural Pragmatics 2013 Vol. 10 No. 4 P. 533-568

The present paper is a comparative corpus study of the verbal expression of emotional etiquette in American English and Russian. The study is conducted against the backdrop of certain assumptions regarding the cross-cultural centrality and marginality of emotions as formulated in the current research on cross-cultural pragmatics. The paper employs corpus-based methods to test the ...

Added: October 13, 2013

Word-formation complexity: a learner corpus-based study

Lyashevskaya O., Pyzhak J., Vinogradova O. I., Russian Journal of Linguistics 2022 Vol. 26 No. 2 P. 471-492

This article explores the word-formation dimension of learner text complexity which indicates how skilful the non-native speakers are in using more and less complex - and varied - derivational constructions. In order to analyse the association between complexity and writing accuracy in word formation as well as interactive effects of task type, text register, and ...

Added: October 5, 2022

Using an Error-Annotated Learner Corpus (REALEC) in DDL Lessons

M. A. Klimova, V. K. Smilga, D. A. Overnikova, in: Proceedings of the International Conference "Corpus Linguistics 2021". Skifiya-print, 2021. P. 112-121.

Added: October 31, 2021

Proceedings of the 8th Conference on Artificial Intelligence and Natural Language, AINL 2019. CCIS

Springer, 2019

Added: November 3, 2019

USE OF LEARNER CORPUS IN GENERAL ENGLISH AND ACADEMIC ENGLISH COURSES AT THE HIGHER SCHOOL OF ECONOMICS

Vinogradova O. I., in: Conference Proceedings. The Future of Education International Conference The Future of Education, 6th edition. Padova : libreriauniversitaria, 2016. P. 310-314.

There have been many reports on advances in the development of learner corpora that have made it possible to effectively use these collections of texts for the benefit of the learning process. This paper lists all possible applications in English courses taught to Bachelor students of a middle-size learner corpus REALEC, which comprises student written ...

Added: March 1, 2017