Correcting collocation errors in learners’ writing based on probability of syntactic links

Bolshakova E. I., Azimov A.

P. 58–68.

The paper describes a novel method for the automatic correction of collocation errors in natural-language (NL) texts that are written by language learners or translated from another NL with the aid of machine translators. We assume that the main cause of collocation errors is the word-by-word translation strategy used by the authors of the texts or by machine translators, so the errors essentially depend on the source language. When processing a sentence, the method considers as potential corrections all of its paraphrases that have the same syntactic structure and are built by replacing all sentence words with their substitute words. Substitute words are generated automatically from word translation equivalents taken from a particular translation dictionary. To detect an error in the sentence, we propose a relevance degree function that is computed from the probabilities of word syntactic links and applied to the sentence and its paraphrases. If the value of the function for the sentence is lower than for some of its paraphrases, the method signals an error, which is then corrected by the appropriate paraphrase. The method was tested on correcting collocation errors in English texts written by Russian speakers. The Stanford Parser and an English text collection were used to gather statistics and compute the probabilities of English word syntactic links. Within certain limitations, the experiments gave promising results: about 80% of collocation errors (involving words of various parts of speech) were detected, and 87% of the proposed correcting paraphrases included the “gold” correction.
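
The abstract does not give the exact form of the relevance degree function, so the following Python sketch makes several assumptions. It scores a sentence by the mean log-probability of its syntactic links, which is one plausible choice. A hypothetical toy table `LINK_PROB` stands in for statistics gathered with the Stanford Parser. Sentences are given directly as (head, relation, dependent) triples rather than parsed. The sketch illustrates the paraphrase-and-compare idea only and does not reproduce the authors' implementation.

```python
import math
from itertools import product

# Hypothetical toy statistics: probability of a (head, relation, dependent)
# syntactic link, standing in for counts gathered from a parsed English corpus.
LINK_PROB = {
    ("make", "dobj", "mistake"): 0.020,
    ("do", "dobj", "mistake"): 0.001,
    ("make", "nsubj", "student"): 0.010,
    ("do", "nsubj", "student"): 0.010,
}
UNSEEN = 1e-6  # assumed smoothing value for links never observed


def relevance(links):
    """Assumed relevance degree: mean log-probability of the sentence's links."""
    return sum(math.log(LINK_PROB.get(link, UNSEEN)) for link in links) / len(links)


def paraphrases(links, substitutes):
    """Yield every variant with the same link structure, substituting each word.

    `substitutes` maps a word to its candidate replacements (including itself),
    e.g. as derived from a translation dictionary; unlisted words stay unchanged.
    """
    words = sorted({w for head, _, dep in links for w in (head, dep)})
    options = [substitutes.get(w, [w]) for w in words]
    for choice in product(*options):
        mapping = dict(zip(words, choice))
        yield [(mapping[h], rel, mapping[d]) for h, rel, d in links]


def correct(links, substitutes):
    """Return the highest-scoring paraphrase, or the sentence if none beats it."""
    best, best_score = links, relevance(links)
    for variant in paraphrases(links, substitutes):
        score = relevance(variant)
        if score > best_score:  # the sentence scores lower: signal an error
            best, best_score = variant, score
    return best
```

Suppose a Russian–English dictionary lists both *do* and *make* for a single Russian verb, so the substitutes are `{"do": ["do", "make"]}`. The links of *student do mistake* are then rewritten to use *make*. Exhaustive enumeration grows as the product of the substitute-list sizes, which is acceptable only for short sentences.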

Language: English

Keywords: collocations, lexical combinability, collocation error, error correction, ESL writing

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (Bekasovo, May 29 – June 2, 2013). In 2 vols.

Vol. 1: Main Conference Program. Issue 12 (19). Moscow: RSUH, 2013.

Sampling Rate Optimization for LDPC-Based Information Reconciliation Protocol in QKD

Morozov V., Evsutin O., Yarygin N., in: 2025 XIX International Symposium on Problems of Redundancy in Information and Control Systems (Redundancy), 5–7 Nov. 2025. IEEE, 2025. P. 1–7.

Quantum Key Distribution (QKD) is a promising field in modern cryptography where the security of key information is guaranteed by the laws of quantum mechanics. One of the key stages in QKD protocols is error estimation and reconciliation in the secret key. This procedure requires the transmission of a certain number of secret key bits ...
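
The abstract is truncated before the optimization is described. For orientation only, here is a minimal Python sketch of the generic parameter-estimation step it refers to. A random fraction of key bits (the sampling rate) is disclosed to estimate the quantum bit error rate (QBER), and those bits are then dropped from the key. The function name and interface are hypothetical. Nothing here models LDPC decoding or the paper's optimization.

```python
import random


def estimate_qber(alice_bits, bob_bits, sample_rate, rng=random):
    """Hypothetical helper: disclose a random sample of positions to estimate QBER.

    Returns (qber_estimate, alice_remaining, bob_remaining); the disclosed bits
    are removed because they were revealed over the public channel.
    """
    n = len(alice_bits)
    k = max(1, round(sample_rate * n))          # number of bits sacrificed
    sample = set(rng.sample(range(n), k))       # positions compared publicly
    errors = sum(alice_bits[i] != bob_bits[i] for i in sample)
    kept = [i for i in range(n) if i not in sample]
    return errors / k, [alice_bits[i] for i in kept], [bob_bits[i] for i in kept]
```

A higher sampling rate gives a more reliable QBER estimate but leaves fewer bits for the key. That is presumably the trade-off the sampling-rate optimization addresses.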

Added: December 30, 2025

The media concept “vaccination” in German media discourse during the COVID-19 pandemic

Balakina Y. V., Tomsk State University Journal 2024 No. 509 P. 23–34

The relevance of the research is justified by the influence of the media on people’s consciousness and behavior during a crisis, which allows discursive phenomena with specific characteristics to form. In addition, it seems particularly relevant to use linguistic tools to describe media and political phenomena, as well as to apply media and ...

Added: December 12, 2024

Exploring collocational complexity in L2 Russian: A corpus-driven contrastive analysis

Kopotev M., Klimov A., Kisselev O., International Journal of Bilingualism 2025 Vol. 29 No. 2 P. 439–455

Objective: The objective of this article is to discuss the pedagogical and practical need for automated assessment tools that enable teachers, researchers, and other language practitioners to relatively quickly and automatically assess the general proficiency of second language (L2) speakers according to a number of different linguistic parameters, specifically the use of collocations. Introduction: The Introduction discusses existing ...

Added: September 9, 2024

English adjectives of size: Cognitive models of collocation formation

Antonova M., Tomsk State University Journal 2023 No. 488 P. 91–100

The article analyzes, from a cognitive point of view, the language-system factors that cause differences in the lexical combinability of the English parametric adjectives ample, extended, expanded, wide and broad. It is hypothesized that the combinability of these adjectives is conditioned by deep cognitive models underlying their semantics. It is shown that these models are inherited ...

Added: September 22, 2023

The semantic content of the concept “populism” in English (a lexicographic and corpus analysis)

Gritsenko E., Galochkin A. E., Russian Journal of Lexicography 2023 No. 27 P. 29–46

The aim of the article is to reveal the semantic content of the concept “populism” in modern English. The need to address this topic is driven by the fact that a significant part of the research is dedicated to the analysis of specific forms of populism or populist parties in the aspect of political science, discourse theory, political rhetoric, ...

Added: May 6, 2023

Pleonastic participles in contemporary Russian usage: functions and development trends

Kuvshinskaya Yu. M., Zevakhina N. A., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2023 Vol. 19 No. 1 P. 138–192

The paper studies tendencies in the use of full single (i.e. without their arguments) redundant participles in the attributive position in the Russian written discourse. Relying upon the data of the Russian National Corpus and the Corpus of Russian Student Texts, as well as a number of the examples collected from various written sources, the ...

Added: December 8, 2022

Terminology of Migration Studies: A Corpus Analysis of Research Papers in Social Sciences

Elizaveta Smirnova, Tatiana Permyakova, Migration Letters 2022 Vol. 19 No. 4 P. 401–412

Migration studies is a new, rapidly developing research area whose terminology is being established at the intersection of various social sciences. This article undertakes a quantitative and qualitative analysis of terms associated with migration, conducted on a 281,000-word corpus of research articles in social sciences, published in leading academic journals. Our analysis involves corpus processing ...

Added: August 1, 2022

Cognitive processing of Russian binomials by Turkic–Russian bilinguals

Bub A. S., Artemenko E., Language and Culture 2019 No. 48 P. 32–45

The article concerns one aspect of bilingualism, namely the cognitive processing of lexical units in bilinguals. As a review of the scientific literature shows, the bilingual mental lexicon differs from the monolingual mental lexicon. In the latter, words do not exist separately but together with collocational links, i.e. in conjunction with ...

Added: October 29, 2021

Extraction of Typical Client Requests from Bank Chat Logs

Pronoza E., Pronoza A., Yagunova E., in: Advances in Computational Intelligence (17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Guadalajara, Mexico, October 22–27, 2018, Proceedings, Part II). Vol. 11289. Springer, 2018. P. 156–164.

In this paper we propose a simple but powerful method of extracting key client requests from bank chat logs. Many companies nowadays are interested in building a chat bot to optimize their business, and are ready to provide chat bot developers with large amounts of data, but such data often need special preparation to be ...

Added: October 30, 2020

In Search of Lost Collocations: Combining Measures to Reach the Top Range

Khohlova M., Klyshinskiy E., in: Internet and Modern Society: Proceedings of the International Conference IMS-2017. New York: ACM Press, 2017. P. 160–163.

The paper discusses statistical methods for collocation extraction. We test the following hypothesis: combining several methods gives a better result than applying just one. At the first stage we suggest two methods to combine MI and t-score rankings and evaluate the results on attributive and verbal collocations against the data attested in the dictionary. At the second stage, we use regression ...
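
The abstract names MI and t-score but does not say how the two rankings are combined. Below is a minimal Python sketch. It assumes the standard corpus-linguistics definitions of both measures and one simple combination: averaging each pair's rank under the two measures. The function names and the averaging rule are illustrative assumptions, not the authors' two methods.

```python
import math


def mi(f_xy, f_x, f_y, n):
    """Pointwise mutual information (base 2) of a pair seen f_xy times in n tokens."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """t-score: observed minus expected co-occurrence, divided by sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)


def ranks(scores):
    """Map each item to its 1-based rank (1 = highest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {item: i + 1 for i, item in enumerate(order)}


def combined_ranking(pairs, n):
    """Order pairs by the mean of their MI rank and t-score rank (assumed rule).

    `pairs` maps (word1, word2) to (f_xy, f_x, f_y) counts.
    """
    mi_rank = ranks({p: mi(*c, n) for p, c in pairs.items()})
    t_rank = ranks({p: t_score(*c, n) for p, c in pairs.items()})
    return sorted(pairs, key=lambda p: (mi_rank[p] + t_rank[p]) / 2)
```

MI tends to favor rare, exclusive pairs, while the t-score favors frequent ones. Averaging ranks rather than raw scores sidesteps the fact that the two measures live on different scales.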

Added: October 28, 2020

Collocations and near-native competence: Lexical strategies of heritage speakers of Russian

Kopotev M., Polinsky M., Kisselev O., International Journal of Bilingualism 2020 P. 1–28

This paper presents an exploratory study on the use of frequency-based probabilistic word combinations in Heritage Russian. The data used in the study are drawn from three small corpora of narratives, representing the language of Russian heritage speakers from three different dominant-language backgrounds, namely German, Finnish, and American English. The elicited narratives are based on ...

Added: September 30, 2020

On the feeling of respect in the Russian linguistic consciousness: worthy of respect…

Botchkarev A., Slavica Slovaca 2020 Vol. 55 No. 1 P. 46–52

The article explores the ways of displaying uvazheniye ‘respect’ in the Russian language consciousness. The Russian National Corpus is well suited to this purpose, because the conceptual configuration of the analyzed concept is not present in a “finished” form in any single utterance but can be reconstructed from the totality of all possible utterances. According ...

Added: June 24, 2020

A Linguist’s ‘Platypus Moment’: Proper Nouns as Means of Conceptualizing Events in Contemporary American English

Nagornaya A., The Humanities and Social Sciences Review 2019 Vol. 9 No. 1 P. 23–33

The paper deals with the collocations of the Proper Noun + ‘moment’ type which are becoming increasingly popular in American English as means of conceptualizing events and experiences. The paper considers the semantic properties of the noun moment which account for its use in the constructions under study. It further describes the conceptual mechanisms that ...

Added: January 29, 2020

Correction of errors in fiber-optic measuring transducers

Yurin A., Krasivskaya M., Chukarin M. I. et al., in: Innovative, Information and Communication Technologies: Proceedings of the XVI International Scientific and Practical Conference. Association of Graduates and Staff of the Prof. Zhukovsky Air Force Engineering Academy, 2019. P. 367–370.

The paper describes the operating principle of reflectometer-type fiber-optic measuring transducers. The sources of errors in fiber-optic measuring transducers are analyzed, and methods for reducing them are given. The results of research on the conversion function of a reflectometer-type optical fiber measuring transducer and a technique for correcting measurement errors caused by various reflective properties of the ...

Added: November 28, 2019

LESS IS DOWN: a corpus analysis of the structure of the metaphorical meaning of the verbs padat’ and upast’

Kultepina O., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2020 Vol. 1 No. XVI P. 344–367

The paper examines the possibilities that a corpus-based approach offers for analyzing metaphorical transfer, using the aspectual pair upast’ / padat’ (‘to fall’). The author reviews the structure of the metaphorical meaning of predicates that realize Lakoff’s metaphor ‘LESS IS DOWN’ and also analyzes how collocations correlate with valency structure. ...

Added: October 7, 2019

A method of collocation extraction using the power-law exponent of the Zipf distribution

Klyshinskiy E., Kochetkova N. A., Karpik O. V., in: New Information Technologies in Automated Systems: Proceedings of the Twenty-First Scientific and Practical Seminar. Moscow: Keldysh Institute of Applied Mathematics RAS, 2018. P. 220–225.

To extract collocations from text, we propose using the power-law exponent of the Zipf distribution. To this end, the Zipf distribution is computed for a fixed word and its neighbors. The paper examines the results obtained for pairs such as adjective+noun, noun+verb, etc. The proposed method is compared with the results of computing the MI measure. ...
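
The abstract specifies neither the fitting procedure nor the neighborhood. Below is a minimal Python sketch under two assumptions. The exponent is fitted by ordinary least squares of log frequency against log rank. A word's "neighbors" are taken to be the words immediately following it. The abstract does not describe how the exponent becomes a collocation score, so the sketch stops at the exponent itself.

```python
import math
from collections import Counter


def zipf_exponent(frequencies):
    """Least-squares estimate of s in f(r) ~ C * r**(-s) from raw frequencies."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero frequencies")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]  # log rank
    ys = [math.log(f) for f in freqs]                      # log frequency
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # the fitted line has slope -s


def neighbor_counts(tokens, target):
    """Count right-hand neighbors of a fixed word (assumed neighborhood)."""
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == target)
```

For a fixed word, `zipf_exponent(neighbor_counts(tokens, word).values())` gives the exponent of its neighbor distribution. A steep exponent means a few neighbors dominate, which is at least consistent with strong collocational preferences.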

Added: September 25, 2018

Toward a definition of French sentiment (based on the novels of Restif de la Bretonne)

Botchkarev A., in: France and Russia: From Medieval Impersonality to the Personality of the Modern Age. Nizhny Novgorod: Radonezh, 2018. P. 156–164.

The paper examines the ways and means of rendering French sentiment ‘feeling’ in the novels of Restif de la Bretonne. In the research corpus, French sentiment covers a wide variety of notions: a state of mind, a sensation, a conviction, an opinion and an idea, and a predisposition to certain emotional states such as attachment, being in love, jealousy and attraction. What remains common to all attested usages is ...

Added: September 20, 2018

A Complex Approach to Spellchecking and Autocorrection for Russian

Dereza O., Fenogenova A., Kayutenko D. et al., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (2016). Moscow: RSUH Publishing, 2016. P. 1–13.

This study discusses a number of methods that can be used jointly for error detection and correction, namely blacklists and pre-compiled dictionaries, a word2vec model, an N-gram language model and a tripartite error model. Our system consists of two standalone modules, an error detection confidence classifier, built with the help of supervised machine learning methods, ...

Added: June 20, 2018

Making the Right Moves in Teacher Development

Warsaw: ET Forum, 2012.

Added: April 18, 2017

Writing-for-publication: online pedagogy for post/graduate research writing.

Smirnova N. V., in: Research literacies and writing pedagogies for masters and doctoral writers. Studies in Writing, Book 31. Netherlands: Brill, 2016. Ch. 4. P. 68–92.

The chapter overviews an approach to teaching writing-for-publication via an online pedagogy for post/graduate research writing. ...

Added: March 14, 2017

Correction of nonlinearity and hysteresis in the conversion function of inductive displacement transducers

Yurin A., Neborsky A. Yu., Sensors & Systems 2016 No. 11 P. 48–51

The paper describes the operating features of inductive transducers and an algorithm for correcting the nonlinearity and hysteresis of their conversion function. A virtual instrument implementing the proposed algorithm is presented, and the algorithm's effectiveness is studied experimentally using a differential inductive transducer of linear displacement. ...

Added: December 13, 2016

RuSkELL: Online Language Learning Tool for Russian Language

Apresyan V., Baisa V., Buivolova O. et al., in: Proceedings of the XVII EURALEX International Congress. Lexicography and Linguistic Diversity (6–10 September 2016). Tbilisi: Ivane Javakhishvili Tbilisi State University, 2016. P. 292–299.

RuSkELL ("Russian + Sketch Engine for Language Learning") is a new online resource intended for researchers and learners of Russian. It incorporates a specially pre-processed corpus and the interface which allows users to search for phrases in sentences, extract salient collocates and show similar words. The tool builds upon its English counterpart SkELL (Baisa & ...

Added: October 16, 2016