Breeds of cooccurrence: an attempt at classification

Roytberg M.A.; Roytberg A.M.; Khachko D. V.

P. 568–578.

The paper proposes a substantial classification of collocates (pairs of words that tend to cooccur), along with heuristics that can help assign a word pair to the proper type automatically.

The best studied type is frequent phrases, which includes idioms, lexicographic collocations, and syntactic selection. Pairs of this type are known to occur at a short distance and can be singled out by choosing a narrow window for collecting cooccurrence data.
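
The narrow-window idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_cooccurrences`, the window size of 2, and the toy sentence are assumptions made for this example only.

```python
from collections import Counter


def window_cooccurrences(tokens, window=2):
    """Count unordered word pairs whose positions differ by at most `window`."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only at the next `window` tokens, so each pair of positions
        # is counted exactly once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Toy input (assumed for illustration).
tokens = "he kicked the bucket and she kicked the bucket too".split()
pairs = window_cooccurrences(tokens, window=2)
```

With a window of 2, the pair ("kicked", "the") is counted twice, while "he" and "bucket", which are three tokens apart, are never counted together.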

The next most salient type is topically related pairs. These can be identified by considering word frequencies in individual documents, as in the well-known distributional topic models.
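
A document-level association score gives a rough feel for this kind of evidence. The sketch below uses pointwise mutual information over document frequencies, which is a much simpler stand-in for a topic model and is not the method of the paper; the function name `document_pmi` and the four toy documents are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def document_pmi(documents):
    """PMI of word pairs, where cooccurrence means 'appears in the same document'."""
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log2(n_ab * n_docs / (word_df[a] * word_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


# Toy corpus (assumed for illustration).
docs = [
    "the patient saw the doctor at the hospital".split(),
    "the doctor left the hospital late".split(),
    "the team won the match".split(),
    "the match ended late".split(),
]
scores = document_pmi(docs)
```

Here "doctor" and "hospital" always share a document and get a positive score, whereas "doctor" and "the" score zero because "the" occurs in every document.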

The third type is pairs that occur in repeated text fragments such as popular quotes or standard legal formulae. The characteristic feature of these is that the fragment contains several aligned words that are repeated in the same sequence. Such pairs are normally filtered out for most practical purposes, but filtering is usually applied only to exact repeats; we propose a method of capturing inexact repetition.
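
One way to picture "several aligned words repeated in the same sequence" is to align two fragments and count the words they share in order. The abstract does not describe the authors' method, so the sketch below simply uses `difflib.SequenceMatcher` from the Python standard library; the helper names and the threshold `min_aligned=4` are assumptions.

```python
from difflib import SequenceMatcher


def aligned_words(fragment_a, fragment_b):
    """Return the words that two token lists share in the same order."""
    matcher = SequenceMatcher(None, fragment_a, fragment_b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical tokens; the final block has size 0.
        shared.extend(fragment_a[block.a : block.a + block.size])
    return shared


def looks_like_repeat(fragment_a, fragment_b, min_aligned=4):
    """Flag two fragments as an (inexact) repeat if enough words align."""
    return len(aligned_words(fragment_a, fragment_b)) >= min_aligned


# Toy fragments (assumed for illustration).
a = "to be or not to be that is the question".split()
b = "to be or maybe not to be that was the question".split()
c = "the question was not easy to answer".split()
```

Fragments `a` and `b` differ by an insertion and a substitution yet still align on most of their words, so they are flagged; `a` and `c` share only a short run and are not.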

Hypothetically, one could also expect to find a fourth type: collocate pairs linked by an intrinsic semantic relation or a long-distance syntactic relation. Such a link would guarantee cooccurrence within a relatively restricted range of distances, narrower than in the case of a purely topical connection but not as narrow as in repeats. However, we do not find many cases of this sort in the preliminary empirical study.
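
The distance criterion can be made concrete by collecting the distances at which two words cooccur and summarising their spread. This is only an illustrative sketch: the function names, the `max_distance` cut-off of 50 tokens, and the toy sentence are assumptions, and the abstract does not say how the authors measured distance ranges.

```python
from statistics import median


def pair_distances(tokens, word_a, word_b, max_distance=50):
    """Sorted distances (in tokens) between occurrences of word_a and word_b."""
    pos_a = [i for i, t in enumerate(tokens) if t == word_a]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b]
    return sorted(
        abs(i - j) for i in pos_a for j in pos_b if 0 < abs(i - j) <= max_distance
    )


def distance_range(distances):
    """Summarise a non-empty distance list as (min, median, max)."""
    return min(distances), median(distances), max(distances)


# Toy input (assumed for illustration).
tokens = (
    "rain fell and the river rose then rain stopped but the river stayed high"
).split()
distances = pair_distances(tokens, "rain", "river")
summary = distance_range(distances)
```

A pair whose distances cluster in a band of moderate width would be a candidate for the hypothetical fourth type, while a very tight band suggests a phrase or repeat and a very wide one suggests a purely topical link.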

Language: English

Keywords: collocability, collocations, topic models, cooccurrence, repeats

In book

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Bekasovo, May 29 - June 2, 2013). In 2 vols.

Vol. 1: Main conference program. Issue 12 (19). Moscow: RGGU, 2013.

Lexical repetition as a resource of speech influence in the discourse of the Spanish monarch Felipe VI

Селиванова И. В., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2024 Vol. 22 No. 3 P. 60–72

Repetitions in public political discourse are necessary to make a text more coherent, facilitate its comprehension and expand its significance. The article examines lexical repetitions as one of the most effective means of persuasion in Felipe VI’s public discourse and provides their classification in accordance with the semantics of repeated elements, syntactic macro context and ...

Added: December 13, 2024

The media concept “vaccination” in German media discourse during the COVID-19 pandemic

Balakina Y. V., Вестник Томского государственного университета 2024 No. 509 P. 23–34

The relevance of the research is justified by the influence of the media on the consciousness and behavior of people during the crisis, which makes it possible to form discursive phenomena that have specific characteristics. In addition, it seems particularly relevant to use linguistic tools to describe media and political phenomena, as well as to apply media and ...

Added: December 12, 2024

Запутывать мозги and ездить на шее (“to muddle someone's brains” and “to ride on someone's neck”): a corpus study of the functioning of phraseologized collocations in everyday spoken communication

Попова Т. И., Драчева К. И., in: Discursive practices in the digital age: traditions and innovations. Nizhny Novgorod: Изд-во ННГУ им. Н.И. Лобачевского, 2024. P. 208–217.

The article describes fixed multiword units of Russian spoken colloquial speech. The observations and conclusions are based on an analysis of material from two corpora: the subcorpus of the corpus of Russian everyday communication “One Speech Day” (ORD), totaling 300 thousand word tokens (195 episodes), the Spoken Corpus of the Russian National Corpus (360 word tokens), and the “Social Networks” corpus (2615 word tokens). The study examines in more detail phraseologized collocations ...

Added: October 29, 2024

Exploring collocational complexity in L2 Russian: A corpus-driven contrastive analysis

Kopotev M., Klimov A., Kisselev O., International Journal of Bilingualism 2025 Vol. 29 No. 2 P. 439–455

Objective: The objective of this article is to discuss the pedagogical and practical need for automated assessment tools that enable teachers, researchers, and other language practitioners to relatively quickly and automatically assess the general proficiency of second language (L2) speakers according to a number of different linguistic parameters, specifically the use of collocations. Introduction: The Introduction discusses existing ...

Added: September 9, 2024

[Review of:] J. Bressem. Repetitions in gesture: A cognitive-linguistic and usage-based perspective. Berlin; Boston: De Gruyter Mouton, 2021.

Nikolaeva Y., Вопросы языкознания 2023 No. 2 P. 157–166

Repetitions in co-speech gestures reflect grammatical meanings, primarily number (for combinations with noun groups or whole clauses) and aspectual (such as plurality, duration, reciprocal) when combined with verb groups ...

Added: December 24, 2023

The semantic content of the concept “populism” in English (a lexicographic and corpus analysis)

Gritsenko E., Галочкин А. Е., Вопросы лексикографии 2023 No. 27 P. 29–46

The aim of the article is to reveal the semantic content of the concept “populism” in modern English. The need to address this topic is driven by the fact that a significant part of the research is dedicated to the analysis of specific forms of populism or populist parties in the aspect of political science, discourse theory, political rhetoric, ...

Added: May 6, 2023

Pleonastic participles in contemporary Russian speech: functions and development trends

Ю. М. Кувшинская, Н. А. Зевахина, Acta Linguistica Petropolitana. Труды института лингвистических исследований 2023 Vol. 19 No. 1 P. 138–192

The paper studies tendencies in the use of full single (i.e. without their arguments) redundant participles in the attributive position in the Russian written discourse. Relying upon the data of the Russian National Corpus and the Corpus of Russian Student Texts, as well as a number of the examples collected from various written sources, the ...

Added: December 8, 2022

Terminology of Migration Studies: A Corpus Analysis of Research Papers in Social Sciences

Elizaveta Smirnova, Tatiana Permyakova, Migration Letters 2022 Vol. 19 No. 4 P. 401–412

Migration studies is a new, rapidly developing research area whose terminology is being established at the intersection of various social sciences. This article undertakes a quantitative and qualitative analysis of terms associated with migration, conducted on a 281,000-word corpus of research articles in social sciences, published in leading academic journals. Our analysis involves corpus processing ...

Added: August 1, 2022

Discourses in the propaganda materials of “Red” and “White” periodicals of Perm Governorate during the Civil War

Ехлакова А. Р., Ismakaeva I., in: Fifth Winter School on Humanities Informatics. Kaliningrad: Immanuel Kant Baltic Federal University, 2021. P. 20–26.

The article analyzes the most frequent word forms in the propaganda materials of “Red” and “White” periodicals of Perm Governorate during the Civil War. Applying the discourse theory of E. Laclau and C. Mouffe made it possible to treat the “Red” and “White” periodicals as a field of struggle between the corresponding discourses over the formation of meanings and the understanding of the world. Using the tools of the AntConc program (N-gram, Collocates), the most frequent ...

Added: February 17, 2022

Cognitive processing of Russian binomials by Turkic-Russian bilinguals

Буб А. С., Artemenko E., Язык и культура 2019 No. 48 P. 32–45

The article concerns one of the aspects of bilingualism, namely the study of cognitive processing of lexical units in bilinguals. As a review of the scientific literature shows, the bilingual mental lexicon differs from the monolingual mental lexicon. In the latter, words do not exist separately, but together with collocational links, i.e. in conjunction with ...

Added: October 29, 2021

Extraction of Typical Client Requests from Bank Chat Logs

Pronoza E., Pronoza A., Yagunova E., in: Advances in Computational Intelligence (17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Guadalajara, Mexico, October 22–27, 2018, Proceedings, Part II). Vol. 11289. Springer, 2018. P. 156–164.

In this paper we propose a simple but powerful method of extracting key client requests from bank chat logs. Many companies nowadays are interested in building a chat bot to optimize their business, and are ready to provide chat bot developers with large amounts of data, but such data often need special preparation to be ...

Added: October 30, 2020

In Search of Lost Collocations: Combining Measures to Reach the Top Range

Khohlova M., Klyshinskiy E., in: Internet and Modern Society: Proceedings of the International Conference IMS-2017. NY: ACM Press, 2017. P. 160–163.

The paper discusses statistical methods for collocation extraction. We test the following hypothesis: combining several methods gives a better result than applying just one. At the first stage we suggest two methods to combine MI and t-score rankings and evaluate the results on attributive and verbal collocations against the data attested in the dictionary. At the second stage, we use regression ...

Added: October 28, 2020

Evaluation of collocation extraction methods for the Russian language

Pivovarova L., Kormacheva D., Kopotev M., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 137–157.

This paper focuses on empirical collocations, understood here as word co-occurrences that 1) are frequent enough to be extracted automatically and 2) may be semantically and/or syntactically bounded to various extents. Our main goal is to examine closely five window-based methods for empirical collocation extractions that are widely used in corpus-based studies, sometimes without proven ...

Added: September 30, 2020

Collocations and near-native competence: Lexical strategies of heritage speakers of Russian

Kopotev M., Polinsky M., Kisselev O., International Journal of Bilingualism 2020 P. 1–28

This paper presents an exploratory study on the use of frequency-based probabilistic word combinations in Heritage Russian. The data used in the study are drawn from three small corpora of narratives, representing the language of Russian heritage speakers from three different dominant-language backgrounds, namely German, Finnish, and American English. The elicited narratives are based on ...

Added: September 30, 2020

A clumsy saga: repetitions and heterogeneous composition in the “sagas of Icelanders”

Daria G., Вестник РГГУ. Серия «Литературоведение. Языкознание. Культурология». 2021 No. 1 P. 58–72

The article examines several sagas of Icelanders (Íslendingasögur) whose narrative is wholly or partly built on compositional repetition: the repetition of a sequence of several motifs. These repetitions are analyzed, and their function in the composition of the saga is identified depending on the type of narration, causal or episodic. Thus, whereas in episodic narration compositional repetition helps the compiler of the saga organize the material, ...

Added: September 28, 2020

Data-Driven Approach To Patient Flow Management And Resource Utilization In Urban Medical Facilities

Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Fomichev N. et al., in: 2020 IEEE 22nd Conference on Business Informatics (CBI). IEEE, 2020. P. 71–77.

Healthcare services are tightly connected with complex data analysis techniques to enable optimal resource allocation in medical institutions. This paper proposes a detailed analysis of incoming patient flow to local polyclinic by integrating clustering techniques, process mining and a concept of self-organizing systems. The study takes into account concepts based on models of managing social ...

Added: August 31, 2020

On the feeling of respect in the Russian linguistic consciousness: worthy of respect…

Botchkarev A., Slavica Slovaca 2020 Vol. 55 No. 1 P. 46–52

The article explores the ways of displaying uvazheniye ‘respect’ in the Russian language consciousness. The National Russian Corpus is more appropriate for this purpose, because a conceptual configuration of an analyzed concept is not present in a “finished” form in any single utterance, but may be reconstructed on the totality of all possible utterances. According ...

Added: June 24, 2020