Breeds of cooccurrence: an attempt at classification

P. 568–578.
Roytberg M.A., Roytberg A.M., Khachko D.V.

The paper proposes a substantive classification of collocates (pairs of words that tend to co-occur), along with heuristics that can help to assign a word pair to the appropriate type automatically.

The best-studied type is frequent phrases, which include idioms, lexicographic collocations, and syntactic selection. Pairs of this type are known to occur at short distances and can be singled out by choosing a narrow window for collecting co-occurrence data.
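
As an illustration of the narrow-window idea, the sketch below counts ordered word pairs in which the second word follows the first within a small window. The window size of 2 tokens, the function name `window_cooccurrences`, and the toy sentence are assumptions made for this example; the abstract does not state which window the authors used.

```python
from collections import Counter


def window_cooccurrences(tokens, window=2):
    """Count ordered pairs (w1, w2) where w2 occurs at most `window` tokens after w1.

    A narrow window (2 tokens here, an illustrative choice) keeps mostly
    short-distance pairs such as idioms and lexicographic collocations.
    """
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # Look only at the next `window` positions to the right of w1.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(w1, tokens[j])] += 1
    return counts


tokens = "kick the bucket and kick the ball".split()
counts = window_cooccurrences(tokens, window=2)
print(counts[("kick", "the")])  # 2
```

In practice, raw counts like these are usually converted into an association score (for example, mutual information or t-score) before pairs are ranked.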

The next most salient type is topically related pairs. These can be identified from word frequencies in individual documents, as in the well-known distributional topic models.
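
A full topic model is beyond a short example, so the sketch below swaps in a simpler document-level measure: pointwise mutual information (PMI) computed over the documents in which each word appears. Pairs that share documents more often than chance get positive scores even if they never stand next to each other. This is not the topic-model approach itself, and the function name `document_pmi` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def document_pmi(documents):
    """Document-level PMI for every unordered word pair that shares a document.

    PMI(a, b) = log(N * df(a, b) / (df(a) * df(b))), where N is the number of
    documents, df(a) the number of documents containing a, and df(a, b) the
    number containing both. Word order and distance inside a document are ignored.
    """
    n_docs = len(documents)
    df = Counter()       # number of documents containing each word
    pair_df = Counter()  # number of documents containing each (sorted) word pair
    for doc in documents:
        vocab = sorted(set(doc))
        df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log(n_docs * n_ab / (df[a] * df[b]))
        for (a, b), n_ab in pair_df.items()
    }


docs = [
    ["goal", "match", "referee"],
    ["match", "goal"],
    ["court", "judge", "verdict"],
    ["verdict", "judge"],
]
scores = document_pmi(docs)
print(round(scores[("goal", "match")], 3))  # 0.693, i.e. log 2
```

Pairs are stored with their words in alphabetical order, so a lookup must use the sorted pair.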

The third type is pairs that occur in repeated text fragments, such as popular quotations or standard legal formulae. Their characteristic feature is that the fragment contains several aligned words repeated in the same sequence. Such pairs are normally filtered out for most practical purposes, but filtering is usually applied only to exact repeats; we propose a method for capturing inexact repetition.
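
The abstract does not describe how the authors capture inexact repetition. One simple stand-in is to compare the token contexts around two occurrences of a word pair and count how many tokens line up in the same order, here using Python's standard `difflib.SequenceMatcher`. The function names `aligned_overlap` and `is_inexact_repeat` and the threshold of 4 aligned tokens are hypothetical.

```python
from difflib import SequenceMatcher


def aligned_overlap(context_a, context_b):
    """Number of tokens SequenceMatcher aligns between two contexts.

    The matching blocks keep the same order in both contexts, so this counts
    words that are repeated in the same sequence, even if others were changed.
    """
    matcher = SequenceMatcher(a=context_a, b=context_b, autojunk=False)
    # get_matching_blocks() ends with a dummy block of size 0, so summing is safe.
    return sum(block.size for block in matcher.get_matching_blocks())


def is_inexact_repeat(context_a, context_b, min_aligned=4):
    """Treat two contexts as one (possibly altered) repeated fragment
    if at least `min_aligned` tokens line up; the threshold is illustrative."""
    return aligned_overlap(context_a, context_b) >= min_aligned


quote = "to be or not to be that is the question".split()
variant = "to be or not to be this is the question".split()
unrelated = "the court adjourned until monday".split()
print(aligned_overlap(quote, variant))      # 9
print(is_inexact_repeat(quote, unrelated))  # False
```

An exact-repeat filter would miss the variant above because one word differs; counting aligned tokens instead tolerates such small edits.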

Hypothetically, one could also expect to find a fourth type: collocate pairs linked by an intrinsic semantic relation or a long-distance syntactic relation. Such a link would guarantee co-occurrence within a relatively restricted range of distances, narrower than for a purely topical connection but not as narrow as for repeats. However, we did not find many cases of this sort in our preliminary empirical study.
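
The distance-based reasoning above can be turned into a rough heuristic: collect the signed distances at which two words co-occur, then check how short the typical distance is and how widely the distances spread. The sketch below does this. The function names, the 50-token search span, and both thresholds are hypothetical illustrations, not values from the paper. Repeats are left to a separate check, since a fixed offset alone would not separate them from short idioms.

```python
import statistics


def pair_distances(tokens, w1, w2, max_span=50):
    """For each occurrence of w1, the signed offset to the nearest w2 within max_span."""
    positions_w2 = [j for j, t in enumerate(tokens) if t == w2]
    distances = []
    for i, t in enumerate(tokens):
        if t != w1:
            continue
        nearby = [j - i for j in positions_w2 if 0 < abs(j - i) <= max_span]
        if nearby:
            # On a tie (e.g. -3 and +3), min() keeps the first, i.e. leftward, offset.
            distances.append(min(nearby, key=abs))
    return distances


def guess_type(distances, short=3, restricted=5.0):
    """Rough label from the distance profile; both thresholds are illustrative."""
    if not distances:
        return "no co-occurrence"
    typical = statistics.median(abs(d) for d in distances)
    spread = statistics.pstdev(distances)
    if typical <= short:
        return "frequent phrase"   # consistently short distance
    if spread <= restricted:
        return "restricted range"  # candidate for the hypothesised fourth type
    return "topical"               # distances scattered across the span


tokens = "he will kick the bucket soon he will kick the bucket soon".split()
print(pair_distances(tokens, "kick", "bucket"))  # [2, 2]
print(guess_type([2, 2]))                        # frequent phrase
print(guess_type([8, 9, 10, 9]))                 # restricted range
print(guess_type([4, 40, -25, 12]))              # topical
```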

Language: English
Keywords: collocations, collocability, topic models, cooccurrence, repeats

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (Bekasovo, 29 May–2 June 2013). In 2 vols.
Vol. 1: Main Conference Program. Issue 12 (19). Moscow: Russian State University for the Humanities (RGGU), 2013.
Similar publications
Lexical repetition as a resource of verbal influence in the discourse of the Spanish monarch Felipe VI
Selivanova I. V., Vestnik NSU. Series: Linguistics and Intercultural Communication 2024 Vol. 22 No. 3 P. 60–72
Repetitions in public political discourse are necessary to make a text more coherent, facilitate its comprehension and expand its significance. The article examines lexical repetitions as one of the most effective means of persuasion in Felipe VI’s public discourse and provides their classification in accordance with the semantics of repeated elements, syntactic macro context and ...
Added: December 13, 2024
The media concept “vaccination” in German media discourse during the COVID-19 pandemic
Balakina Y. V., Tomsk State University Journal 2024 No. 509 P. 23–34
The relevance of the research is justified by the influence of the media on the consciousness and behavior of people during the crisis, allowing to form discursive phenomena that have specific characteristics. In addition, it seems particularly relevant to use linguistic tools to describe media and political phenomena, as well as to apply media and ...
Added: December 12, 2024
Muddling someone's brain and riding on someone's neck: a corpus study of the functioning of phraseologized collocations in everyday spoken communication
Popova T. I., Dracheva K. I., in: Discursive Practices in the Digital Age: Traditions and Innovations. Nizhny Novgorod: Lobachevsky State University of Nizhny Novgorod Press, 2024. P. 208–217.
The article describes stable multiword units (SMUs) of Russian spoken colloquial speech. The observations and conclusions are based on an analysis of material from two corpora: the subcorpus of everyday Russian communication “One Speech Day” (ORD), totalling 300 thousand word tokens (195 episodes), the Spoken Corpus of the Russian National Corpus (360 word tokens), and the “Social Networks” corpus (2,615 word tokens). The study examines in more detail phraseologized collocations ...
Added: October 29, 2024
Exploring collocational complexity in L2 Russian: A corpus-driven contrastive analysis
Kopotev M., Klimov A., Kisselev O., International Journal of Bilingualism 2025 Vol. 29 No. 2 P. 439–455
Objective: The objective of this article is to discuss the pedagogical and practical need for automated assessment tools that enable teachers, researchers, and other language practitioners to relatively quickly and automatically assess the general proficiency of second language (L2) speakers according to a number of different linguistic parameters, specifically the use of collocations. Introduction: The Introduction discusses existing ...
Added: September 9, 2024
[Review of:] J. Bressem. Repetitions in gesture: A cognitive-linguistic and usage-based perspective. Berlin; Boston: De Gruyter Mouton, 2021.
Nikolaeva Y., Voprosy Jazykoznanija 2023 No. 2 P. 157–166
Repetitions in co-speech gestures reflect grammatical meanings, primarily number (for combinations with noun groups or whole clauses) and aspectual (such as plurality, duration, reciprocal) when combined with verb groups ...
Added: December 24, 2023
The semantic content of the concept “populism” in English (a lexicographic and corpus-based analysis)
Gritsenko E., Galochkin A. E., Russian Journal of Lexicography 2023 No. 27 P. 29–46
The aim of the article is to reveal the semantic content of the concept “populism” in modern English. The need to address this topic is driven by the fact that a significant part of the research is dedicated to the analysis of specific forms of populism or populist parties in the aspect of political science, discourse theory, political rhetoric, ...
Added: May 6, 2023
Pleonastic participles in modern Russian speech: functions and development trends
Yu. M. Kuvshinskaya, N. A. Zevakhina, Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2023 Vol. 19 No. 1 P. 138–192
The paper studies tendencies in the use of full single (i.e. without their arguments)  redundant participles in the attributive position in the Russian written discourse. Relying upon the data of the Russian National Corpus and the Corpus of Russian Student Texts, as well as a number of the examples collected from various written sources, the ...
Added: December 8, 2022
Terminology of Migration Studies: A Corpus Analysis of Research Papers in Social Sciences
Elizaveta Smirnova, Tatiana Permyakova, Migration Letters 2022 Vol. 19 No. 4 P. 401–412
Migration studies is a new, rapidly developing research area whose terminology is being established at the intersection of various social sciences. This article undertakes a quantitative and qualitative analysis of terms associated with migration, conducted on a 281,000-word corpus of research articles in social sciences, published in leading academic journals. Our analysis involves corpus processing ...
Added: August 1, 2022
Discourses in the propaganda materials of “Red” and “White” periodicals of the Perm Governorate during the Civil War
Ekhlakova A. R., Ismakaeva I., in: Fifth Winter School on Humanities Informatics. Kaliningrad: Immanuel Kant Baltic Federal University, 2021. P. 20–26.
The paper analyses the most frequent word forms in the propaganda materials of “Red” and “White” periodicals of the Perm Governorate during the Civil War. Applying the discourse theory of E. Laclau and C. Mouffe made it possible to treat the “Red” and “White” periodicals as a field of struggle between the corresponding discourses over the formation of meanings and the understanding of the world. Using the AntConc toolkit (N-gram, Collocates), the most frequently ...
Added: February 17, 2022
Cognitive processing of Russian binomials by Turkic–Russian bilinguals
Bub A. S., Artemenko E., Language and Culture 2019 No. 48 P. 32–45
The article concerns one of the aspects of bilingualism, namely the study of cognitive processing of lexical units in bilinguals. As a review of the scientific literature shows, the bilingual mental lexicon differs from the monolingual mental lexicon. In the latter, words do not exist separately, but together with collocational links, i.e. in conjunction with ...
Added: October 29, 2021
Extraction of Typical Client Requests from Bank Chat Logs
Pronoza E., Pronoza A., Yagunova E., in: Advances in Computational Intelligence (17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Guadalajara, Mexico, October 22–27, 2018, Proceedings, Part II). Vol. 11289. Springer, 2018. P. 156–164.
In this paper we propose a simple but powerful method of extracting key client requests from bank chat logs. Many companies nowadays are interested in building a chat bot to optimize their business, and are ready to provide chat bot developers with large amounts of data, but such data often need special preparation to be ...
Added: October 30, 2020
In Search of Lost Collocations: Combining Measures to Reach the Top Range
Khokhlova M., Klyshinskiy E., in: Internet and Modern Society: Proceedings of the International Conference IMS-2017. New York: ACM Press, 2017. P. 160–163.
The paper discusses statistical methods for collocation extraction. We test the following hypothesis: combining several methods gives a better result than applying just one. At the first stage we suggest two methods to combine MI and t-score rankings and evaluate the results on attributive and verbal collocations against the data attested in the dictionary. At the second stage, we use regression ...
Added: October 28, 2020
Evaluation of collocation extraction methods for the Russian language
Pivovarova L., Kormacheva D., Kopotev M., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 137–157.
This paper focuses on empirical collocations, understood here as word co-occurrences that 1) are frequent enough to be extracted automatically and 2) may be semantically and/or syntactically bounded to various extents. Our main goal is to examine closely five window-based methods for empirical collocation extractions that are widely used in corpus-based studies, sometimes without proven ...
Added: September 30, 2020
Collocations and near-native competence: Lexical strategies of heritage speakers of Russian
Kopotev M., Polinsky M., Kisselev O., International Journal of Bilingualism 2020 P. 1–28
This paper presents an exploratory study on the use of frequency-based probabilistic word combinations in Heritage Russian. The data used in the study are drawn from three small corpora of narratives, representing the language of Russian heritage speakers from three different dominant-language backgrounds, namely German, Finnish, and American English. The elicited narratives are based on ...
Added: September 30, 2020
The clumsy saga: repetitions and heterogeneous composition in the “sagas of Icelanders”
Daria G., RSUH/RGGU Bulletin. “Literary Theory. Linguistics. Cultural Studies” Series 2021 No. 1 P. 58–72
The article examines several sagas of Icelanders (Íslendingasögur) whose narrative is built wholly or partly on compositional repetition, i.e. the repetition of a sequence of several motifs. These repetitions are analysed, and their function in the composition of the saga is identified depending on the type of narration, causal or episodic. Thus, while in episodic narration compositional repetition helps the compiler of the saga to organise the material, ...
Added: September 28, 2020
Data-Driven Approach To Patient Flow Management And Resource Utilization In Urban Medical Facilities
Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Fomichev N. et al., in: 2020 IEEE 22nd Conference on Business Informatics (CBI). IEEE, 2020. P. 71–77.
Healthcare services are tightly connected with complex data analysis techniques to enable optimal resource allocation in medical institutions. This paper proposes a detailed analysis of incoming patient flow to local polyclinic by integrating clustering techniques, process mining and a concept of self-organizing systems. The study takes into account concepts based on models of managing social ...
Added: August 31, 2020
On the feeling of respect in the Russian linguistic consciousness: worthy of respect…
Botchkarev A., Slavica Slovaca 2020 Vol. 55 No. 1 P. 46–52
The article explores the ways of displaying uvazheniye ‘respect’ in the Russian language consciousness. The National Russian Corpus is more appropriate for this purpose, because a conceptual configuration of an analyzed concept is not present in a “finished” form in any single utterance, but may be reconstructed on the totality of all possible utterances. According ...
Added: June 24, 2020