Evaluation of collocation extraction methods for the Russian language

P. 137–157.
Pivovarova L., Kormacheva D., Kopotev M.

This paper focuses on empirical collocations, understood here as word co-occurrences that 1) are frequent enough to be extracted automatically and 2) may be semantically and/or syntactically bound to various extents. Our main goal is to examine closely five window-based methods for empirical collocation extraction that are widely used in corpus-based studies, sometimes without proven efficiency. Our study evaluates the methods’ reliability for Russian data by testing two hypotheses: a) collocations listed in a professionally compiled dictionary (i.e., those considered fixed to some extent by experts in the field) should have higher rankings in automatically extracted lists of collocations, and b) collocations considered fixed expressions by native speakers should have higher rankings in automatically generated lists. Our research indicates that raw frequency, t-score, log-likelihood, and Dice give the best rankings, while MI and wFR demonstrate poorer results in both evaluations. In general, all of these evaluations, although each has its own limitations, lead to comparable results, which should be taken into account in future research.
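The measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest possible window (adjacent word pairs), uses the standard textbook definitions of t-score, log-likelihood, Dice and MI, and omits wFR because the abstract does not define it. The function name `association_scores` and the toy input are hypothetical.

```python
import math
from collections import Counter


def association_scores(tokens):
    """Score adjacent word pairs with common association measures.

    Assumption: a "collocation candidate" is any pair of adjacent tokens.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())  # total number of bigram tokens

    # Marginal counts: how often a word occurs as first / second element.
    first = Counter()
    second = Counter()
    for (x, y), count in bigrams.items():
        first[x] += count
        second[y] += count

    scores = {}
    for (x, y), o11 in bigrams.items():
        r1, c1 = first[x], second[y]
        e11 = r1 * c1 / n  # expected frequency under independence

        # Full 2x2 contingency table, needed for log-likelihood (G2).
        observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
        expected = [
            r1 * c1 / n,
            r1 * (n - c1) / n,
            (n - r1) * c1 / n,
            (n - r1) * (n - c1) / n,
        ]
        g2 = 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )

        scores[(x, y)] = {
            "frequency": o11,
            "t_score": (o11 - e11) / math.sqrt(o11),
            "log_likelihood": g2,
            "dice": 2 * o11 / (r1 + c1),
            "mi": math.log2(o11 / e11),
        }
    return scores


if __name__ == "__main__":
    toy = "a b a b c d".split()
    for pair, measures in association_scores(toy).items():
        print(pair, measures)
```

Sorting the candidate pairs by any one of these values gives a ranked list of the kind the paper compares against dictionary entries and native-speaker judgements.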

Language: English
Keywords: collocations, quantitative methods, collocation extraction method, evaluation, frequency, t-score, log-likelihood, Dice, MI, wFR

In book

Quantitative approaches to the Russian language
Abingdon: Routledge, 2018.
Similar publications
Целевые каузальные эффекты в социальных исследованиях
Sokolov B., Социология: методология, методы, математическое моделирование 2025 № 61 С. 7–76
This article reviews a set of estimands commonly used in modern applied research to operationalize causal inquiries within the Rubin Causal Model (RCM). I first introduce the basic average treatment effects (ATE, ATT, ATC) and then describe their main extensions, including local and conditional treatment effects, causal interactions, causal mediation, multivalued or continuous treatments, and ...
Added: December 19, 2025
«Социальное пространство» П. Бурдьё: история конструирования понятия
Shmatko N., Маркова Ю. В., Социологический журнал 2025 Т. 31 № 1 С. 110–123
The article deals with the history and interpretation of Pierre Bourdieu’s concept of “social space”. With the help of the concept, Bourdieu described a set of interrelated social phenomena that support and reflect each other. He defined social space as a multidimensional distribution of agents (individual or collective) over objective positions determined by the distribution of effective resources ...
Added: May 23, 2025
Медиаконцепт «вакцинация» в дискурсе немецких СМИ во время пандемии COVID-19
Balakina Y. V., Вестник Томского государственного университета 2024 № 509 С. 23–34
The relevance of the research is justified by the influence of the media on people's consciousness and behavior during the crisis, which makes it possible to form discursive phenomena with specific characteristics. In addition, it seems particularly relevant to use linguistic tools to describe media and political phenomena, as well as to apply media and ...
Added: December 12, 2024
Запутывать мозги и ездить на шее: корпусное исследование функционирования фразеологизированных коллокаций в устном повседневном общении
Попова Т. И., Драчева К. И., В кн.: Дискурсивные практики в цифровую эпоху: традиции и инновации.: Н. Новгород: Изд-во ННГУ им. Н.И. Лобачевского, 2024. С. 208–217.
The article describes fixed multiword units in spoken colloquial Russian. The observations and conclusions are based on an analysis of material from two corpora: a subcorpus of the corpus of everyday Russian speech "One Speech Day" (ORD) totalling 300 thousand word tokens (195 episodes), the Spoken Corpus of the Russian National Corpus (360 word tokens), and the "Social Networks" corpus (2,615 word tokens). The study examines phraseologized collocations in more detail ...
Added: October 29, 2024
Эмпирические вызовы и методологические подходы в сравнительной политологии (сквозь призму “Политического атласа современного мира 2.0”)
Melville A. Y., Мальгин А. В., Mironyuk M. et al., Полис. Политические исследования 2023 № 5 С. 153–171
In recent decades, the expanding volume, diversity and coverage of data have created new or have transformed existing areas of research. They have also turned data into a key element of politics today. In this context, the status of empirical research that became the political science mainstream at the turn of the 20th - 21st ...
Added: September 29, 2023
Семантическое наполнение понятия «популизм» в английском языке (опыт лексикографического и корпусного анализа)
Gritsenko E., Галочкин А. Е., Вопросы лексикографии 2023 № 27 С. 29–46
The aim of the article is to reveal the semantic content of the concept “populism” in modern English. The need to address this topic is driven by the fact that a significant part of the research is dedicated to the analysis of specific forms of populism or populist parties in the aspect of political science, discourse theory, political rhetoric, ...
Added: May 6, 2023
Плеонастические причастия в современной русской речи: функции и тенденции развития
Ю. М. Кувшинская, Н. А. Зевахина, Acta Linguistica Petropolitana. Труды института лингвистических исследований 2023 Т. 19 № 1 С. 138–192
The paper studies tendencies in the use of full single (i.e. without their arguments) redundant participles in the attributive position in the Russian written discourse. Relying upon the data of the Russian National Corpus and the Corpus of Russian Student Texts, as well as a number of the examples collected from various written sources, the ...
Added: December 8, 2022
Количественная оценка перекрестных сетевых эффектов для нетрансакционных платформ
Рожкина В. С., Golovanova S., Korneeva D., Вестник Московского университета. Серия 6: Экономика 2022 № 4 С. 17–38
The analysis of cross-network effects is important for considering the impossibility of their direct observation and the influence of cross-network effects on the values of all tests in competition policy, pricing practice and merger valuation. The article summarizes the experience of quantifying cross-network effects for non-transactional platforms. This paper systematizes methods for assessing cross-network effects ...
Added: September 15, 2022
Дискурсы в агитационных материалах «красных» и «белых» периодических изданий пермской губернии в годы Гражданской войны
Ехлакова А. Р., Ismakaeva I., В кн.: Пятая зимняя школа по гуманитарной информатике.: Калининград: Балтийский федеральный университет им. Иммануила Канта, 2021. С. 20–26.
The paper analyses the most frequent word forms in the propaganda materials of "Red" and "White" periodicals of Perm Governorate during the Civil War. Applying the discourse theory of E. Laclau and C. Mouffe made it possible to treat the "Red" and "White" periodicals as a field of struggle between the corresponding discourses over the formation of meanings and the understanding of the world. Using the tools of the AntConc program (N-gram, Collocates), the most frequently ...
Added: February 17, 2022
Delta Берроуза для древнегреческих авторов: опыт применения
Alieva O., Schole. Философское антиковедение и классическая традиция 2022 Т. 16 № 2 С. 693–705
This paper tests the effectiveness of Burrows’s Delta method on a corpus of selected prose writings in ancient Greek. When tested on a corpus of fourteen and eight authors, the method yields good results with relatively small samples (1000, 3000, and 5000 words) and different word frequency vectors (100, 200, 500 words), but its performance ...
Added: February 9, 2022
Когнитивная обработка биномиалов русского языка тюркско-русскими билингвами
Буб А. С., Artemenko E., Язык и культура 2019 № 48 С. 32–45
The article concerns one of the aspects of bilingualism, namely the study of cognitive processing of lexical units in bilinguals. As a review of the scientific literature shows, the bilingual mental lexicon differs from the monolingual mental lexicon. In the latter, words do not exist separately, but together with collocational links, i.e. in conjunction with ...
Added: October 29, 2021
О СОВРЕМЕННОСТИ «СОВРЕМЕННОГО СОСТОЯНИЯ ИЗУЧЕНИЯ ПОЛИТИКИ» КРУГЛЫЙ СТОЛ
Gaman-Golutvina O. V., Панов П. В., Filippov A. F., Полития: Анализ. Хроника. Прогноз 2021 № 1(100) С. 193–209
Added: April 12, 2021
Методы компаративных исследований
Gaman-Golutvina O. V., В кн.: Политическая компаративистика.: М.: Аспект Пресс, 2020. С. 85–104.
Added: April 12, 2021
Соотношение сил между великими державами в «Группе 20»: анализ при помощи метода многомерного шкалирования
Артюшкин В. Ф., Kazantsev A., Сергеев В. М., Полис. Политические исследования 2021 Т. 2 С. 125–138
This article applies a method of multidimensional scaling (visualization of multi-dimensional structures) to studying different dimensions of power competition between the great states. On the basis of analysis of the Neo-Realist, Neo-Liberal, and World-systems theory literature on global hegemony, 8 criteria of global leadership were defined: GDP per capita (PPP), military expenditure (% of ...
Added: February 8, 2021
Collocations and near-native competence: Lexical strategies of heritage speakers of Russian
Kopotev M., Polinsky M., Kisselev O., International Journal of Bilingualism 2020 P. 1–28
This paper presents an exploratory study on the use of frequency-based probabilistic word combinations in Heritage Russian. The data used in the study are drawn from three small corpora of narratives, representing the language of Russian heritage speakers from three different dominant-language backgrounds, namely German, Finnish, and American English. The elicited narratives are based on ...
Added: September 30, 2020
О чувстве уважения в русском языковом сознании: уважения достойно…
Botchkarev A., Slavica Slovaca 2020 Т. 55 № 1 С. 46–52
The article explores the ways of displaying uvazheniye ‘respect’ in the Russian language consciousness. The National Russian Corpus is more appropriate for this purpose, because a conceptual configuration of an analyzed concept is not present in a “finished” form in any single utterance, but may be reconstructed on the totality of all possible utterances. According ...
Added: June 24, 2020
Журналы земских собраний: организация информации на основе информационных систем (на примере Пермской губернии)
Kornienko S., Ехлакова А. Р., В кн.: Сборники Президентской библиотеки. Вып. 8: Цифровые проекты в современной информационной среде: наука и практика.: СПб.: Президентская библиотека имени Б.Н. Ельцина, 2018. С. 70–83.
The paper analyses the potential of information systems and quantitative methods for studying the journals of zemstvo assemblies as a historical source. It characterises the assembly journals as one of the main record-keeping sources of zemstvo institutions and describes the information systems created at the Centre for Digital Humanities of Perm State National Research University. Using these information systems, the results of organising the information in the zemstvo assembly journals are analysed, and quantitative ...
Added: October 20, 2019
LESS IS DOWN: корпусный анализ структуры метафорического значения глаголов падать и упасть
Kultepina O., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2020 Т. 1 № XVI С. 344–367
The paper addresses the possibilities that a corpus-based approach offers for analysing metaphorical transfer, using the aspectual pair upast’ / padat’ (‘to fall’). The author reviews the structure of the metaphorical meaning of predicates that realise Lakoff’s metaphor ‘LESS IS DOWN’ and also analyses how collocations correlate with valency structure. ...
Added: October 7, 2019
Метр отрезков длиннее строки в башкирском силлабическом стихе
Orekhov B., Известия РАН. Серия литературы и языка 2019 Т. 78 № 2 С. 41–50
The paper considers a specific element of syllabic versification on the Bashkir text data. We examine the ordered alternations of lines of different lengths. Such verse forms exist in Turkic verse along with the usual isosyllabic poems. The status of such forms is ambiguous; they can be viewed both as a stanza and as a ...
Added: September 18, 2019
Специфические слова и выражения русских классиков XIX века: опыт контрастивного корпусного исследования
Orekhov B., Ученые записки Петрозаводского государственного университета. Серия: Общественные и гуманитарные науки 2019 № 5 С. 70–75
The paper presents the results of a quantitative study that identifies characteristic and specific low-frequency words for the prose of Russian classic writers of the XIX century. TF-IDF measure and a large collection of the XIX century texts by Turgenev, Goncharov, Leskov and Dostoevsky were used to identify words and phrases that are rarely found ...
Added: September 18, 2019