• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Book chapter
  • Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 22, 2026
HSE Graduates AI Project Wins at TECH & AI Awards
Daria Davydova, graduate of the HSE Graduate School of Business and Head of the AI Implementation Unit at the Artificial Intelligence Department of Alfa-Bank, received a prize at the TECH & AI Awards. She was awarded for the best AI solution for optimising business processes. The winners were determined as part of the VII Russian Summit and Awards on Digital Transformation (CDO/CDTO Summit & Awards).
May 20, 2026
HSE University Opens First Representative Office of Satellite Laboratory in Brazil
HSE University-St Petersburg opened a representative office of the Satellite Laboratory on Social Entrepreneurship at the University of Campinas in Brazil. The platform is going to unite research and educational projects in the spheres of sustainable development, communications and social innovations.
May 18, 2026
The 'Second Shift' Is Not Why Women Avoid News
Women are more likely than men to avoid political and economic news, but the reasons for this behaviour are linked less to structural inequality or family-related stress than to personal attitudes and the emotional perception of news content. This conclusion was reached by HSE researchers after analysing data from a large-scale survey of more than 10,000 residents across 61 regions of Russia. The study findings have been published in Woman in Russian Society.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Improving Distributional Semantic Models Using Anaphora Resolution during Linguistic Preprocessing

P. 288–300.
Kutuzov A. B., Козлова О. С.

In natural language processing, distributional semantic models are known as an efficient data driven approach to word and text representation, which allows computing meaning directly from large text corpora into word embeddings in a vector space. This paper addresses the role of linguistic preprocessing in enhancing performance of distributional models, and particularly studies pronominal anaphora resolution as a way to exploit more co-occurrence data without directly increasing the size of the training corpus.
We replace three different types of anaphoric pronouns with their antecedents in the training corpus and evaluate the extent to which this affects the performance of the resulting models in lexical similarity tasks. CBOW and SkipGram distributed models trained on Russian National Corpus are in the focus of our research, although the results are potentially applicable to other distributional semantic frameworks and languages as well. The trained models are evaluated against RUSSE '15 and SimLex-999 gold standard data sets. As a result, we find that models trained on corpora with pronominal anaphora resolved perform significantly better than their counterparts trained on baseline corpora.

Language: English
Full text
Text on another site
Keywords: natural language processingавтоматическая обработка естественного языкаискусственные нейронные сетианафораanaphora resolutiondistributional semanticsсемантическая близостьsemantic similarity of wordsдистрибутивная семантикаword2vecneural embeddingsvector space modelword2vecword embeddingsразрешение анафорывекторные репрезентации лексики

In book

Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва,1–4 июля 2016 г.)
Вып. 15. , М.: Изд-во РГГУ, 2016.
Similar publications
Аналитический обзор методов автоматического распознавания вовлеченности пользователя в виртуальную коммуникацию
Dvoynikova A., Кагиров И., Карпов А. А., Информационно-управляющие системы 2022 № 5 (120) С. 12–22
Введение: решение автоматическими средствами задачи распознавания и оценивания степени вовлеченности пользователя в процесс человеко-машинного взаимодействия или телекоммуникации является актуальным в области компьютерного распознавания состояний человека. Это необходимо для проектирования приложений дистанционного обучения, бизнеса и развлечений. Цель: провести сравнительный анализ существующего информационного обеспечения и методов в области автоматического распознавания и оценивания вовлеченности пользователя в процесс человеко-машинного ...
Added: April 24, 2026
Дискриминативная лемматизация сокращений в эпоху LLM
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
RuCLEVR: A Russian Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Biryukova K., Chelnokova D., Erkenova J. et al., Communications in Computer and Information Science 2024 Vol. 2364 CCIS P. 109 – 121
Added: February 25, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions
Chepikov I., Karpov I., , in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED.: Springer, 2025. P. 352 – 358.
Modern LLM models such as BERT, ChatGPT, DeepSeek have shown great potential in solving various tasks, including text classification, text generation, analysis and summary of documents. In this paper, we show that these models close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...
Added: September 4, 2025
Автоматическая саммаризация родительских чатов в WhatsApp
Dmitrieva K., Жолус М. Р., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2025 Т. 23 № 1 С. 80–92
Automatic text summarization is one of the main tasks of natural language processing (NLP), which consists in creating a shorter version of the source text. In today’s world the amount of information consumed by people is constantly increasing, therefore more and more emphasis is being placed on the task of summarization. There are two main approaches ...
Added: July 8, 2025
Методы и средства извлечения терминов из текстов для терминологических задач
Bolshakova E. I., Семак В. В., Программные продукты и системы 2025 Т. 38 № 1 С. 5–16
The current state in the field of automatic term extraction from specialized natural language texts, including scientific and technical documents, is considered. Practical applications of methods and tools for extracting terms from texts include creation of terminological dictionaries, thesauri, and glossaries of problem oriented domains, as well as extraction of keywords and construction of subject ...
Added: July 2, 2025
Экономические и социальные аспекты атомной энергетики в условиях развития технологий искусственного интеллекта
Podchufarov A., Galkina A. N., Ванина С. С. et al., Экономика и управление: проблемы, решения 2025 Т. 5 № 4 С. 61–74
Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...
Added: June 5, 2025
Eye-tracking study of unambiguous anaphora resolution in Russian
Tuzhik O., Khanova A., Kudryavtsev S. et al., , in: 12th Novi Sad workshop on Psycholinguistic, neurolinguistic and clinical linguistic research.: Novi Sad: Faculty of Philosophy, University of Novi Sad, 2025. P. 26–28.
Added: April 28, 2025
Высокоуровневая семантическая интерпретация структуры статических моделей для русского языка
Serikov O., Ganeeva V., Аксенова А. А. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2023 Т. 21 № 1 С. 67–82
Since its inception, the Word2vec vector space has become a universal tool both for scientific and practical activities. Over time, it became clear that there is a lack of new methods for interpreting the location of words in vector spaces. The existing methods included consideration of analogies or clustering of a vector space. In recent ...
Added: April 28, 2025
Разрешение синтаксической местоименной анафоры в системе ЭТАП-3
Иншакова Е. С., В кн.: Сборник статей конференции "Информационные технологии и системы" (ИТиС'16).: М.: ИППИ РАН, 2016.
В данной статье идет речь о правилах установления антецедентов для двух классов анафорических выражений: возвратных местоимений и местоимений 3 лица. ...
Added: April 10, 2025
An anaphora resolution system for Russian based on ETAP-4 linguistic processor
Inshakova E.S., , in: Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue” (2019)Issue 18.: M.: Russian State University for the Humanitie, 2019.
Added: April 10, 2025
Automation of Forensic Authorship Attribution: Problems and Prospects
Romanova T. V., Khomenko A., Legal Issues in the Digital Age 2022 Vol. 3 No. 2 P. 90–115
The article deals with validation of an integrative attribution algorithm based on the analysis of the author’s idiostyle using methods of interpretative linguistics with ob jectification of the available data with the help of mathematical statistics. The algo rithm addresses the identification problem of the attribution. The choice of parameters describing the individual style of ...
Added: March 12, 2025
Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?
Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84
Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...
Added: January 7, 2025
Fear and Loathing in Russian Literature: A Case of Emotion Annotation of Short Stories of the 20th Century
Anna Moskvina, Margarita Kirina, , in: 27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, volume 2534Vol. 2534.: Springer, 2025. P. 113–129.
The paper presents an investigation of the emotional aspect of the Russian short story of the 20th century. Our study is two-fold: firstly, we delve into emotional representation at the lexical level, building upon previous work on utilizing vector models to quantify emotional content. In this study, we introduce an annotated corpus where words are ...
Added: November 29, 2024
Automatic detection of grammatical aspect of Russian verbs based on their morphological properties
Petrunina U., Filip H., , in: Proceedings of the Fourth International Workshop on Resources and Tools for Derivational Morphology.: Dubrovnik: Croatian Language Technology Society, 2023.
Added: October 2, 2024
Cross-country analysis of science, technology and innovation policies: non-covid-19 related and Covid-19 specific STI policies in OECD countries
Russo M., Pavone P., Meissner D. et al., Quality and Quantity 2025 Vol. 59 No. Suppl 1 P. S343–S367
In OECD countries, Science, Technology and Innovation (STI) policies were seen as key aspects of coping with the Covid-19 pandemic. Now that the pandemic is over, identifying which policy mix portfolios characterised countries in terms of their non-Covid-19 related and Covid-19 specific STI policies fills a knowledge gap on changes in STI policies induced by ...
Added: September 27, 2024
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit