
Exploration of register-dependent lexical semantics using word embeddings

P. 26–34.
Kutuzov A. B., Kuzmenko E., Marakasova A.

We present an approach to detecting differences in lexical semantics across English language registers, using word embedding models from the distributional semantics paradigm. Models trained on register-specific subcorpora of the British National Corpus (BNC) are used to compare lists of nearest associates for particular words and to draw conclusions about their semantic shifts depending on the register in which they are used. The models are evaluated on the task of register classification using the deep inverse regression approach.
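A minimal sketch of the nearest-associate comparison described above, assuming each register model is simply a mapping from words to vectors. The toy vectors, the example word `cell`, the helper names, and the use of Jaccard overlap between top-k neighbour lists as a measure of semantic shift are all illustrative assumptions, not the authors' code; real models would be word2vec models trained on BNC subcorpora.

```python
import math

# Hypothetical toy register models: word -> vector. In practice each would be
# a word2vec model trained on one register-specific subcorpus.
Model = dict[str, list[float]]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(model: Model, word: str, k: int) -> list[str]:
    """Return the k words closest to `word` by cosine similarity, excluding itself."""
    target = model[word]
    scored = [(cosine(target, vec), other) for other, vec in model.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first
    return [other for _, other in scored[:k]]

def neighbor_overlap(m1: Model, m2: Model, word: str, k: int) -> float:
    """Jaccard overlap of the word's top-k neighbours in two models (1.0 = same set)."""
    a = set(nearest_neighbors(m1, word, k))
    b = set(nearest_neighbors(m2, word, k))
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical 2-D vectors: "cell" sits near biology words in the academic
# model and near telephony words in the spoken model.
academic: Model = {
    "cell": [1.0, 0.1], "tissue": [0.9, 0.2], "protein": [0.95, 0.05],
    "phone": [0.1, 1.0], "mobile": [0.2, 0.9],
}
spoken: Model = {
    "cell": [0.1, 1.0], "tissue": [0.9, 0.2], "protein": [0.95, 0.05],
    "phone": [0.15, 0.95], "mobile": [0.2, 0.9],
}

print(nearest_neighbors(academic, "cell", 2))  # ['protein', 'tissue']
print(nearest_neighbors(spoken, "cell", 2))    # ['phone', 'mobile']
print(neighbor_overlap(academic, spoken, "cell", 2))  # 0.0
```

A low overlap flags a word whose associates differ by register; with real models one would also need to choose k and a vocabulary shared by both models.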

Additionally, we present a demo web service that features most of the described models and allows users to explore word meanings in different English registers and to detect the register affiliation of arbitrary texts. The code for the service can easily be adapted to any set of underlying models.
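Register detection for an arbitrary text can be sketched as Bayesian inversion: score the text under each register-specific skip-gram model and convert the likelihoods into register probabilities with Bayes' rule. This is a simplified illustration of the general inversion idea, not the authors' implementation; the toy vectors, the shared input/output embeddings, the full softmax, the uniform prior, the window size of 1, and averaging sentence posteriors into a document score are all assumptions.

```python
import math

# Hypothetical toy register models: word -> vector. As a simplification, one
# vector per word serves as both the "input" and "output" embedding; real
# word2vec skip-gram models keep two separate matrices.
Model = dict[str, list[float]]

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))

def skipgram_log_prob(model: Model, center: str, context: str) -> float:
    """log P(context | center) under a full-softmax skip-gram model."""
    scores = {w: dot(model[center], vec) for w, vec in model.items()}
    return scores[context] - log_sum_exp(list(scores.values()))

def sentence_log_likelihood(model: Model, tokens: list[str], window: int = 1) -> float:
    """Sum of log P(context | center) over all token pairs within `window`."""
    total = 0.0
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                total += skipgram_log_prob(model, center, tokens[j])
    return total

def sentence_posterior(models: dict[str, Model], tokens: list[str]) -> dict[str, float]:
    """Bayes' rule with a uniform prior over registers: P(register | sentence)."""
    loglik = {name: sentence_log_likelihood(m, tokens) for name, m in models.items()}
    norm = log_sum_exp(list(loglik.values()))
    return {name: math.exp(ll - norm) for name, ll in loglik.items()}

def classify_document(models: dict[str, Model], sentences: list[list[str]]) -> dict[str, float]:
    """Average sentence-level posteriors into a document-level distribution."""
    totals = {name: 0.0 for name in models}
    for tokens in sentences:
        for name, p in sentence_posterior(models, tokens).items():
            totals[name] += p / len(sentences)
    return totals

models = {
    "academic": {"cell": [2.0, 0.0], "protein": [2.0, 0.5], "phone": [0.0, 2.0]},
    "spoken": {"cell": [0.0, 2.0], "protein": [2.0, 0.0], "phone": [0.5, 2.0]},
}

doc = [["cell", "protein"], ["cell", "protein"], ["cell", "phone"]]
probs = classify_document(models, doc)
print(max(probs, key=probs.get))  # academic
```

Working at the sentence level and averaging keeps each softmax small; with real models, tokens missing from a register's vocabulary would have to be skipped or handled separately.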

Language: English
Keywords: natural language processing, digital humanities, communicative grammar, text structure, text typology, fiction – non-fiction, register, genre studies, word2vec, word embeddings

In book

Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)
Osaka: [s.n.], 2016.
Similar publications
Digital Support for Humanities Degree Programmes
Kornienko S., Ismakaeva I., Senina A., Domestic and Foreign Pedagogy 2026 Vol. 1 No. 2(113) P. 91–102
In the digital age, digital proficiency is becoming a key literacy of the 21st century, particularly relevant for students in humanities education programs. This article proposes a comprehensive model for integrating digital technologies into humanities education at a university. The methodology relies on case studies and design-based research elements, including analysis of regulatory documents, educational ...
Added: April 30, 2026
Discriminative Lemmatization of Abbreviations in the Era of LLMs
Glazkova A. V., Smal I. V., Lyashevskaya O. et al., Doklady of the Russian Academy of Sciences. Mathematics, Informatics, Control Processes (formerly Doklady of the Academy of Sciences. Mathematics) 2025 Vol. 527 P. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
RuCLEVR: A Russian Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Biryukova K., Chelnokova D., Erkenova J. et al., Communications in Computer and Information Science 2024 Vol. 2364 CCIS P. 109–121
Added: February 25, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Open Computational Tools for Digitising and Analysing Russian-Language Text in Digital Humanities
Orekhov B., Digital Humanities Research 2025 No. 2 P. 71–83
The article reviews lesser-known modules that can be used for Digital Humanities tasks involving text analysis and digitisation. These include modules that facilitate the digitisation of texts printed in pre-reform orthography (an OCR model and a converter to modern orthography), an accentuator that marks word stress, a direct speech detector, code for assessing the formulaicity of a folklore text, a converter for ...
Added: December 19, 2025
Digital Humanities and Literary Realism
Skorinkin D., Orekhov B., in: The Oxford Handbook of Global Realisms. Oxford: Oxford University Press, 2025. Ch. 10, P. 177–204.
This chapter investigates literary prose of the realist era in Russia using digital humanities methods. It focuses on how computational analysis can enhance an understanding of descriptions of literary characters, geographical locations, and lexical composition in literary texts. Using a corpus of more than five hundred texts (forty-six million word occurrences), it eschews the focus ...
Added: September 14, 2025
Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions
Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.
Modern LLMs such as BERT, ChatGPT, and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and document analysis and summarization. In this paper, we show that these models come close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...
Added: September 4, 2025
Automatic Summarization of Parent Chats in WhatsApp
Dmitrieva K., Zholus M. R., Vestnik of Novosibirsk State University. Series: Linguistics and Intercultural Communication 2025 Vol. 23 No. 1 P. 80–92
Automatic text summarization is one of the main tasks of natural language processing (NLP), which consists in creating a shorter version of the source text. In today’s world the amount of information consumed by people is constantly increasing, therefore more and more emphasis is being placed on the task of summarization. There are two main approaches ...
Added: July 8, 2025
Methods and Tools for Extracting Terms from Texts for Terminological Tasks
Bolshakova E. I., Semak V. V., Software & Systems 2025 Vol. 38 No. 1 P. 5–16
The current state in the field of automatic term extraction from specialized natural language texts, including scientific and technical documents, is considered. Practical applications of methods and tools for extracting terms from texts include creation of terminological dictionaries, thesauri, and glossaries of problem oriented domains, as well as extraction of keywords and construction of subject ...
Added: July 2, 2025
High-Level Semantic Interpretation of the Structure of Static Models for the Russian Language
Serikov O., Ganeeva V., Aksenova A. A. et al., Vestnik of Novosibirsk State University. Series: Linguistics and Intercultural Communication 2023 Vol. 21 No. 1 P. 67–82
Since its inception, the Word2vec vector space has become a universal tool both for scientific and practical activities. Over time, it became clear that there is a lack of new methods for interpreting the location of words in vector spaces. The existing methods included consideration of analogies or clustering of a vector space. In recent ...
Added: April 28, 2025
Automation of Forensic Authorship Attribution: Problems and Prospects
Romanova T. V., Khomenko A., Legal Issues in the Digital Age 2022 Vol. 3 No. 2 P. 90–115
The article deals with validation of an integrative attribution algorithm based on the analysis of the author’s idiostyle using methods of interpretative linguistics with objectification of the available data with the help of mathematical statistics. The algorithm addresses the identification problem of the attribution. The choice of parameters describing the individual style of ...
Added: March 12, 2025
Foundations of Digital Philology: Methods and Principles of Computer-Based Text Analysis
Kazartsev (Evgenii Kazartcev) E., Pronin D. D., St. Petersburg: Politekhnika Publishing House, 2024.
This textbook is a unique publication containing material for teaching methods of computer-based text analysis, primarily of fiction. It uses databases and corpora hosted on the СОЦИОЛИТ (SOCIOLIT) digital platform, which is designed for studying the interaction between literature and society. The methods presented push beyond the boundaries of traditional philology, enabling quantitative and qualitative analysis of a text's content and vocabulary within the paradigm of modern ...
Added: February 19, 2025
Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?
Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84
Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...
Added: January 7, 2025
Is a Digital History of Philosophy Possible?
Alieva O., History of Philosophy Yearbook 2024 Vol. 39 P. 266–304
The article raises the question of the possibility of “digitalization” in the field of the historical and philosophical research. We first give a brief overview of the main genres of philosophical historiography and then examine the compatibility of these genres with some instruments of natural language processing. It is argued that methods of distributional semantics ...
Added: December 28, 2024
Digital Humanities Projects: Learning DH by Doing
Gomeniuk N. V., Ismakaeva I., in: Keep Up with Digital Humanities Research. Krasnoyarsk: Siberian Federal University, 2024. P. 98–108.
The emergence and development of a field such as Digital Humanities presents universities with new challenges in training specialists who not only have deep knowledge of their subject area but also command modern digital tools and methods. An "infrastructural" requirement for training such specialists is developing their project-based thinking and project work skills. We describe the experience of implementing ...
Added: December 3, 2024
Python for Humanities Scholars, or Why Programming Cannot Be Learned on the First Attempt
Senina A., in: Keep Up with Digital Humanities Research. Krasnoyarsk: Siberian Federal University, 2024. P. 164–181.
The monograph is the outcome of the All-Russian seminar "Digital Humanities at Universities: Programmes, Courses, Competencies". It collects pedagogical experiences that today form the didactic foundation of the digital humanities. The materials cover a wide range of topics: the self-identification of digital humanists in the modern university, the architecture of master's programmes and minors, curricula for specialised and online courses, digital competencies, and project practices. It will be of interest to a wide range of humanities instructors: historians, philologists, linguists, philosophers, sociologists, ...
Added: December 3, 2024
How the Digital History of Ideas Is Made
Alieva O., in: Keep Up with Digital Humanities Research. Krasnoyarsk: Siberian Federal University, 2024. P. 51–59.
The digital history of ideas is a relatively young field within Digital Humanities that combines corpus linguistics tools with the methodology of the Cambridge School and Begriffsgeschichte. Both the theoretical framework and the practical implementations of this approach require reflection, which should demonstrate, first, the expediency and, second, the feasibility of adopting it in the Russian educational and academic context. Leaving theoretical questions for another ...
Added: December 3, 2024