Redefining part-of-speech classes with distributional semantic models

Kutuzov A. B.; Velldal E.; Øvrelid L.

P. 115–125.

This paper studies how word embeddings trained on the British National Corpus interact with part-of-speech boundaries. Our work targets the Universal PoS tag set, which is in active use for annotating a range of languages. We train classifiers to predict the PoS tags of words from their embeddings. The results show that the PoS information contained in the distributional vectors lets us discover groups of words whose distributional patterns differ from those of other words with the same part of speech. These data often reveal hidden inconsistencies in the annotation process or guidelines. At the same time, they support the notion of ‘soft’ or ‘graded’ part-of-speech affiliations. Finally, we show that information about PoS is distributed among dozens of vector components rather than limited to one or two features.
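The idea of predicting PoS tags from embeddings and then inspecting the words where the prediction disagrees with the annotation can be sketched as follows. This is a minimal illustration only: the vectors are hand-made three-dimensional toys rather than embeddings trained on the British National Corpus, the nearest-centroid rule is a simple stand-in and not necessarily the kind of classifier used in the paper, and all names (`TOY_VECTORS`, `find_mismatches`, and so on) are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Toy, hand-made "embeddings" with annotated tags (illustrative values only,
# not real word2vec output).
TOY_VECTORS = {
    "cat": ([0.9, 0.1, 0.0], "NOUN"),
    "dog": ([0.8, 0.2, 0.1], "NOUN"),
    "house": ([0.9, 0.0, 0.2], "NOUN"),
    "run": ([0.1, 0.9, 0.1], "VERB"),
    "eat": ([0.2, 0.8, 0.0], "VERB"),
    "sleep": ([0.0, 0.9, 0.2], "VERB"),
    # Annotated as NOUN, but deliberately given a verb-like vector.
    "running": ([0.2, 0.8, 0.1], "NOUN"),
}


def centroids(data):
    """Average the vectors of all words that share an annotated tag."""
    sums = {}
    counts = defaultdict(int)
    for vec, tag in data.values():
        if tag not in sums:
            sums[tag] = [0.0] * len(vec)
        sums[tag] = [s + v for s, v in zip(sums[tag], vec)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def predict(vec, cents):
    """Return the tag whose centroid is most similar to the vector."""
    return max(cents, key=lambda tag: cosine(vec, cents[tag]))


def find_mismatches(data):
    """List words whose predicted tag differs from their annotated tag."""
    cents = centroids(data)
    return sorted(w for w, (vec, tag) in data.items() if predict(vec, cents) != tag)


if __name__ == "__main__":
    print(find_mismatches(TOY_VECTORS))
```

In this toy setup the only word flagged is the one whose vector was built to look like a verb despite its noun annotation, which mirrors, in a very reduced form, how misclassified words can point to borderline cases or annotation inconsistencies.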

Language: English

Keywords: machine learning, natural language processing, distributional semantics, part of speech tagging, parts of speech, word2vec, word embeddings

In book

Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

Berlin: Association for Computational Linguistics, 2016.

An Attempt at Generating Ratings of Word Emotional Valence and Arousal Using a Character-Level CNN

Lyusin D., Valueva E. A., Sysoeva T., in: Psychology of Cognition: Proceedings of the All-Russian Scientific Conference, YarGU, Institute of Psychology of the Russian Academy of Sciences, December 5–6, 2025.: Institute of Psychology of the Russian Academy of Sciences, 2026. P. 310–314.

The emotional coloring of words is widely used in academic and applied research, from text analysis to the study of cognitive processes. Building large datasets of word ratings on a range of emotional parameters is a pressing task. Modern machine learning methods based on the semantic similarity of words, as extracted from text corpora, show high correlations with human ratings, although substantial discrepancies are sometimes observed. ...

Added: April 10, 2026

Neural Network Tools in the University Instructor's Arsenal

Fedorov A., Vakku G. V., Lebedeva S. E., Galactica Media: Journal of Media Studies 2026 Vol. 8 No. 2 P. 163–182

With the increasing volume of data, university faculty may spend years processing and organizing information. Personalized assistance, content recommendations, data collection for literature reviews, and bibliographic citation formatting reinforce the role of artificial intelligence and neural network tools for scholarly communication. This paper discusses practical examples of using tools such as Elicit, SciSpace, Consensus, Undermind, ...

Added: April 7, 2026

Applying ML to Improve Signal Noise Immunity

Efremov A., Portnoy S., Voloshin A. D., Pervaya Milya 2025 No. 8 P. 20–28

A comprehensive review is given of machine learning (ML) methods used to improve signal robustness to interference in communication channels. The rapid development of successive generations of wireless communication and active work on the 6G concept place high demands on the latency, speed, and reliability of data transmission. Traditional approaches to interference protection, based on strict analytical models, often fail to cope with the chaotic nature of dense heterogeneous ...

Added: April 4, 2026

A Tool for Mass Generation of Random Step Environment Models with User-Defined Landscape Features

Gabdrahmanov R., Tsoy T., Martinez-Garcia E. et al., in: Proceedings of the 21st International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024).: [s.n.], 2025. P. 511–518.

Computer simulations are growing in popularity in robotics research due to their near-zero cost of error and lower labor intensity. One of the necessary components of a simulation, in addition to a robot model, is a model of the world in which the robot operates. While it is always possible to construct a world model manually, ...

Added: March 17, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modelling and User-Adapted Interaction 2026 Vol. 36 Article 2

Graph-based recommender systems have emerged as a powerful paradigm for personalized recommendations. However, their reliance on full model retraining to incorporate new users or new interactions creates scalability barriers. The task becomes infeasible in real-life recommender systems due to excessive time and resource costs involved. To address this limitation, we propose a fast and efficient ...

Added: March 15, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., User Modeling and User-Adapted Interaction 2025 P. 1–24

Added: March 14, 2026

Real-Bogus Classification for ZTF Data Releases: Two Approaches

Semenikhin T., Kornilov M., Pruzhinskaya M. et al., in: 26th International Conference, DAMDID/RCDL 2024, Nizhny Novgorod, Russia, October 23–25, 2024, Revised Selected Papers. Data Analytics and Management in Data Intensive Domains. (CCIS, volume 2641).: Springer, 2026. P. 211–219.

We considered two fundamentally different approaches to real-bogus classification within the Zwicky Transient Facility survey data. The first approach is based on neural networks that take sequences of object images as input. The second approach uses features extracted from light curves and classical machine learning methods. Several models for both approaches were tested. Quality metrics ...

Added: March 11, 2026

Discriminative Lemmatization of Abbreviations in the LLM Era

Glazkova A. V., Smal I. V., Lyashevskaya O. et al., Doklady of the Russian Academy of Sciences. Mathematics, Informatics, Control Processes (formerly Doklady of the Academy of Sciences. Mathematics) 2025 Vol. 527 P. 146–155

This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...

Added: March 10, 2026

Clustering Smart Home Electricity Consumption Patterns Using Ensemble Machine Learning Methods

Maltseva S. V., Berikov V. B., Kladov D. E. et al., in: Informatics and Applied Mathematics: Proceedings of the X International Scientific and Practical Conference (October 8–11, 2025). Vol. 1: Collected Proceedings, Part 1.: Almaty: Institute of Information and Computational Technologies of the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan, 2025. P. 227–232.

This paper examines the problem of clustering consumption patterns for a private household. An ensemble algorithm based on the Wasserstein metric was developed and applied to cluster daily load profiles. The proposed approach allows for identifying typical energy consumption scenarios and interpreting consumer behavior. Results from computational experiments using real data are presented. ...

Added: March 3, 2026

The Grammatical Landscape of Literary Prose: Dynamics of Part-of-Speech Distributions in the 20th-Century Russian Short Story

Kirina M., in: Russian Grammar: Polyparadigmality as a Methodological Principle of Modern Scientific Research: Proceedings of the IX International Scientific Symposium.: IGU Publishing House, 2025. P. 270–275.

The article presents the results of a pilot study that describes the distribution of parts of speech, synchronically and diachronically, in Russian short-form prose. It examines changes in the morphological composition of literary texts (at the level of grammatical classes) over the 20th century across 9 historical and cultural periods. The material is a sample of 943 short stories totaling more than 3 million word tokens. ...

Added: February 28, 2026

RuCLEVR: A Russian Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Biryukova K., Chelnokova D., Erkenova J. et al., Communications in Computer and Information Science 2024 Vol. 2364 CCIS P. 109–121

Added: February 25, 2026

Determining Ovarian Follicular Reserve from Ultrasound Data Using Machine Learning Methods

Moshkin A., Laputin F. A., Sidorov I. V., DIGITAL DIAGNOSTICS 2024 Vol. 5 No. S1 P. 40–42

BACKGROUND: Ovarian reserve reflects a woman's ability to successfully realize reproductive function. The assessment of ovarian reserve is an urgent task for clinical practice [1] and is important in scientific research. The use of computerized diagnostic image processing methods can accelerate and facilitate the performance of routine tasks in clinical practice. Their use in retrospective ...

Added: February 21, 2026

Predicting the Risk of Cerebral Stroke

Kuznetsov V. A., Yasnitsky L., in: Artificial Intelligence in Solving Current Social and Economic Problems of the 21st Century: Collected Papers of the Tenth All-Russian Scientific and Practical Conference with International Participation (Perm, PGNIU, October 9–10, 2025).: Perm State National Research University, 2025. P. 240–247.

The paper presents the development and comparative analysis of machine learning methods for the binary classification of patients at risk of cerebral stroke. The research process included a thorough exploratory data analysis, followed by the implementation and evaluation of three models: a decision tree, a random forest, and a neural network. The aim of the work is to identify the most effective algorithm for building a clinical decision support system capable of timely ...

Added: February 15, 2026

The Problem of Rationalization and Overreliance on XAI Tools: An Analysis of Explanations by Large Language Models

Suvorova A., in: XXII National Conference on Artificial Intelligence with International Participation (KII-2025). Vol. 1.: St. Petersburg: St. Petersburg Federal Research Center of the Russian Academy of Sciences, 2025. P. 310–318.

The paper examines the problem of users' overreliance on the results of interpreting machine learning models, and ways of addressing it with explanations generated by large language models (LLMs). The experiment showed that most models, like the human users in the original experiment, ignored anomalies or offered plausible but false explanations, rationalizing the conclusions. This points to the risks ...

Added: February 15, 2026

How to Predict Bank Defaults: The Evolution of Methods, Models, and Risk Factors

Shchepeleva M., Stolbov M. I., Economics and Mathematical Methods 2026 Vol. 62 No. 1 P. 63–77

Predicting bank defaults is an important task for the entire economy. Early identification of troubled banks helps to prevent impending bank failures or minimize the losses associated with them. The paper discusses the state of the art of instrumental methods and data used for this purpose. The theoretical background, the evolution of methodological approaches used ...

Added: February 13, 2026

30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

A Clustering Model for Stocks that Considers Hidden Dynamics and Price Trajectory

Morychev G., Sizykh D., Sizykh N., IEEE Access 2025 Vol. 13 P. 213194–213210

One of the main tools for analyzing large volumes of financial data is the use of clustering methods and models, which allow the identification of various patterns. This study examines the problem of clustering time series that reflect the behavior of prices, yields, modes, trends, and a number of related stock indicators. The relevance and ...

Added: February 3, 2026

Efficient Incorporation of New Interactions in Graph Recommenders via Folding-In

Yusupov V., Sukhorukov N., Frolov E., in: User Modeling and User-Adapted Interaction.: Springer, 2026. Ch. 36.2 P. 1–24.

Added: January 29, 2026