The Effect of Unobserved Word-Context Co-occurrences on a Vector-Mixture Approach for Compositional Distributional Semantics


Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB '18). 2018. P. 153–161.

Bakarov A.

Swivel (Submatrix-WIse Vector Embedding Learner) is a distributional semantic model based on pointwise mutual information (PMI) values; it can account for word-context co-occurrences in the PMI matrix that were not observed in the training corpus. The model outperforms mainstream word embedding training algorithms such as Continuous Bag-of-Words, GloVe, and Skip-Gram on word similarity and word analogy tasks. However, the adequacy of these intrinsic tasks is open to question, and it is unclear whether the ability to handle unobserved word-context co-occurrences also helps on downstream tasks. In this work we compare Word2Vec and Swivel on two downstream tasks based on natural language sentence matching: paraphrase detection and textual entailment. We find that Swivel outperforms Word2Vec in both cases, but the difference is minuscule. We conclude that the ability to learn embeddings for rarely co-occurring words is not particularly crucial for downstream tasks.
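
To make the notion of an unobserved word-context co-occurrence concrete, the following minimal Python sketch computes PMI from toy counts. The toy pairs, the function name `pmi`, and the choice to return `None` for unseen pairs are illustrative assumptions; the sketch does not reproduce Swivel's training procedure or the paper's experiments.

```python
import math
from collections import Counter

# Toy (word, context) pairs; purely illustrative, not data from the paper.
pairs = [
    ("cat", "purrs"), ("cat", "purrs"), ("cat", "sleeps"),
    ("dog", "barks"), ("dog", "barks"), ("dog", "sleeps"),
]

pair_counts = Counter(pairs)
word_counts = Counter(w for w, _ in pairs)
context_counts = Counter(c for _, c in pairs)
total = len(pairs)


def pmi(word, context):
    """Return PMI(word, context), or None when the pair was never observed."""
    joint = pair_counts[(word, context)]
    if joint == 0:
        # log(0) is undefined, so this cell of the PMI matrix has no finite value.
        return None
    p_joint = joint / total
    p_word = word_counts[word] / total
    p_context = context_counts[context] / total
    return math.log(p_joint / (p_word * p_context))


observed = pmi("cat", "purrs")    # pair seen in the toy corpus
unobserved = pmi("cat", "barks")  # pair never seen in the toy corpus
```

In this toy corpus, `pmi("cat", "purrs")` equals log 2, while `pmi("cat", "barks")` has no finite value. Cells of the latter kind are what the abstract calls unobserved co-occurrences.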

Research target: Philology and Linguistics; Computer Science

Priority areas: IT and mathematics

Language: English


Pronouns with a focused antecedent in Russian: coreferential and bound uses in corpora

Tiskin D., Компьютерная лингвистика и интеллектуальные технологии 2026 No. 24 P. 656–665

Despite considerable interest in the factors influencing the choice of pronoun (reflexive or personal) with an antecedent in Russian, the role of the anaphoric relation (coreference or semantic binding) has been understudied, and there is disagreement about the acceptability of particular data points. To clarify matters, I employ large corpora (Araneum and GICR) to study the ...

Added: July 19, 2026

Not only ἐπιχώρια διδάγματα: the paideia of Epaminondas

Mozhaysky A., Schole. Философское антиковедение и классическая традиция 2026 Vol. 20 No. 2 P. 1105–1116

This article examines the education of Epaminondas, the most famous Theban military and political figure, who in antiquity was also renowned for his education and philosophical authority. The study demonstrates that Epaminondas' education encompassed a complex set of local teachings, which Pausanias describes as ἐπιχώρια διδάγματα. However, Epaminondas' education differed from that of most members ...

Added: July 17, 2026

English for Students of Pedagogical Universities = English for Pre-Service Teachers (B2-C1)

Stognieva O., Новикова В. П., Moscow: Флинта, 2026.

This innovative English for Specific Purposes course for students of pedagogical universities offers immersion in current educational discourse: from questions of upbringing and the cognitive development of children and adolescents to rethinking the role of the school in the digital age. The course content is built on authentic multimodal materials that make it possible to analyze global trends in modern educational systems and approaches. The edition is ideally suited to universities seeking to prepare ...

Added: July 16, 2026

Nguyen Tong Quai's contribution to the development of Vietnamese poetry (a new look at the work of an 18th-century poet)

Britov I., Вьетнамские исследования 2026 Vol. 10 No. 2 P. 87–98

The article analyzes the work of the 18th-century poet Nguyen Tong Quai. It notes that in Vietnam his literary legacy began to be actively studied and appreciated only after the proclamation of the policy of renewal, although even during the poet's lifetime his contemporaries gave extremely positive reviews ...

Added: July 16, 2026

WSI-GT: Pseudo-Label Guided Graph Transformer for Whole-Slide Histology

Михайлов И. А., Machine Learning and Knowledge Extraction 2026 Vol. 8 No. 1 Article 8

Whole-slide histology images (WSIs) can exceed 100 k × 100 k pixels, making direct pixel-level segmentation infeasible and requiring patch-level classification as a practical alternative for downstream WSI segmentation. However, most approaches either treat patches independently, ignoring spatial and biological context, or rely on deep graph models prone to oversmoothing and loss of local tissue ...

Added: July 16, 2026

On the construction of Barnes–Wall lattices and their application in cryptography

Kuninets A., Malygina E., Leevik A. G. et al., Journal of Computer Virology and Hacking Techniques 2026 No. 22 Article 62

In this work, we investigate the application of Barnes–Wall lattices in post-quantum cryptographic schemes. We survey and analyze several constructions of Barnes–Wall lattices, including subgroup chains, the generalized k-ing construction, and connections with Reed-Muller codes, highlighting their equivalence over both Z[i] and Z. Building on these structural insights, we introduce a new algorithm for efficient ...

Added: July 16, 2026

Tencent and Open Source: How does China's most valuable brand treat open-source software?

Silakov D., Системный администратор 2026 No. 5 P. 46–51

In the previous article on open source in the PRC [1], we covered Alibaba, a large corporation ranked thirtieth among the world's most significant brands for 2025 [2]. That is an honorable position, but not the highest among Chinese companies: thirteenth place went to Tencent, the developer of WeChat and a number of other products widely used by our eastern neighbors. Tencent ...

Added: July 14, 2026

Comitative-additive polysemy in the Pur dialect of Forest Nenets

Kozlov A., Лапшина К. М., Вопросы языкознания 2026 No. 4 P. 132–146

This article examines two functions of the suffix -samae in the Pur dialect of Forest Nenets based on fieldwork data: comitative (expression of jointness: ‘with X’) and scalar additive (focus particle with the meaning ‘even X’). The comitative use of the suffix -samae primarily marks an inanimate companion. However, its use is also possible with other types ...

Added: July 13, 2026

2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

IEEE, 2026.

Added: July 13, 2026

Prompt Design for GPT-4 Assessments of EFL Student Reports

Stognieva O., Murashova N., Journal of Asia TEFL 2026 Vol. 23 No. 2 P. 490–505

This study investigates the impact of different prompt design strategies on the performance of GPT-4 in assessing undergraduate reports within an English as a Foreign Language (EFL) context. As Large Language Models (LLMs) increasingly integrate into educational assessment, understanding how prompt engineering affects grading accuracy and alignment with human judgment is crucial. Three prompt design methods—TELeR Taxonomy, Six strategies ...

Added: July 12, 2026

Mathematical Optimization Theory and Operations Research, 25th International Conference, MOTOR 2026 Irkutsk, Russia, July 6–11, 2026 Proceedings

Switzerland: Springer, 2026.

This volume contains the refereed proceedings of the 25th International Conference on Mathematical Optimization Theory and Operations Research (MOTOR 2026), held during July 6–11 in a picturesque place near Lake Baikal, Irkutsk, Russia. The MOTOR conference is a direct successor and scientific inheritor of several prominent events on mathematical programming, combinatorial and stochastic optimization, ...

Added: July 12, 2026

Problems of infinite regular realizability

Шиманогов И. Н., Vyalyi M., Дискретный анализ и исследование операций 2025 Vol. 32 No. 4(166) P. 213–230

A well-studied class of algorithmic problems is that of regular realizability: checking the non-emptiness of the intersection of a regular language with a given language. This problem has a natural algebraic interpretation: verifying whether an element of a Boolean algebra belongs to the kernel of a certain homomorphism. This motivates the consideration of an analogous ...

Added: July 12, 2026

International Academic Conference. Proceedings of the Scientific Forum “Modern Science: Theory and Practice” (April 22, 2026). Belgrade, Serbia. Part 3.

Scientific publishing house Infinity, 2026.

Scientific Forum Proceedings combine materials of the conference – research papers and thesis reports of scientific workers. They examine technical, juridical and sociological aspects of research issues. Some articles deal with theoretical and methodological approaches and principles of research questions of personality professionalization. ...

Added: July 10, 2026

That obscure object of attention: "solid objects" and haptic experience in the short stories of V. Woolf

Shulyatieva D., Новое литературное обозрение 2026 No. 199 P. 128–140

The article examines haptic imagery in the poetics of V. Woolf, using three of her short stories ("The Mark on the Wall", "The Lady in the Looking-Glass", "Solid Objects"), which center on objects that establish renewed relations with the characters. Drawing on the theory of haptic visuality and on thing theory, it describes the transformations that the objects undergo and the experience that opens up to the character and the narrator on contact with them, ...

Added: July 10, 2026

Two ga-morphemes in Rutul: Accidental similarity or a case of polygrammaticalization?

Maisak T., Word Structure 2026 Vol. 19 No. 2-3 P. 338–367

In a situation when two or more grammaticalization targets in one language are phonologically identical but functionally distinct, neither polygrammaticalization nor accidental syncretism can be ruled out, especially if we are dealing with a language without historical attestations. In the present paper, I present a detailed account of the coexistence of two homophonous grammatical markers ...

Added: July 9, 2026

Towards a typology of imperative interjections: ‘Take it!’ in the Caucasus

Maisak T., Transactions of the Philological Society 2026 Vol. 124 No. 2 P. 386–427

This paper presents a first typological study of a particular type of imperative interjections, namely interjections with the meaning ‘here, take it!’ used by a speaker when they ask the addressee to take something from the speaker's hands (often combined with a gesture of giving). The sample of languages is both geographically and genealogically restricted ...

Added: July 9, 2026

Light Verb Constructions from a Cross-Linguistic Perspective

Berlin, Boston: De Gruyter, 2025.

Light verb constructions are complex predicates consisting of a semantically reduced verb and an additional often phrasal element contributing the main predicational content. Although light verb constructions have been identified for various (genetically unrelated) languages, a comparative concept which allows identifying light verb constructions across languages is still missing. The present volume approaches this issue ...

Added: July 9, 2026

Improving Differential Equation Solving in Compact Language Models via Activation Steering and Reinforcement Learning

Surkov A., Ignatenko V., Koltcov Sergei, Computers, Materials and Continua 2026

Large language models have recently demonstrated promising capabilities in mathematical reasoning; however, their performance on tasks requiring strict symbolic manipulation, such as solving differential equations, remains limited, especially for compact models. In this work, we investigate whether activation steering combined with reinforcement learning can improve the quality of solutions generated by pretrained language models without ...

Added: July 8, 2026

Computational Science and Its Applications – ICCSA 2026 Workshops

Springer, 2027.

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & ...

Added: July 8, 2026

The Semiotic Intensity Approach: A Scoping Review of Amplification and Attenuation Mechanisms in Multimodal Media Discourse

Yin Z., Terra Linguistica 2026 Vol. 17 No. 2 P. 152–168

In the context of global communication, the construction of national images in the media has evolved from passive reporting to active meaning modulation. Using China as a case study, this research introduces the Semiotic Intensity Approach (SIA) to quantify how news media integrate verbal, visual, and layout resources to either amplify or attenuate specific ...

Added: July 8, 2026

Conference Proceedings: 2026 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), 14-15 May 2026

IEEE, 2026.

The purpose of the 2026 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT) is to bring together researchers and practitioners from multiple areas of radio science, including biomedical engineering, radioelectronics, microelectronics, information technology, smart energy, information security and others. ...

Added: July 8, 2026

Modeling specialized routing algorithms in networks-on-chip represented by series of families of circulant topologies

Маликов М. А., Монахова Э. А., Rzaev E. et al., Ученые записки Казанского университета. Серия: Физико-математические науки 2026 Vol. 168 No. 2 P. 269–286

This article examines series of families of two-dimensional circulant networks with rectangular L-shapes, optimal in diameter, as network-on-chip topologies with a minimal number of crossings between the links and a bounded length of the maximum link that does not depend on the network size. New network-on-chip routing algorithms, which use the coordinates of three adjacent zeros in the ...

Added: July 8, 2026

Algorithmic overlaps as thermodynamic variables: From local to cluster Monte Carlo dynamics in critical phenomena

Pilé I., Deng Y., Shchur L., Physical Review B: Condensed Matter and Materials Physics 2026 Vol. 114 No. 1 Article 014101

We investigate the spatial overlap of successive spin configurations in Markov chain Monte Carlo simulations using the local Metropolis algorithm and the Swendsen-Wang and Wolff cluster algorithms. We examine the dynamics of these algorithms for models in different universality classes: Ising model, Potts model with three components, and four-state Potts model. The overlap of two ...

Added: July 6, 2026

Growth in noncommutative algebras and entropy in derived categories

Piontkovski D., / Series arXiv "math". 2026.

A noncommutative projective variety is defined, following Artin and Zhang, by a graded coherent algebra 𝐴. The category of coherent sheaves is then the quotient qgr(𝐴) of the category of finitely presented graded modules by the subcategory of torsion modules. We consider the categorical and polynomial entropies of the Serre twist, that is, of the ...

Added: June 23, 2026