HSE at TempoWiC: Detecting Meaning Shift in Social Media with Diachronic Language Models

Elizaveta Tukhtina; Svetlana Vydrina; K. Kashleva

P. 35–38.

Elizaveta Tukhtina, Svetlana Vydrina, Kashleva K.

This paper describes our methods for temporal meaning shift detection, implemented for the TempoWiC shared task. We present two systems: one that uses time span data and one that does not. Our approach is based on masked language models continuously pre-trained on Twitter data. Both systems outperformed all of the competition's baselines except TimeLMs-SIM. Our best submission achieved a macro-F1 score of 70.09% and took 7th place. This result was achieved by using diachronic language models from the TimeLMs project.

Language: English

Keywords: semantic shift, NLP

In book

Proceedings of the First Workshop on Ever Evolving NLP (EvoNLP)

Abu Dhabi: Association for Computational Linguistics, 2022.

Granular computing-based deep learning for text classification

Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746

Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...

Added: March 12, 2026

30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

INCOMA Ltd, 2021.

Added: January 28, 2026

Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Association for Computational Linguistics, 2025.

Added: November 17, 2025

Qualitative semantics of the lexeme «православный» ("orthodox") in modern Russian

Komyshkova A. D., Rusistika (Russian Language Studies) 2025 Vol. 23 No. 3 P. 416–432

The relevance of the research stems from the interest of linguistics in the structure of polysemous words, the mechanisms by which polysemy develops, and the lexicographic recording of word meanings. The qualitative meaning of the adjective православный (orthodox) is also of interest for studying the linguistic conceptualization of the cultural representations of native speakers. The aim of ...

Added: November 7, 2025

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...

Added: November 6, 2025

Well-being research using advanced natural language processing (NLP) methods: prospects and limitations

Voevodina E., Sovremennaya zarubezhnaya psikhologiya (Journal of Modern Foreign Psychology) 2025 Vol. 14 No. 3 P. 172–181

Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, which are criticized for poor ecological validity, limited information yield, and inadequate capture of the multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...

Added: October 9, 2025

Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Tartu: University of Tartu Library, 2025.

The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...

Added: July 17, 2025

HSE NLP Team at MEDIQA-CORR 2024 Task: In-Prompt Ensemble with Entities and Knowledge Graph for Medical Error Correction

Tutubalina E., Valiev A., Association for Computational Linguistics 2024 P. 470–482

This paper presents our LLM-based system designed for the MEDIQA-CORR @ NAACL-ClinicalNLP 2024 Shared Task 3, focusing on medical error detection and correction in medical records. Our approach consists of three key components: entity extraction, prompt engineering, and ensemble. First, we automatically extract biomedical entities such as therapies, diagnoses, and biological species. Next, we explore ...

Added: December 13, 2024

Data-driven approach to curriculum analysis

Iu. Nasu, M.S. Drobinin, M.S. Efanov et al., Proceedings of the Institute for System Programming of the RAS 2024 Vol. 36 No. 2 P. 83–90

The choice of an educational program is a momentous one in young people's lives. Given the shortage of time after exams, applicants usually do not have time to analyze possible educational tracks. Furthermore, such analysis requires a thorough study of learning plans. This research addresses the problem by proposing an algorithm for data-driven curriculum analysis based on natural language ...

Added: December 11, 2024

Bridging Gaps in Russian Language Processing: AI and Everyday Conversations

Tatiana Sherstinova, Nikolay Mikhaylovskiy, Evgenia Kolpashchikova et al., in: Proceedings of the 35th Conference of Open Innovations Association FRUCT, 24-26 April 2024, Tampere, Finland. Issue 1. FRUCT Oy, 2024. P. 253–258.

Contemporary advancements in NLP and neural network techniques are paving the way to enhance and harness traditional linguistic resources and corpora, as well as expand the methods of applying neural networks for complex language material. Thus, a weak point for both theoretical and applied linguistic tasks is the processing of spontaneous everyday speech. Two experiments ...

Added: November 29, 2024

Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics (Field Matters 2024)

Bangkok: Association for Computational Linguistics, 2024.

Added: November 13, 2024

A Language Model for Grammatical Error Correction in L2 Russian

Remnev N., Obiedkov S., Rakhilina E. V. et al., Series Computer Science, arxiv.org, 2023.

Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, ...

Added: October 30, 2024

Language model interpretation as an exploration tool: on the way to understand better

Pozdnyakov D. V., 2025.

Model interpretation is very important when it comes to detecting hidden biases and ensuring model safety and trustworthiness. More and more interpretation methods are emerging. Focusing on the case of black-box transformer-based NLP models, for each considered interpretation application we provide an overview of existing tools and methods. We conclude that two trends will be central ...

Added: September 30, 2024

Papilusion at DAGPap24: Paper or Illusion? Detecting AI-generated Scientific Papers

Andreev N., Shirnin A., Mikhailov V. et al., in: Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024). Association for Computational Linguistics, 2024. P. 215–219.

Added: September 24, 2024

Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

Association for Computational Linguistics, 2024.

Welcome to the Fourth Workshop on Scholarly Document Processing (SDP) at ACL 2024. As the body of scholarly literature grows, automated methods in NLP, text mining, information retrieval, document understanding etc. are needed to address issues of information overload, disinformation, reproducibility, and more. Though progress has been made, there are significant unique challenges to processing ...

Added: September 24, 2024

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. This study is devoted to the analysis of methods for vectorization of textual information, selection of a model for scientific paper classification, and training of the BERT language model ...

Added: November 4, 2023

TAPE: Assessing Few-shot Russian Language Understanding

Taktasheva E., Shavrina T., Fenogenova A. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022. P. 2472–2497.

Recent advances in zero-shot and few-shot learning have shown promise for a scope of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this line of research, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six ...

Added: September 22, 2023

Proceedings of the 29th International Conference on Computational Linguistics

International Committee on Computational Linguistics, 2023.

Added: August 14, 2023