LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

P. 7757–7764.
Razzhigaev A., Mikhalchuk M., Rahmatullaev T., Goncharova E., Druzhinina P., Oseledets I., Kuznetsov A.

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry a surprisingly large amount of contextual information. Notably, removing these tokens (especially stopwords, articles, and commas) consistently degrades performance on MMLU and BABILong-4k, even when only tokens deemed irrelevant are removed. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer’s embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of “filler” tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate layer contributions (via an adapted Logit Lens), and measures the intrinsic dimensionality of representations. This toolkit illuminates how seemingly trivial tokens can be critical for long-range understanding.
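The linearity notion in the abstract can be illustrated with a minimal sketch. It assumes that linearity is scored as one minus the normalized residual of a least-squares linear fit between the embeddings of two consecutive layers. The function name `linearity_score`, the centering and normalization steps, and the synthetic data are assumptions made for illustration; this is not the LLM-Microscope API and not necessarily the authors' exact definition.

```python
import numpy as np


def linearity_score(x: np.ndarray, y: np.ndarray) -> float:
    """Score how well a single linear map takes layer-l embeddings to layer-(l+1).

    x, y: arrays of shape (n_tokens, dim) holding embeddings of the same
    tokens at two consecutive layers. Returns a value in [0, 1], where 1
    means the transition is exactly linear.
    Assumption: the score is 1 - min_A ||x @ A - y||^2 after centering
    both matrices and scaling them to unit Frobenius norm.
    """
    # Center so a constant offset between layers does not count as nonlinearity.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Scale to unit Frobenius norm so the residual is comparable across layers.
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # Best linear map in the least-squares sense.
    a, *_ = np.linalg.lstsq(x, y, rcond=None)
    residual = np.linalg.norm(x @ a - y) ** 2
    return float(1.0 - residual)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 8))   # synthetic "layer l" embeddings
    w = rng.normal(size=(8, 8))     # hypothetical layer-to-layer map
    print("linear transition:   ", linearity_score(x, x @ w))
    print("saturating nonlinear:", linearity_score(x, np.tanh(3.0 * (x @ w))))
```

The score is only informative when the number of tokens exceeds the embedding dimension; with fewer tokens than dimensions, a linear map can fit almost any target exactly, so the score is trivially close to 1.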

Language: English
Keywords: NLP, interpretability, LLM, large language models, natural language processing (NLP)

In book

Findings of the Association for Computational Linguistics: NAACL 2025
Association for Computational Linguistics, 2025.
Similar publications
Optimizing Computational Infrastructure for Large Language Models in Bioinformatics: A Case Study
Beknazarov N., in: Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers (CCIS, volume 2891). Springer, 2026. P. 3–16.
This paper addresses the challenge of efficiently training Large Language Models (LLMs) on large-scale, sparse omics datasets in high-performance computing (HPC) environments. Using over 1000 BED tracks as a representative data source, we propose a method combining interval-based chunked storage, sparse matrix transformation, and parallel data loading, integrated within a PyTorch Lightning training framework. Our ...
Added: May 19, 2026
From Obscurity to Transparency: A Review of Explainable AI (XAI) Technologies (in Russian)
Avdoshin S. M., Pesotskaya E. Y., Информационные технологии 2026 Vol. 32 No. 4 P. 185–194
With the rapid advancement of artificial intelligence, and deep learning in particular, models have emerged that are capable of delivering highly accurate predictions. However, the internal logic of such models remains difficult to interpret—an issue of critical importance, especially in domains where the correctness of an algorithm directly affects high-stakes decision-making. One promising avenue for ...
Added: May 8, 2026
AI-Based Personalized Feedback: A Model for a Humanities Master's Programme (in Russian)
Подболотова М. И., Адамский А. И., Kolachev N. et al., Высшее образование в России 2026 Vol. 35 No. 4 P. 21–35
The purpose of the article is to present and justify a pedagogical model of personalized feedback based on large language models (LLM) for the educational process in a humanities-oriented master’s program. The relevance of the study is determined by the objectives of digital transformation of higher education in the Russian Federation, outlined in Presidential Decree No. 474 ...
Added: May 4, 2026
On the Ideological Biases of Generative AI: The Russian-Ukrainian Conflict as Represented by ChatGPT (in Russian)
Baysha O., Trofimov V., Российская школа связей с общественностью 2026 No. 40 P. 171–191
A growing number of scholars are warning about the dangers of generative AI reproducing socio-political and ideological biases absorbed by models from the texts on which they were trained. If a given model was trained on Western media texts, it may generate narratives that reproduce West-centric views of world events. This ...
Added: April 21, 2026
Large Language Models as Political Actors: Cultural Bias and Epistemic Power
Seredkina E., Seletkova G., Mikhailovsky A., Technology and Language 2026 Vol. 7 No. 1 P. 63–79
The rapid diffusion of Large Language Models (LLMs) into socially and politically sensitive domains raises critical questions about the nature and origins of political bias in artificial intelligence. While existing research often treats bias as a technical flaw to be minimized, this article advances a broader philosophical and cultural interpretation of LLM bias as an ...
Added: April 1, 2026
Granular computing-based deep learning for text classification
Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746
Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...
Added: March 12, 2026
Mechanistic Permutability: Match Features Across Layers
Balagansky N., Maximov I., Gavrilov D., in: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). ICLR, 2025. P. 57940–57957.
Understanding how features evolve across layers in deep neural networks is a fundamental challenge in mechanistic interpretability, particularly due to polysemanticity and feature superposition. While Sparse Autoencoders (SAEs) have been used to extract interpretable features from individual layers, aligning these features across layers has remained an open problem. In this paper, we introduce SAE Match, ...
Added: February 25, 2026
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Seleznyov M., Chaichuk M., Ershov G. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2025. Association for Computational Linguistics, 2025. P. 20370–20385.
Large Language Models (LLMs) are highly sensitive to subtle, non-semantic variations in prompt phrasing and formatting. In this work, we present the first systematic evaluation of 4 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from Llama, Qwen and Gemma families across 52 tasks from Natural ...
Added: February 3, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Measuring Chemical LLM robustness to molecular representations: a SMILES variation-based framework
Ganeeva V., Khrabrov K., Kadurin A. et al., Journal of Cheminformatics 2025 No. 17 Article 164
The recent integration of natural language processing into chemistry has advanced drug discovery. Molecule representations in language models (LMs) are crucial to enhance chemical understanding. We explored the ability of models to match the same chemical structures despite their different representations. Recognizing the same substance in different representations is an important component of emulating the ...
Added: February 3, 2026
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
INCOMA Ltd, 2021.
Added: January 28, 2026
A Multi-Aspect Evaluation of Tokenizer Adaptation Methods for Large Language Models in Russian (in Russian)
Андрющенко Г. Д., Godunova M., Иванов В. В. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 527 P. 320–331
Large language models (LLMs) pretrained on English-centered corpora have biases and perform sub-optimally on other natural languages. Adapting an LLM's vocabulary provides a resource-efficient way to improve the quality of a pretrained model. Previously proposed adaptation techniques focus on performance (accuracy) and size metrics (fertility), ignoring other aspects in comparison, such as inference latency, compute ...
Added: January 15, 2026
Aspect-Based Sentiment Analysis Using Large Language Models on Museum Visitor Reviews
Anastasia V. Kolmogorova, Elizaveta R. Kulikova, Vladislav V. Lobanov, Supercomputing Frontiers and Innovations 2025 Vol. 12 No. 3 P. 121–140
Museum reviews provide rich insight into visitor preferences and can drive useful change within institutions, yet they have attracted little attention in sentiment research owing to limited commercial interest and the multi-thematic nature of reviews. In this study we analysed over 12 000 reviews in Russian for 15 museum sites collected from nine different platforms. ...
Added: November 30, 2025
Applying Large Language Models to the Analysis of Value-Laden and Patriotic Discourse of Russian-Speaking Users (in Russian)
Balakina Y. V., Григорьева М. В., Соколова Е. Н., Вестник Российского фонда фундаментальных исследований. Гуманитарные и общественные науки 2025 Vol. 123 No. 4 P. 56–69
The article examines the potential of large language models (LLMs) for automated analysis of value-laden and patriotic discourse in Russian-language social media. Using a corpus of posts from VK, Odnoklassniki and Telegram (2023–2025), it investigates the extent to which automatic coding results align with expert annotation based on a specially developed categorical scheme. The codebook ...
Added: November 26, 2025
Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction
Morozov L., Mogilevskii A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 2000–2007.
This paper describes LIBU (LoRA enhanced influence-based unlearning), an algorithm to solve the task of unlearning - removing specific knowledge from a large language model without retraining from scratch and compromising its overall utility (SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models). The algorithm combines classical influence functions to remove the influence of ...
Added: November 17, 2025
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Association for Computational Linguistics, 2025.
Added: November 17, 2025
AutoJudge: Judge Decoding Without Manual Annotation
Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov et al., in: 39th Conference on Neural Information Processing Systems (NeurIPS 2025). NeurIPS, 2025. P. 94605–94642.
We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify the generated tokens that affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster. Our approach relies ...
Added: November 6, 2025
Strategizing with AI: Insights from a Beauty Contest Experiment
Iuliia Alekseenko, Dagaev D., Sofiia Paklina et al., Journal of Economic Behavior and Organization 2025 Vol. 240 Article 107330
Added: November 6, 2025
Findings of the Association for Computational Linguistics: NAACL 2025
Association for Computational Linguistics, 2025.
Added: November 6, 2025