Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

Alina Shutova; Vladimir Malinovskii; Vage Egiazarian; Denis Kuznedelev; Ivan Ermakov

Language: English

Keywords: large language models

In book

Volume 267: International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, Canada

Vol. 267. [s.n.], 2025.

AI-Based Personalized Feedback: A Model for a Humanities Master's Program

Подболотова М. И., Адамский А. И., Kolachev N. et al., Высшее образование в России 2026 Vol. 35 No. 4 P. 21–35

The purpose of the article is to present and justify a pedagogical model of personalized feedback based on large language models (LLM) for the educational process in a humanities-oriented master’s program. The relevance of the study is determined by the objectives of digital transformation of higher education in the Russian Federation, outlined in Presidential Decree No. 474 ...

Added: May 4, 2026

Applying Large Language Models to the Analysis of Value-Laden and Patriotic Discourse of Russian-Speaking Users

Balakina Y. V., Grigoreva M., Соколова Е. Н., Вестник Российского фонда фундаментальных исследований. Гуманитарные и общественные науки 2025 Vol. 123 No. 4 P. 56–69

The article examines the potential of large language models (LLMs) for automated analysis of value-laden and patriotic discourse in Russian-language social media. Using a corpus of posts from VK, Odnoklassniki and Telegram (2023–2025), it investigates the extent to which automatic coding results align with expert annotation based on a specially developed categorical scheme. The codebook ...

Added: November 26, 2025

New Interfaces and New Mediators

Maksimenkova O. V., Сегал А. П., Вопросы философии 2025 No. 10 P. 67–76

The study is devoted to the interaction between humans and artificial intelligence (AI). The authors view this interaction as mediated by interfaces that both simplify it and hide the real mechanisms of encoding and decoding messages (according to Shannon). In such a situation, the characteristics of the actor of communication are blurred, and it is not ...

Added: October 2, 2025

Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V

Cham: Springer, 2025.

This book constitutes the refereed proceedings of 34th International Workshops which were held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, held in Kaunas, Lithuania, September 9–12, 2025. The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...

Added: September 29, 2025

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.

Modern LLMs such as BERT, ChatGPT, and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and the analysis and summarization of documents. In this paper, we show that these models are close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...

Added: September 4, 2025

Handwritten Text Recognition and Intelligent Analysis: The Capabilities of Neural Technologies (A Case Study of Working with the "Diary" of F. P. Litke)

Boltunova E., Laptev A., Имагология и компаративистика 2025 No. 23 P. 358–379

Added: June 16, 2025

Assessing Student Papers in Teaching Academic Writing in English in the Context of Developing Artificial Intelligence Tools

Bakulev A., in: The Professionalism of the Foreign Language Teacher and Its Realization. Collection of articles based on the proceedings of the scientific and methodological symposium with international participation "Lempert Readings XXVII", May 15–17, 2025. Pyatigorsk: Pyatigorsk State University Publishing House, 2025. P. 270–279.

The paper focuses on assessing students’ written papers in the discipline “Academic Writing in English” in the context of AI tools’ capabilities. AI tools, specifically large language models (LLMs), appear to be able to tackle and solve a wide range of educational and research tasks. Foreign language teaching is no exception: AI tools are utilized ...

Added: June 5, 2025

Generative AI-based Approach to Concept Drift Generation in Streaming Text Data

Belov B., Peter Panfilov, WSEAS Transactions on Information Science and Applications 2025 Vol. 22 P. 11–20

Real-time analysis of text streams is crucial for industrial and business processes and scenarios. It is expected to be one of the important future research topics in the text processing and understanding domain. Analysis of text data is based on the use of pre-trained machine learning/data mining (ML/DM) models that may demonstrate performance degradation over ...

Added: April 5, 2025

The Role of Large Language Models in Next-Generation Integrated Development Environments

Ишанхонов А. Ю., Pshichenko D., Можаровский Е. А. et al., Программные системы и вычислительные методы 2024 No. 4 P. 140–150

The article examines the role of large language models (LLMs) in new-generation integrated development environments (IDEs). Tools such as GitHub Copilot, IntelliCode and Alice Code Assistant are explored in the context of their use in programming. The authors examine how LLMs enable the automation of key development tasks, including code autocompletion, error detection, refactoring, and code generation, ...

Added: March 10, 2025

Ensuring trustworthy code: leveraging a static analyzer to identify and mitigate defects in generated code

D. Shaikhelislamov, Drobyshevskiy M., A. Belevantsev, Journal of Mathematical Sciences 2024 Vol. 540 P. 233–251

The rise of large language models (LLMs) has greatly advanced code generation capabilities. A recent StackOverflow survey found that 70% of developers are using or planning to use AI coding tools this year. However, most current methods focus on supervised fine-tuning objectives derived from text generation, often overlooking the distinct sequence-level properties of code, such as compilability, and ...

Added: February 3, 2025

Organizational Knowledge Management and Large Language Models

Zelenkov Y., Российский журнал менеджмента 2024 Vol. 22 No. 3 P. 573–601

Purpose: to summarize, classify and analyze current scientific publications on the use of large language models (LLMs) in organizational knowledge management. Methodology: a systematic literature review was conducted, based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. 75 publications were selected for analysis, including academic articles and reports of consulting companies published ...

Added: January 4, 2025

ChatGPT, Text, Information: A Critical Analysis

Komashko M. N., Труды по интеллектуальной собственности 2024 Vol. 50 No. 3 P. 118–128

The paper deals with theoretical and practical issues related to large language models, in particular ChatGPT, as a type of artificial intelligence. The main attention is paid to spheres of human activity in which the exchange of information in the form of text is of the greatest importance: science, education and journalism (media sphere). The ...

Added: December 29, 2024

DAREL: Data Reduction with Losses for Training Acceleration of Real and Hypercomplex Neural Networks

Demidovskij A., Трутнев А. И., Тугарев А. М. et al., Series "NeurIPS 2023 Workshop". 2023.

Neural network training requires a lot of resources, and there are situations where training time and memory usage are limited. This makes specialized algorithms for training neural networks under resource constraints an important challenge. Data Reduction with Losses is a novel training data reduction method that operates with training samples ...

Added: January 17, 2024