COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation

U. Parkina; M. Rakhuba


P. 71014–71041.

Recent studies suggest that context-aware low-rank approximation is a useful tool for compression and fine-tuning of modern large-scale neural networks. In this type of approximation, the norm is weighted by a matrix of input activations, which significantly improves metrics over the unweighted case. Nevertheless, existing methods for neural networks suffer from numerical instabilities because they rely on classical formulas that explicitly compute Gram matrices and then invert them. We demonstrate that this can degrade the approximation quality or produce numerically singular matrices.
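The instability attributed to explicit Gram matrices can be illustrated with a standard linear-algebra fact: forming X Xᵀ squares the 2-norm condition number of X. The following is a minimal NumPy sketch of that fact only, not code from the paper; the function name `gram_condition_numbers` and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np


def gram_condition_numbers(X):
    """Return (cond(X), cond(X @ X.T)) in the 2-norm for a full-row-rank X."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    cond_x = s[0] / s[-1]
    cond_gram = np.linalg.cond(X @ X.T)     # explicit Gram matrix
    return cond_x, cond_gram


# Hypothetical "activation" matrix with prescribed singular values 1 ... 1e-3.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((50, 4)))
X = U @ np.diag(np.logspace(0, -3, 4)) @ V.T

cond_x, cond_gram = gram_condition_numbers(X)
print(cond_x, cond_gram)  # roughly 1e3 and 1e6 by construction
```

Because the conditioning is squared, a matrix that is only moderately ill-conditioned in single precision can already yield a Gram matrix that is numerically singular.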

To address these limitations, we propose a novel inversion-free regularized framework that is based entirely on stable decompositions and overcomes the numerical pitfalls of prior art. Our method can handle all possible challenging scenarios: (1) when calibration matrices exceed GPU memory capacity, (2) when input activation matrices are nearly singular, and even (3) when insufficient data prevents unique approximation. For the latter, we prove that our solution converges to a desired approximation and derive explicit error bounds.
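To make the idea of an inversion-free approach concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the objective is to minimize ‖(W − W_r) X‖_F over matrices W_r of rank at most r, where W is a weight matrix and X is the matrix of input activations. It uses only a QR decomposition and an SVD, so no Gram matrix is formed and nothing is inverted. The function name `context_aware_low_rank` is hypothetical, and the sketch omits the regularization and out-of-memory handling mentioned in the abstract.

```python
import numpy as np


def context_aware_low_rank(W, X, rank):
    """Sketch: factors (A, B) with A @ B minimizing ||(W - A @ B) @ X||_F.

    W: (m, n) weight matrix; X: (n, k) activation matrix (assumed layout).
    """
    # QR of X^T gives X^T = Q R, hence X X^T = R^T R, without forming X X^T.
    R = np.linalg.qr(X.T, mode="r")             # shape (min(k, n), n)
    # W X = (W R^T) Q^T and Q has orthonormal columns, so W R^T and W X
    # share their left singular vectors and singular values.
    U, _, _ = np.linalg.svd(W @ R.T, full_matrices=False)
    U_r = U[:, :rank]
    # Project W onto the leading left singular subspace: W_r = U_r U_r^T W.
    return U_r, U_r.T @ W


rng = np.random.default_rng(1)
W = rng.standard_normal((6, 8))
X = rng.standard_normal((8, 40))
A, B = context_aware_low_rank(W, X, rank=3)
print(np.linalg.norm((W - A @ B) @ X))  # weighted approximation error
```

By the Eckart–Young theorem applied to W X, the weighted error of this projection equals the norm of the discarded singular values of W X, and the construction stays well defined when X is rank-deficient or has fewer columns than rows.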

Language: English


Keywords: Large Language Models; Low Rank Approximation

In book

39th Conference on Neural Information Processing Systems (NeurIPS 2025)

NeurIPS, 2025.

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

Zmushko P., Beznosikov A., Takáč M. et al., in: Volume 267: International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, Canada. Vol. 267. [s.n.], 2025. P. 80708–80739.

With the increase in the number of parameters in large language models, the training process increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA), low-rank gradient projection (GaLore), and blockwise optimization (BAdam) have ...

Added: November 10, 2025

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Rodionov G., Roman Garipov, Alina Shutova et al., in: 39th Conference on Neural Information Processing Systems (NeurIPS 2025). NeurIPS, 2025. P. 46592–46633.

Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: by dividing the problem into sub-tasks, exploring different strategies concurrently, etc. Recent research ...

Added: November 6, 2025

Gender Differences in the Dictator Game: Comparing the Behavior of Large Language Models and Humans

Parshakov P., Paklina S., Matkin N. et al., Вестник Пермского университета. Серия: Экономика 2026 Vol. 21 No. 1 P. 42–57

Introduction. Large language models (LLMs) are increasingly being used in the social sciences to simulate the behavior of experimental participants and to analyze norms of cooperation and justice. However, the question remains whether they are capable of reproducing social asymmetries, including gender differences. Goal. The work aims to test whether LLMs reproduce gender differences in the Dictator ...

Added: October 27, 2025

Large Language Model Failures in Higher Education: Causes and Prevention

Andrei A. Ternikov, COMPUTER 2025 Vol. 58 No. 11 P. 74–83

The rapid adoption of artificial intelligence and large language models (LLMs) in higher education presents unique technical challenges. This article examines critical failures in LLM implementation across academic environments and provides practical strategies for successful integration. ...

Added: July 31, 2025

Smart Technical Support System Development Using Knowledge Map-Aided Approach

Alexander Suleykin, Peter Panfilov, in: 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). NY: IEEE, 2024. P. 455–460.

Added: April 5, 2025

Proceedings of the 28th Conference on Computational Natural Language Learning

Association for Computational Linguistics, 2024.

CoNLL is a conference organized yearly by SIGNLL (ACL’s Special Interest Group on Natural Language Learning), focusing on theoretically, cognitively and scientifically motivated approaches to computational linguistics. This year, CoNLL was held alongside EMNLP 2024. ...

Added: March 11, 2025

An Approach to Building a Service for Generating Mobile Application Code Using Large Language Models

Резуник Л., Александров Д.В., ИТ-Стандарт 2024 No. 4 P. 34–41

Machine learning technologies and various tools for code generation have had a significant impact on the field of software development in recent years. Although most of the existing solutions are not built exactly for code generation, programmers apply them in different tasks. Not many of the existing AI solutions work well with less common languages, ...

Added: December 30, 2024

Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

Voronov A., Wolf L., Ryabinin M., in: Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, 2024. P. 6287–6310.

Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format’s influence ...

Added: December 24, 2024

LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering

Северин Н. Н., Булычев И. Д., Yushkov M. et al., ICDM 2024

We present LLM-KT, a flexible framework designed to enhance collaborative filtering (CF) models by seamlessly integrating LLM (Large Language Model)-generated features. Unlike existing methods that rely on passing LLM-generated features as direct inputs, our framework injects these features into an intermediate layer of any CF model, allowing the model to reconstruct and leverage the embeddings ...

Added: December 13, 2024

Applying Stylometry to Detect Generated Texts

Е. А. Сальников, А. А. Бонч-Осмоловская, in: Information Technologies in Humanities Research: Proceedings of the International Scientific and Practical Conference, Krasnoyarsk, 25–28 September 2023. Siberian Federal University, 2023. P. 176–182.

This talk analyzes the use of the stylometric metric Burrows's Delta as a method for detecting artificial (i.e., language-model-generated) text. The experimental data consisted of diaries: both diary entries by randomly selected authors and the diary entries of M. M. Prishvin. The language-model data consisted of diary entries generated with language models ...

Added: October 11, 2024

Linguacodus: A synergistic framework for transformative code generation in machine learning pipelines

Trofimova E., Emil Sataev, Ustyuzhanin A., PeerJ Computer Science 2024 Vol. 10 Article e2328

In the ever-evolving landscape of machine learning, seamless translation of natural language descriptions into executable code remains a formidable challenge. This paper introduces Linguacodus, an innovative framework designed to tackle this challenge by deploying a dynamic pipeline that iteratively transforms natural language task descriptions into code through high-level data-shaping instructions. The core of Linguacodus is ...

Added: September 27, 2024

Dialogue as Autocommunication - On Interactions with Large Language Models

Kartasheva Anna, Technology and Language 2024 Vol. 5 No. 2 P. 57–66

In a dialog with large language models (LLMs), the addresser and the addressee of the message coincide, so such a dialog can be called autocommunication. A neural network can only answer a question that has a formulation. The question is formulated by the one who asks it, i.e. a human being. Human activity in dialog ...

Added: September 9, 2024

ChatGPT vs. Crowdsourcing vs. Experts: Annotating Open-Domain Conversations with Speech Functions

Ostyakova Lidiia, Smilga V., Petukhova K. et al., in: Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague: Association for Computational Linguistics, 2023. P. 242–254.

This paper deals with the task of annotating open-domain conversations with speech functions. We propose a semi-automated method for annotating dialogs following the topic-oriented, multi-layered taxonomy of speech functions with the use of hierarchical guidelines using Large Language Models. These guidelines comprise simple questions about the topic and speaker change, sentence types, pragmatic aspects of ...

Added: May 24, 2024

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Dettmers T., Ruslan Svirschevski, Vage Egiazarian et al., in: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). ICLR, 2024.

Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4 bits per parameter, they can fit into memory-limited devices such as laptops and mobile phones, enabling personalized use. However, quantization down to 3-4 bits per parameter usually leads to moderate-to-high accuracy ...

Added: March 5, 2024

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al., in: 41st International Conference on Machine Learning, ICML 2024; Vienna; Austria; 21 July 2024 to 27 July 2024. Maastricht: ML Research Press, 2024.

The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques for such models enabling execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression--defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter, from the point of view ...

Added: March 5, 2024