LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters

Ch. 20503. P. 1–26.

Vladimir Bogachev, Aletov V., Alexander Molozhavenko, Bobkov D., Soboleva V., Alanov A., Rakhuba M.

This work presents a novel, fully Riemannian framework for Low-Rank Adaptation (LoRA) that treats low-rank adapters geometrically by optimizing them directly on the fixed-rank matrix manifold. This formulation eliminates the parametrization ambiguity present in standard Euclidean optimizers. The framework integrates three key components: (1) we derive Riemannion, a new Riemannian optimizer on the fixed-rank matrix manifold that generalizes the recently proposed Muon optimizer; (2) we develop a Riemannian gradient-informed LoRA initialization; and (3) we provide an efficient implementation without significant overhead that uses automatic differentiation to compute the required geometric operations while adhering to best practices in numerical linear algebra. Comprehensive experiments on both LLM and diffusion model architectures demonstrate that our approach yields consistent and noticeable improvements in convergence speed and final task performance over both standard LoRA and its state-of-the-art modifications.
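The parametrization ambiguity mentioned above can be seen in a toy problem. The following is a minimal sketch under stated assumptions, not the authors' code:

- It assumes a quadratic loss f(W) = 0.5 * ||W0 + W - T||_F^2 with random data.
- The names W0, T, S, lr and the helper functions are hypothetical.
- It compares two LoRA factorizations of the same adapter, (B, A) and (B S, S^-1 A).

One Euclidean SGD step on the factors moves the two copies to different products. A generic Riemannian gradient step on the rank-r manifold (tangent-space projection followed by a truncated-SVD retraction) depends only on the product W, so both copies land in the same place. This is the standard textbook construction. It is not the Riemannion optimizer itself, which additionally generalizes Muon and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2

# Hypothetical toy data: frozen base weight W0 and target T.
# Assumed loss f(W) = 0.5 * ||W0 + W - T||_F^2, so the Euclidean gradient w.r.t. W is G = W0 + W - T.
W0 = rng.standard_normal((m, n))
T = rng.standard_normal((m, n))

def euclid_grad(W):
    return W0 + W - T

# Two LoRA factorizations of the same adapter W = B @ A: (B, A) and (B S, S^{-1} A).
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
S = np.diag([10.0, 0.1])  # any invertible r x r matrix gives the same product
B2, A2 = B @ S, np.linalg.solve(S, A)

def lora_sgd_step(B, A, lr):
    # Simultaneous Euclidean gradient step on the factors (chain rule: dB = G A^T, dA = B^T G).
    G = euclid_grad(B @ A)
    return B - lr * G @ A.T, A - lr * B.T @ G

def tangent_project(W, Z, r):
    # Orthogonal projection of Z onto the tangent space of the rank-r manifold at W:
    # P(Z) = U U^T Z + Z V V^T - U U^T Z V V^T, with U, V from a thin SVD of W.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    PU, PV = U @ U.T, V @ V.T
    return PU @ Z + Z @ PV - PU @ Z @ PV

def riemannian_step(W, lr, r):
    # Riemannian gradient step, then retraction back to rank r via truncated SVD.
    xi = tangent_project(W, euclid_grad(W), r)
    U, s, Vt = np.linalg.svd(W - lr * xi, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

lr = 1e-2

# Factor-wise SGD: the updated products differ although the starting products are equal.
Bn, An = lora_sgd_step(B, A, lr)
B2n, A2n = lora_sgd_step(B2, A2, lr)
gap_lora = np.linalg.norm(Bn @ An - B2n @ A2n)

# Riemannian step: the update depends only on W, so both factorizations agree.
gap_riem = np.linalg.norm(riemannian_step(B @ A, lr, r) - riemannian_step(B2 @ A2, lr, r))

print(f"LoRA SGD gap between updated products:        {gap_lora:.3e}")
print(f"Riemannian step gap between updated products: {gap_riem:.3e}")
```

The toy example forms dense m x n matrices for clarity. A practical implementation would keep W in factored form and never materialize it.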

Language: English

In book

The Fourteenth International Conference on Learning Representations (ICLR 2026)

ICLR, 2026.

Benchmarking DNA large language models on quadruplexes

Cherednichenko O., Herbert A., Poptsova M., Computational and Structural Biotechnology Journal 2025 Vol. 27 P. 992–1000

Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated using genomic benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly for generating whole-genome annotations. Current LLMs in genomics fall into three main categories: transformer-based models, long convolution-based models, and state-space models ...

Added: June 19, 2026

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Severin N., Kartushov D., Urzhumov V. et al., in: Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II. (LNCS, volume 16484). Cham: Springer Publishing Company, 2026. P. 508–517.

Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches create prohibitive inference costs in real time. To address these limitations, we present a ...

Added: June 18, 2026

ESQA: Event Sequences Question Answering

Abdullaeva I., Karpukhin I., Filatov A. et al., IEEE Access 2026 Vol. 14 P. 59390–59408

Event sequences, a specialized type of tabular data annotated with timestamps, are prevalent across practical domains such as finance, retail, social networks, and healthcare. Despite the importance of event sequence modeling and analysis, there has been little effort to adapt Large Language Models (LLMs) to this domain. In this paper, we propose a novel solution ...

Added: June 16, 2026

Bridging the Semantic Gap in Metadata Management using Large Language Models

Сулейкин А. С., Сорокина В., Пятецкий В. Е., in: 2025 7th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency. [s.n.], 2025. P. 748–753.

Effective metadata management is fundamental to data governance, ensuring that data assets are discoverable, understandable, and usable across the enterprise. However, traditional metadata systems often remain purely technical, describing structures without conveying business meaning. This disconnect — known as the semantic gap — limits the interpretability and value of metadata for business users. To address ...

Added: April 17, 2026

Development and Integration of an AI Assistant into a Learning Management System

Караваева Е. А., Василевский В. И., Ланин Г. М. et al., Proceedings of the Institute for System Programming of the RAS 2025 Vol. 37 No. 4 P. 175–190

The ongoing digitalization of education requires new ways of presenting information and attention retention mechanisms. The aim of the presented work is to propose a solution for implementing a large language model, which will interactively generate prompts of different types, within an e-learning course on programming. The main approaches are the analysis of existing relatively ...

Added: December 25, 2025

Optimization on the Extended Tensor-Train Manifold with Shared Factors

Alexander Molozhavenko, Rakhuba M., Computational and Applied Mathematics 2026 Vol. 45 No. 6 Article 221

This paper studies tensors that admit decomposition in the Extended Tensor Train (ETT) format, with a key focus on the case where some decomposition factors are constrained to be equal. This factor sharing introduces additional challenges, as it breaks the multilinear structure of the decomposition. Nevertheless, we show that Riemannian optimization methods can naturally handle ...

Added: December 22, 2025

Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs

David Arteaga, Poptsova M., Computational and Structural Biotechnology Journal 2026 Vol. 31 P. 82–93

Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...

Added: December 22, 2025

3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

Sviridov I., Miftakhova A., Tereshchenko A. et al., in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2025. Ch. 1353 P. 26625–26665.

Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through a temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality ...

Added: November 16, 2025

Comparative Study of LoRA and Full Fine-Tuning in Large Language Models

E.V. Surikova, E.A. Sabidaeva, in: Parallel Computational Technologies – XIX All-Russian Conference with International Participation, ПаВТ'2025, Moscow, April 8–10, 2025. Short Papers and Poster Descriptions. Chelyabinsk: SUSU Publishing Center, 2025. P. 90–98.

Added: July 3, 2025

An Approach to Building a Service for Generating Mobile Application Code Using Large Language Models

Резуник Л., Александров Д.В., ИТ-Стандарт 2024 No. 4 P. 34–41

Machine learning technologies and various code generation tools have had a significant impact on software development in recent years. Although most existing solutions were not built specifically for code generation, programmers apply them to a variety of tasks. Few existing AI solutions work well with less common languages, ...

Added: December 30, 2024

Wrong Answers Only: Distractor Generation for Russian Reading Comprehension Questions Using a Translated Dataset

Background: Reading comprehension questions play an important role in language learning. Multiple-choice questions are a convenient form of reading comprehension assessment because they can be graded automatically. The availability of large reading comprehension datasets also makes it possible to generate these items automatically, reducing the cost of developing test question banks, by fine-tuning ...

Added: December 24, 2024

Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language

Sergei Koltcov, Surkov A., Koltsova O. et al., PeerJ Computer Science, USA 2024 Vol. 10 Article e2395

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems ...

Added: December 2, 2024

Training a Tucker Model With Shared Factors: a Riemannian Optimization Approach

Peshekhonov I., Aleksey Arzhantsev, Rakhuba M., in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2–4 May 2024, Palau de Congressos, Valencia, Spain. PMLR, Vol. 238. Valencia: PMLR, 2024. Ch. 238 P. 3304–3312.

Added: November 29, 2024

Group and Shuffle: Efficient Structured Orthogonal Parametrization

Gorbunov M., Yudin N., Soboleva V. et al., in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024). [s.n.], 2024. P. 68713–68739.

Added: November 26, 2024

EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas

Mozikov M., Severin N., Bodishtianu V. et al., in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024). [s.n.], 2024. P. 13927–13981.

Added: November 22, 2024

Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option

Yakovlev K., Nikolenko S., Bout A., in: Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. P. 5967–5974.

The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes in whether to use a tool at all. We introduce Toolken+ that mitigates the first problem by reranking top-k tools selected by ToolkenGPT and the second ...

Added: November 22, 2024

AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4

Shirnin A., Andreev N., Mikhailov V. et al., in: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Mexico: Association for Computational Linguistics, 2024. P. 1667–1672.

This paper describes AIpom, a system designed to detect a boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection). We propose a two-stage pipeline combining predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers. AIpom is ranked second on the leaderboard while achieving a Mean Absolute Error of ...

Added: July 19, 2024

Science Art and Kitsch: Computer Art Based on Large Language Models

Milovidov S., Communications. Media. Design 2024 Vol. 9 No. 2 P. 45–64

Today the emergence of large language models has led to the spread of popular neural-network image generators (DALL-E, MidJourney, Stable Diffusion, Kandinsky, etc.). This, in turn, has driven the widespread adoption and democratisation of artistic practices. The article analyses how the boundaries between art and kitsch are disappearing in relation to computer ...

Added: July 1, 2024

Multi-user facial emotion recognition in video based on user-dependent neural network adaptation

Churaev E., Andrey V. Savchenko, in: 2022 VIII International Conference on Information Technology and Nanotechnology (ITNT). IEEE, 2022. P. 1–5.

In this paper, multi-user video-based facial emotion recognition is examined when only a small dataset of end users' emotions is available. Building on the idea of speaker-dependent speech recognition, we propose a novel approach to this task when labeled video data from end users is available. During the training stage, ...

Added: September 25, 2022

Exploration in Sequential Recommender Systems via Graph Representations

Kiselev D., Makarov I., IEEE Access 2022 Vol. 10 P. 123614–123621

Temporal graph networks are powerful tools for solving the cold-start problem in sequential recommender systems. However, graph models are susceptible to feedback loops and data distribution shifts. The paper proposes a simple yet efficient graph-based exploration method for the mitigation of the issues above. It adopts the counter-based state exploration from reinforcement learning to the ...

Added: September 5, 2022