Higher School of Economics
National Research University
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters

Ch. 20503. P. 1–26.
Vladimir Bogachev, Aletov V., Alexander Molozhavenko, Bobkov D., Soboleva V., Alanov A., Rakhuba M.

This work presents a novel, fully Riemannian framework for Low-Rank Adaptation (LoRA) that treats low-rank adapters geometrically by optimizing them directly on the fixed-rank manifold. This formulation eliminates the parametrization ambiguity present in standard Euclidean optimizers. Our framework integrates three key components to achieve this: (1) we derive Riemannion, a new Riemannian optimizer on the fixed-rank matrix manifold that generalizes the recently proposed Muon optimizer; (2) we develop a Riemannian gradient-informed LoRA initialization; and (3) we provide an efficient implementation without significant overhead that uses automatic differentiation to compute the required geometric operations while adhering to best practices in numerical linear algebra. Comprehensive experiments on both LLM and diffusion model architectures demonstrate that our approach yields consistent and noticeable improvements in convergence speed and final task performance over both standard LoRA and its state-of-the-art modifications.
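
The parametrization ambiguity mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' Riemannion optimizer or their implementation; it only uses the textbook tangent-space projection for the manifold of fixed-rank matrices, and every name in it (`tangent_projection`, `riemannian_gradient`, `euclidean_factor_step`, `make_example`, the matrix `S`, the learning rate) is a hypothetical choice made for illustration. The adapter is assumed to be written as `delta_W = A @ B` with full-rank factors.

```python
import numpy as np


def tangent_projection(U, V, G):
    """Project G onto the tangent space of the rank-r matrix manifold.

    U (m x r) and V (n x r) must have orthonormal columns spanning the
    column space and row space of the current point, respectively.
    Computes P_U G + G P_V - P_U G P_V without forming m x m projectors.
    """
    GV = G @ V          # m x r
    UtG = U.T @ G       # r x n
    return U @ UtG + GV @ V.T - U @ (U.T @ GV) @ V.T


def riemannian_gradient(A, B, G):
    """Riemannian gradient at delta_W = A @ B, given the Euclidean gradient G.

    Assumes A has full column rank and B has full row rank.
    """
    U, _ = np.linalg.qr(A)      # orthonormal basis of the column space
    V, _ = np.linalg.qr(B.T)    # orthonormal basis of the row space
    return tangent_projection(U, V, G)


def euclidean_factor_step(A, B, G, lr=0.1):
    """One plain gradient step on the factors; returns the updated product."""
    grad_A = G @ B.T            # chain rule for delta_W = A @ B
    grad_B = A.T @ G
    return (A - lr * grad_A) @ (B - lr * grad_B)


def make_example(seed=0, m=8, n=6):
    """Two factorizations (A, B) and (A S, S^{-1} B) of the same rank-3 adapter."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, 3))
    B = rng.standard_normal((3, n))
    G = rng.standard_normal((m, n))       # stand-in for a loss gradient
    S = np.diag([0.5, 2.0, 4.0]) + 0.1    # an arbitrary invertible 3 x 3 matrix
    return A, B, G, A @ S, np.linalg.solve(S, B)


A, B, G, A2, B2 = make_example()
print("same adapter:", np.allclose(A @ B, A2 @ B2))
print("same Euclidean factor step:",
      np.allclose(euclidean_factor_step(A, B, G),
                  euclidean_factor_step(A2, B2, G)))
print("same Riemannian gradient:",
      np.allclose(riemannian_gradient(A, B, G),
                  riemannian_gradient(A2, B2, G)))
```

Both factor pairs represent the same adapter, yet the factor-wise gradient step depends on which pair is used, whereas the projected gradient depends only on the column and row spaces of the product. This is the sense in which optimizing on the manifold removes the dependence on the choice of factors.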

Language: English
Keywords: Smooth manifold; Riemannian optimization; fine-tuning; large language model (LLM); Diffusion Models; Low-rank Adaption; Fixed matrix rank manifold

In book

The Fourteenth International Conference on Learning Representations (ICLR 2026)
ICLR, 2026.
Similar publications
Benchmarking DNA large language models on quadruplexes
Cherednichenko O., Herbert A., Poptsova M., Computational and Structural Biotechnology Journal 2025 Vol. 27 P. 992–1000
Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated using genomic benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly for generating whole-genome annotations. Current LLMs in genomics fall into three main categories: transformer-based models, long convolution-based models, and state-space models ...
Added: June 19, 2026
Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
Severin N., Kartushov D., Urzhumov V. et al., in: Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II (LNCS, volume 16484). Cham: Springer Publishing Company, 2026. P. 508–517.
Sequential recommender systems have achieved significant success in modeling temporal user behavior but remain limited in capturing rich user semantics beyond interaction patterns. Large Language Models (LLMs) present opportunities to enhance user understanding with their reasoning capabilities, yet existing integration approaches create prohibitive inference costs in real time. To address these limitations, we present a ...
Added: June 18, 2026
ESQA: Event Sequences Question Answering
Abdullaeva I., Karpukhin I., Filatov A. et al., IEEE Access 2026 Vol. 14 P. 59390–59408
Event sequences, a specialized type of tabular data annotated with timestamps, are prevalent across practical domains such as finance, retail, social networks, and healthcare. Despite the importance of event sequence modeling and analysis, there has been little effort to adapt Large Language Models (LLMs) to this domain. In this paper, we propose a novel solution ...
Added: June 16, 2026
Bridging the Semantic Gap in Metadata Management using Large Language Models
Сулейкин А. С., Сорокина В., Пятецкий В. Е., in: 2025 7th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency. [s.n.], 2025. P. 748–753.
Effective metadata management is fundamental to data governance, ensuring that data assets are discoverable, understandable, and usable across the enterprise. However, traditional metadata systems often remain purely technical, describing structures without conveying business meaning. This disconnect — known as the semantic gap — limits the interpretability and value of metadata for business users. To address ...
Added: April 17, 2026
Development and Integration of an AI Assistant into a Learning Management System
Караваева Е. А., Василевский В. И., Ланин Г. М. et al., Труды Института системного программирования РАН 2025 Vol. 37 No. 4 P. 175–190
The ongoing digitalization of education requires new ways of presenting information and attention retention mechanisms. The aim of the presented work is to propose a solution for implementing a large language model, which will interactively generate prompts of different types, within an e-learning course on programming. The main approaches are the analysis of existing relatively ...
Added: December 25, 2025
Optimization on the Extended Tensor-Train Manifold with Shared Factors
Alexander Molozhavenko, Rakhuba M., Computational and Applied Mathematics 2026 Vol. 45 No. 6 Article 221
This paper studies tensors that admit decomposition in the Extended Tensor Train (ETT) format, with a key focus on the case where some decomposition factors are constrained to be equal. This factor sharing introduces additional challenges, as it breaks the multilinear structure of the decomposition. Nevertheless, we show that Riemannian optimization methods can naturally handle ...
Added: December 22, 2025
Prediction of protein-protein interactions using point transformer and spherical Convex Hull graphs
David Arteaga, Poptsova M., Computational and Structural Biotechnology Journal 2026 Vol. 31 P. 82–93
Accurate predictions and large-scale identification of protein-protein interactions (PPIs) are crucial for understanding their inherent biological mechanisms and protein functions in virtually all biological processes. Nowadays, graph-based deep learning models have made significant contributions in modeling proteins with physicochemical and geometric features. However, most of these models rely on conventional graph construction methods, such as ...
Added: December 22, 2025
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
Sviridov I., Miftakhova A., Tereshchenko A. et al., in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2025. Ch. 1353 P. 26625–26665.
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality ...
Added: November 16, 2025
Comparative Study of LoRA and Full Fine-Tuning in Large Language Models
E.V. Surikova, E.A. Sabidaeva, in: Parallel Computational Technologies: XIX All-Russian Conference with International Participation, ПаВТ'2025, Moscow, April 8–10, 2025. Short Papers and Poster Descriptions. Chelyabinsk: Издательский центр ЮУрГУ, 2025. P. 90–98.
Added: July 3, 2025
An Approach to Building a Service for Generating Mobile Application Code Using Large Language Models
Резуник Л., Александров Д.В., ИТ-Стандарт 2024 No. 4 P. 34–41
Machine learning technologies and various tools for code generation have had a significant impact on the field of software development in recent years. Although most of the existing solutions are not built exactly for code generation, programmers apply them in different tasks. Not many of the existing AI solutions work well with less common languages, ...
Added: December 30, 2024
Wrong Answers Only: Distractor Generation for Russian Reading Comprehension Questions Using a Translated Dataset
Login N., Journal of Language and Education 2024 Vol. 10 No. 4 P. 56–70
Background: Reading comprehension questions play an important role in language learning. Multiple-choice questions are a convenient form of reading comprehension assessment as they can be easily graded automatically. The availability of large reading comprehension datasets makes it possible to also automatically produce these items, reducing the cost of development of test question banks, by fine-tuning ...
Added: December 24, 2024
Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language
Sergei Koltcov, Surkov A., Koltsova O. et al., PeerJ Computer Science, USA 2024 Vol. 10 Article e2395
Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems ...
Added: December 2, 2024
Training a Tucker Model With Shared Factors: a Riemannian Optimization Approach
Peshekhonov I., Aleksey Arzhantsev, Rakhuba M., in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2-4 May 2024, Palau de Congressos, Valencia, Spain. PMLR, Vol. 238. Valencia: PMLR, 2024. Ch. 238 P. 3304–3312.
Added: November 29, 2024
Group and Shuffle: Efficient Structured Orthogonal Parametrization
Gorbunov M., Yudin N., Soboleva V. et al., in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024). [s.n.], 2024. P. 68713–68739.
Added: November 26, 2024
EAI: Emotional Decision-Making of LLMs in Strategic Games and Ethical Dilemmas
Mozikov M., Severin N., Bodishtianu V. et al., in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024). [s.n.], 2024. P. 13927–13981.
Added: November 22, 2024
Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option
Yakovlev K., Nikolenko S., Bout A., in: Findings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, 2024. P. 5967–5974.
The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes in whether to use a tool at all. We introduce Toolken+ that mitigates the first problem by reranking top-k tools selected by ToolkenGPT and the second ...
Added: November 22, 2024
AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4
Shirnin A., Andreev N., Mikhailov V. et al., in: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Mexico: Association for Computational Linguistics, 2024. P. 1667–1672.
This paper describes AIpom, a system designed to detect a boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection). We propose a two-stage pipeline combining predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers. AIpom is ranked second on the leaderboard while achieving a Mean Absolute Error of ...
Added: July 19, 2024
Science Art and Kitsch: Computer Art Based on Large Language Models
Milovidov S., Коммуникации. Медиа. Дизайн 2024 Vol. 9 No. 2 P. 45–64
Today, the emergence of large language models has led to the spread of popular graphic neural network generators (DALL-E, MidJourney, Stable Diffusion, Kandinsky, etc.). This has driven the widespread adoption and democratisation of artistic practices. The article analyses how the boundaries between art and kitsch are disappearing in relation to computer ...
Added: July 1, 2024
Multi-user facial emotion recognition in video based on user-dependent neural network adaptation
Churaev E., Andrey V. Savchenko, in: 2022 VIII International Conference on Information Technology and Nanotechnology (ITNT). IEEE, 2022. P. 1–5.
In this paper, the multi-user video-based facial emotion recognition is examined in the presence of a small data set with the emotions of end users. By using the idea of speaker-dependent speech recognition, we propose a novel approach to solve this task if labeled video data from end users is available. During the training stage, ...
Added: September 25, 2022
Exploration in Sequential Recommender Systems via Graph Representations
Kiselev D., Makarov I., IEEE Access 2022 Vol. 10 P. 123614–123621
Temporal graph networks are powerful tools for solving the cold-start problem in sequential recommender systems. However, graph models are susceptible to feedback loops and data distribution shifts. The paper proposes a simple yet efficient graph-based exploration method for the mitigation of the issues above. It adopts the counter-based state exploration from reinforcement learning to the ...
Added: September 5, 2022