Higher School of Economics
National Research University
Training Transformers Together

P. 335–342.
Borzunov A., Ryabinin M., Dettmers T., Lhoest Q., Saulnier L., Diskin M., Jernite Y., Wolf T.
Language: English
Keywords: distributed computing, deep learning, transformers

In book

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track
PMLR, 2022.
Similar publications
Multimodal graph, surface, and language-based model for protein protein interaction prediction
Arteaga Moreano B. D., Chervov N., Poptsova M., Scientific Reports 2026 Vol. 16 No. 1 Article 4772
Accurate prediction of protein-protein interactions (PPIs) is fundamental to understanding biological processes and disease mechanisms. While deep learning offers a powerful alternative to costly experimental methods, existing approaches often overlook critical protein-surface information and rely on simplistic feature fusion techniques, thereby limiting performance. To address this, we introduce GSMFormer-PPI, a novel multimodal framework that integrates ...
Added: February 4, 2026
Method of Critical Set construction for Successive Cancellation List Decoder of Polar Codes Based on Deep Learning of Neural Networks
Kotov F. I., Timokhin I., Ivanov F., in: 2023 XVIII International Symposium Problems of Redundancy in Information and Control Systems (REDUNDANCY). IEEE, 2023.
The Successive Cancellation List (SCL) algorithm is a widely used decoding technique in communication systems. However, constructing the critical set for SCL decoding is a challenging task, as it requires a large number of computations and can lead to significant decoding delays. In this paper, a new approach to critical set construction for SCL decoding ...
Added: January 26, 2026
Distributed Computer and Communication Networks: Control, Computation, Communications (DCCN-2023)
-, 2023.
This scientific electronic publication presents the proceedings of the XXVI International Scientific Conference "Distributed Computer and Communication Networks: Control, Computation, Communications" in the following areas: algorithms and protocols of telecommunication networks; control in computer and infocommunication systems; performance analysis, QoS/QoE evaluation, and network efficiency; analytical and simulation modeling of next-generation communication systems; evolution of wireless networks toward 5G; centimeter- and millimeter-wave technologies ...
Added: December 18, 2025
Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V
Cham: Springer, 2025.
This book constitutes the refereed proceedings of 34th International Workshops which were held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, held in Kaunas, Lithuania, September 9–12, 2025.   The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...
Added: September 29, 2025
Deep learning deciphers the related role of master regulators and G-quadruplexes in tissue specification
Artem B., Andreasyan A., Konovalov D. et al., Scientific Reports 2025 Vol. 15 Article 23119
G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for the genome-wide G-flipon predictions across 14 human tissue types. The model was trained using high-confidence experimental maps of GQ-forming sequences ...
Added: August 8, 2025
Running Distributed Computational Experiments on the HSE University MLOps Platform
Khritankov A. S., Polezhaev V. A., Zhulikov G. et al., Вестник Южно-Уральского государственного университета. Серия: Вычислительная математика и информатика 2025 Vol. 14 No. 2 P. 42–66
Despite the wide spread and successful application of data mining and processing tools for solving individual applied problems, the problem of developing a technology for creating such software tools has not yet been solved. In the context of a unified MLOps process for creating machine learning technologies, this paper considers the emerging problems of automating ...
Added: July 28, 2025
AI in drug development: advances in response, combination therapy, repositioning, and molecular design
Shaitan A., Science China Information Sciences 2025 Vol. 68 No. 7 Article 170102
Artificial intelligence (AI) is revolutionizing the field of drug development, particularly in addressing key challenges such as drug response prediction, drug combination design, drug repositioning, and drug molecule generation. Traditional drug discovery is hindered by long timelines, high costs, and low success rates, necessitating innovative technologies to accelerate the process. AI technologies, such as deep ...
Added: June 25, 2025
An Approach to Finding a Robust Deep Learning Model
Boldyrev A., Ratnikov F., Shevelev A., IEEE Access 2025 Vol. 13 P. 102390–102406
The rapid development of machine learning (ML) and artificial intelligence (AI) applications requires the training of a large number of models. This growing demand highlights the importance of training models without human supervision, while ensuring that their predictions are reliable. In response to this need, we propose a novel approach for determining model robustness. This approach, supplemented with a ...
Added: June 15, 2025
Economic and Social Aspects of Nuclear Energy amid the Development of Artificial Intelligence Technologies
Podchufarov A., Galkina A. N., Vanina S. S. et al., Экономика и управление: проблемы, решения 2025 Vol. 5 No. 4 P. 61–74
Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...
Added: June 5, 2025
Deep learning for customs classification of goods based on their textual descriptions analysis
Ryzhova A., Sochenkov I., in: Proceeding 2019 Ivannikov Ispras Open Conference (ISPRAS). IEEE Computer Society, 2019. P. 60–67.
Added: May 1, 2025
Distilling Normalizing Flows
Walton S., Klyukin V., Artemev M. et al., in: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2025. P. 3328–3337.
Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and having exact latent-variable inference. This has many advantages, including: being able to simply interpolate, calculate sample likelihood, and ...
Added: April 1, 2025
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Derkach D., Artemev M., IEEE, 2025.
Added: April 1, 2025
Deep learning captures the effect of epistasis in multifactorial diseases
Perelygin V., Kamelin A., Syzrantsev N. et al., Frontiers in Medicine 2025 Vol. 11 Article 1479717
Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by the linear regression model that assumes independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer’s disease, diabetes, cardiovascular ...
Added: March 4, 2025
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks
Ivan Rubachev, Nikolay Kartashev, Gorishniy Y. et al., in: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). ICLR, 2025. P. 53831–53867.
Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical deployment. In this work, we analyze existing tabular deep learning benchmarks and find two common characteristics of tabular data ...
Added: March 1, 2025
Weight Perturbations for Simulating Virtual Lesions in a Convolutional Neural Network
W. Joseph MacInnes, Zhozhikashvili N., Feurra M., in: First International Conference, AIiH 2024, Swansea, UK, September 4–6, 2024, Proceedings, Part II. Artificial Intelligence in Healthcare. LNCS, volume 14976. Springer, 2024. P. 221–234.
Convolutional Neural Networks (CNNs) match human performance in many visual tasks like the classification of images; however, they may not simulate the underlying biological processes. We implemented a CNN to try to replicate results from an object inversion experiment with Transcranial Magnetic Stimulation (TMS). After training on upright faces, the CNN model went through three stages ...
Added: January 28, 2025
TabR: Tabular Deep Learning Meets Nearest Neighbors
Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev et al., in: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). ICLR, 2024.
Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves ...
Added: January 22, 2025
Deep Learning Approaches for LHCb ECAL Reconstruction
Boldyrev A., Derkach D., Ratnikov F. et al., EPJ Web of Conferences 2024 Vol. 295 Article 09008
Calorimeters are a crucial component for most detectors mounted on modern colliders. Their tasks include identifying and measuring the energy of photons and neutral hadrons, recording energetic hadronic jets, and contributing to the identification of electrons, muons, and charged hadrons. To fulfill these many tasks while keeping costs reasonable, the calorimeter construction requires good and ...
Added: January 8, 2025
TabM: Advancing tabular deep learning with parameter-efficient ensembling
Gorishniy Y., Kotelnikov A., Babenko A., in: The Thirteenth International Conference on Learning Representations: ICLR 2025. ICLR, 2025.
Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for substantially improving tabular MLPs: namely, parameter-efficient ensembling -- a paradigm for implementing an ensemble of models as one model producing multiple predictions. We ...
Added: December 24, 2024
Can Artificial Intelligence Predict Court Decisions? A Systematic Review of International Research
Kazun A., Мониторинг общественного мнения: Экономические и социальные перемены 2024 No. 5 P. 100–122
Advancements in artificial intelligence technologies and the emergence of open databases containing judicial decisions have led to rapid improvements in algorithms capable of classifying legal documents and forecasting decisions made by judges. This article examines a body of international research dedicated to the question of how accurately AI can predict judges’ decisions, and consequently, whether ...
Added: November 29, 2024
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950
Cham: Springer, 2024.
This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...
Added: November 22, 2024