

Unsupervised learning of general-purpose embeddings for code changes

Ch. 171275. P. 7–12.
Pravilov M., Bogomolov E., Golubev Y., Bryksin T.

Applying machine learning to tasks that operate with code changes requires their numerical representation. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two different downstream tasks: applying changes to code and commit message generation. During pre-training, the model learns to apply the given code change in a correct way. This task requires only code changes themselves, which makes it unsupervised. In the task of applying code changes, our model outperforms baseline models by 5.9 percentage points in accuracy. As for the commit message generation, our model demonstrated the same results as supervised models trained for this specific task, which indicates that it can encode code changes well and can be improved in the future by pre-training on a larger dataset of easily gathered code changes. © 2021 ACM.

Language: English
Keywords: Unsupervised learning, Code changes, Commit message generation

In book

MaLTESQuE 2021: Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution
ACM, 2021.