SEARNN: Training RNNs with global-local losses

Leblond R., Alayrac J., Osokin A., Lacoste-Julien S.

P. 1–16.

We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropriate surrogate for the test error: by only maximizing the ground truth probability, it fails to exploit the wealth of information offered by structured losses. Further, it introduces discrepancies between training and predicting (such as exposure bias) that may hurt test performance. Instead, SEARNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error. We first demonstrate improved performance over MLE on two different tasks: OCR and spelling correction. Then, we propose a subsampling strategy to enable SEARNN to scale to large vocabulary sizes. This allows us to validate the benefits of our approach on a machine translation task.
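The contrast the abstract draws between MLE and cost-aware training can be illustrated with a small sketch. This is not the SEARNN algorithm as published. The helper names (`rollout_costs`, `expected_cost_loss`), the Hamming cost, the completion of each candidate with the reference suffix, and the expected-cost surrogate are all assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mle_loss(scores, gold_index):
    """MLE-style loss for one step: only the ground-truth token matters."""
    return -math.log(softmax(scores)[gold_index])


def hamming_cost(prediction, reference):
    """Hypothetical structured cost: mismatched positions plus length gap."""
    mismatches = sum(p != r for p, r in zip(prediction, reference))
    return mismatches + abs(len(prediction) - len(reference))


def rollout_costs(prefix, vocab, reference):
    """Try every token at the current step, complete the sequence with the
    reference suffix (an assumed roll-out policy), and score the result."""
    t = len(prefix)
    costs = []
    for token in vocab:
        completed = prefix + [token] + reference[t + 1:]
        costs.append(hamming_cost(completed, reference))
    return costs


def expected_cost_loss(scores, costs):
    """Assumed cost-sensitive surrogate: expected cost under the model."""
    probs = softmax(scores)
    return sum(p * c for p, c in zip(probs, costs))


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    reference = ["c", "a", "b"]
    prefix = ["c"]                 # tokens decoded so far
    scores = [2.0, 0.5, 0.1]       # model scores for the next token
    costs = rollout_costs(prefix, vocab, reference)
    print("costs per candidate token:", costs)
    print("MLE loss:", mle_loss(scores, vocab.index(reference[len(prefix)])))
    print("expected-cost loss:", expected_cost_loss(scores, costs))
```

With this position-wise cost every wrong token is penalized equally, so the two losses carry similar information. A sequence-level cost, for instance one derived from edit distance, would assign different costs to different wrong tokens, which is the kind of information a ground-truth-only objective ignores.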

Language: English

Keywords: deep learning, structured prediction, recurrent neural networks

In book

Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)

[s.n.], 2018.

Comparative Study of Training Methods and Architectures of Echo State Networks

Androsov I., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3 P. 87–114

This paper examines echo state networks (ESNs), one of the most prevalent approaches to implementing reservoir computing. An ESN consists of a recurrent neural network with fixed (untrained) weights and a readout layer that is typically linear and trainable. This approach enables the creation of energy-efficient and computationally efficient neural networks capable of real-time learning. However, since ...

Added: May 26, 2026

Method of Critical Set construction for Successive Cancellation List Decoder of Polar Codes Based on Deep Learning of Neural Networks

Kotov F. I., Timokhin I., Ivanov F., in: 2023 XVIII International Symposium Problems of Redundancy in Information and Control Systems (REDUNDANCY). IEEE, 2023.

The Successive Cancellation List (SCL) algorithm is a widely used decoding technique in communication systems. However, constructing the critical set for SCL decoding is a challenging task, as it requires a large number of computations and can lead to significant decoding delays. In this paper, a new approach to critical set construction for SCL decoding ...

Added: January 26, 2026

An ensemble of modern computer vision models for deepfake detection

Pikul A. S., Безопасность информационных технологий 2024 Vol. 31 No. 4 P. 116–127

This article explores the potential use of modern computer vision architectures for the task of deepfake detection. The following architectures are considered: EfficientNet, Vision Transformer (ViT), VisionLSTM (ViL), Vision KAN, and Mamba Vision. The novelty of the approach lies in the application and comparison of these architectures, as well as their combination into paired ensembles ...

Added: December 12, 2025

Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V

Cham: Springer, 2025.

This book constitutes the refereed proceedings of 34th International Workshops which were held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, held in Kaunas, Lithuania, September 9–12, 2025. The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...

Added: September 29, 2025

Deep learning deciphers the related role of master regulators and G-quadruplexes in tissue specification

Artem B., Andreasyan A., Konovalov D. et al., Scientific Reports 2025 Vol. 15 Article 23119

G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for the genome-wide G-flipon predictions across 14 human tissue types. The model was trained using high-confidence experimental maps of GQ-forming sequences ...

Added: August 8, 2025

AI in drug development: advances in response, combination therapy, repositioning, and molecular design

Shaitan A., Science China Information Sciences 2025 Vol. 68 No. 7 Article 170102

Artificial intelligence (AI) is revolutionizing the field of drug development, particularly in addressing key challenges such as drug response prediction, drug combination design, drug repositioning, and drug molecule generation. Traditional drug discovery is hindered by long timelines, high costs, and low success rates, necessitating innovative technologies to accelerate the process. AI technologies, such as deep ...

Added: June 25, 2025

An Approach to Finding a Robust Deep Learning Model

Boldyrev A., Ratnikov F., Shevelev A., IEEE Access 2025 Vol. 13 P. 102390–102406

The rapid development of machine learning (ML) and artificial intelligence (AI) applications requires the training of a large number of models. This growing demand highlights the importance of training models without human supervision, while ensuring that their predictions are reliable. In response to this need, we propose a novel approach for determining model robustness. This approach, supplemented with a ...

Added: June 15, 2025

Economic and social aspects of nuclear energy amid the development of artificial intelligence technologies

Podchufarov A., Galkina A. N., Vanina S. S. et al., Экономика и управление: проблемы, решения 2025 Vol. 5 No. 4 P. 61–74

Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...

Added: June 5, 2025

Deep learning for customs classification of goods based on their textual descriptions analysis

Ryzhova A., Sochenkov I., in: Proceeding 2019 Ivannikov Ispras Open Conference (ISPRAS). IEEE Computer Society, 2019. P. 60–67.

Added: May 1, 2025

Distilling Normalizing Flows

Walton S., Klyukin V., Artemev M. et al., in: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2025. P. 3328–3337.

Explicit density learners are becoming an increasingly popular technique for generative models because of their ability to better model probability distributions. They have advantages over Generative Adversarial Networks due to their ability to perform density estimation and having exact latent-variable inference. This has many advantages, including: being able to simply interpolate, calculate sample likelihood, and ...

Added: April 1, 2025

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Derkach D., Artemev M., IEEE, 2025.

Added: April 1, 2025

Deep learning captures the effect of epistasis in multifactorial diseases

Perelygin V., Kamelin A., Syzrantsev N. et al., Frontiers in Medicine 2025 Vol. 11 Article 1479717

Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by the linear regression model that assumes independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer’s disease, diabetes, cardiovascular ...

Added: March 4, 2025

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Ivan Rubachev, Nikolay Kartashev, Gorishniy Y. et al., in: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). ICLR, 2025. P. 53831–53867.

Advances in machine learning research drive progress in real-world applications. To ensure this progress, it is important to understand the potential pitfalls on the way from a novel method's success on academic benchmarks to its practical deployment. In this work, we analyze existing tabular deep learning benchmarks and find two common characteristics of tabular data ...

Added: March 1, 2025

Weight Perturbations for Simulating Virtual Lesions in a Convolutional Neural Network

W. Joseph MacInnes, Zhozhikashvili N., Feurra M., in: First International Conference, AIiH 2024, Swansea, UK, September 4–6, 2024, Proceedings, Part II. Artificial Intelligence in Healthcare. LNCS, Vol. 14976. Springer, 2024. P. 221–234.

Convolutional Neural Networks (CNNs) match human performance in many visual tasks like the classification of images; however, they may not simulate the underlying biological processes. We implemented a CNN to try to replicate results from an object inversion experiment with Transcranial Magnetic Stimulation (TMS). After training on upright faces, the CNN model went through three stages ...

Added: January 28, 2025

TabR: Tabular Deep Learning Meets Nearest Neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev et al., in: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). ICLR, 2024.

Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves ...

Added: January 22, 2025

Deep Learning Approaches for LHCb ECAL Reconstruction

Boldyrev A., Derkach D., Ratnikov F. et al., EPJ Web of Conferences 2024 Vol. 295 Article 09008

Calorimeters are a crucial component for most detectors mounted on modern colliders. Their tasks include identifying and measuring the energy of photons and neutral hadrons, recording energetic hadronic jets, and contributing to the identification of electrons, muons, and charged hadrons. To fulfill these many tasks while keeping costs reasonable, the calorimeter construction requires good and ...

Added: January 8, 2025

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Gorishniy Y., Kotelnikov A., Babenko A., in: The Thirteenth International Conference on Learning Representations: ICLR 2025. ICLR, 2025.

Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for substantially improving tabular MLPs: namely, parameter-efficient ensembling -- a paradigm for implementing an ensemble of models as one model producing multiple predictions. We ...

Added: December 24, 2024

Can artificial intelligence predict court decisions? A systematic review of international research

Kazun A., Мониторинг общественного мнения: Экономические и социальные перемены 2024 No. 5 P. 100–122

Advancements in artificial intelligence technologies and the emergence of open databases containing judicial decisions have led to rapid improvements in algorithms capable of classifying legal documents and forecasting decisions made by judges. This article examines a body of international research dedicated to the question of how accurately AI can predict judges’ decisions, and consequently, whether ...

Added: November 29, 2024

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950

Cham: Springer, 2024.

This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...

Added: November 22, 2024