Training Transformers Together

Borzunov A., Ryabinin M., Dettmers T., Lhoest Q., Saulnier L., Diskin M., Jernite Y., Wolf T.

doi:10.48550/arXiv.2207.03481

P. 335-342.

Keywords: distributed computing, deep learning, transformers

In book

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track

PMLR, 2022

ABC: A Big CAD Model Dataset For Geometric Deep Learning

Koch S., Matveev A., Jiang Z. et al. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019). IEEE, 2019. P. 9601-9611.

We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. Each model is a collection of explicitly parametrized curves and surfaces, providing ground truth for differential quantities, patch segmentation, geometric feature detection, and shape reconstruction. Sampling the parametric descriptions of surfaces and curves allows ...

Added: November 26, 2019

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K. In: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 154-159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Using Resources of Supercomputing Centers with Everest Platform

Smirnov S., Sukhoroslov O. V., Voloshinov V. In: Supercomputing. RuSCDays 2018. Communications in Computer and Information Science. Vol. 965. Cham: Springer, 2019. P. 687-698.

High-performance computing plays an increasingly important role in modern science and technology. However, the lack of convenient interfaces and automation tools greatly complicates the widespread use of HPC resources among scientists. The paper presents an approach to solving these problems relying on Everest, a web-based distributed computing platform. The platform enables convenient access to HPC ...

Added: October 19, 2019

35th International Symposium on Distributed Computing (DISC 2021)

Dagstuhl Publishing, 2021

Welcome to DISC 2021, the 35th International Symposium on Distributed Computing, held on October 4–8, 2021. DISC is an international forum on the theory, design, analysis, and implementation of distributed systems and networks, focusing on distributed computing in all its forms. DISC is organized in cooperation with the European Association for Theoretical Computer Science ...

Added: October 14, 2021

PODC'21: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing

Association for Computing Machinery (ACM), 2021

Welcome to the 40th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC 2021), held virtually (due to the COVID-19 pandemic) on July 26-30, 2021. PODC is the premier forum for presentation of research on all aspects of distributed computing, including the theory, design, implementation, and applications of distributed algorithms, systems, and networks. This volume contains ...

Added: October 14, 2021

On Embeddings for Numerical Features in Tabular Deep Learning

Gorishniy Y., Rubachev I., Babenko A. In: Thirty-Sixth Conference on Neural Information Processing Systems: NeurIPS 2022. Curran Associates, Inc., 2022. Ch. 1. P. 24991-25004.

Added: January 28, 2023

Transformers: “The End of History” for Natural Language Processing?

Chernyavskiy A., Ilvovsky D., Nakov P. In: Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings. Part 3. Springer, 2021. P. 677-693.

Added: November 12, 2021

Advanced Computing. 10th International Conference, IACC 2020, Panaji, Goa, India, December 5–6, 2020, Revised Selected Papers, Part II

Springer, 2021

10th International Conference, IACC 2020, Panaji, Goa, India, December 5–6, 2020, Revised Selected Papers, Part II. Series: Communications in Computer and Information Science, volume 1368 (2021) ...

Added: July 7, 2021

Lost in Conversation: A Conversational Agent Based on the Transformer and Transfer Learning

Golovanov S., Tselousov A., Kurbanov R. et al. In: The NeurIPS '18 Competition: From Machine Learning to Intelligent Conversations. Springer, 2020. P. 295-315.

Added: February 20, 2021

Workshop of the 5th International Conference on Learning Representations (ICLR)

[s.n.], 2017

The performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field and include ...

Added: October 31, 2018

Intelligent Distributed Computing VII. Proceedings of the 7th International Symposium on Intelligent Distributed Computing - IDC 2013, Prague, Czech Republic, September 2013

Dordrecht, London, Cham, Heidelberg, NY: Springer, 2014

This book represents the combined peer-reviewed proceedings of the Seventh International Symposium on Intelligent Distributed Computing - IDC-2013, of the Second Workshop on Agents for Clouds - A4C-2013, of the Fifth International Workshop on Multi-Agent Systems Technology and Semantics - MASTS-2013, and of the International Workshop on Intelligent Robots - iR-2013. All the events were ...

Added: March 13, 2015

Proceedings of the 9th International Conference on Utility and Cloud Computing

NY : ACM, 2016

Added: August 30, 2018

The Deep Weight Prior

Atanov A., Ashukha A., Struminsky K. et al. In: Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR, 2019. P. 1-17.

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via carefully choosing a prior distribution. In this work, we propose a new type of prior distributions for convolutional neural networks, deep weight prior (DWP), that exploit generative models to encourage a specific structure of ...

Added: September 2, 2019

Differentiable Rendering with Reparameterized Volume Sampling

Morozov N., Rakitin D., Desheulin O. et al. In: Neural Fields across Fields: Methods and Applications of Implicit Neural Representations. ICLR 2023 Workshop. [s.n.], 2023. Ch. 8.

In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene pictures. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is ...

Added: July 18, 2023

Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network

Petrosyan A., Voskoboynikov A., Sukhinin D. et al., Journal of Neural Engineering 2022 Vol. 19 No. 6 Article 066016

Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up plentiful opportunities from rehabilitation of patients to direct and seamless communication between human species. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted through craniotomy. Here we explored the possibility of creating speech prosthesis in a minimally ...

Added: December 9, 2022

Deep Learning for Non-Invasive Cortical Potential Imaging

Razorenova A., Yavich N., Malovichko M. et al. In: Machine Learning in Clinical Neuroimaging and Radiogenomics in Neuro-oncology. Third International Workshop, MLCN 2020, and Second International Workshop, RNO-AI 2020. Lecture Notes in Computer Science. Vol. 12449. Springer, 2020. Ch. 5. P. 45-55.

Electroencephalography (EEG) is a well-established non-invasive technique to measure the brain activity, albeit with a limited spatial resolution. Variations in electric conductivity between different tissues distort the electric fields generated by cortical sources, resulting in smeared potential measurements on the scalp. One needs to solve an ill-posed inverse problem to recover the original neural activity. In this article, ...

Added: December 10, 2020

Unet-boosted classifier: a multi-task architecture for small samples, illustrated by the classification of brain MRI images

Sobyanin K., Kulikova S., Informatics and Automation (SPIIRAS Proceedings) 2024 Vol. 23 No. 4 P. 1022-1046

The problem of training deep neural networks on small samples is especially relevant for medical problems. The paper examines the impact of pixel-wise marking of significant objects in the image, over the true class label, on the quality of the classification. To achieve better classification results on small samples, we propose a multitasking architecture -- ...

Added: June 29, 2024

Training restricted Boltzmann machines to generate human-like eye movements

Krasovskaya S., Zhulikov G., MacInnes W. In: European Conference on Visual Perception 2017 Abstract Book. [s.n.], 2017. Ch. 2. P. 18-18.

Approximately twenty years ago, Laurent Itti and Christof Koch created a saliency map of visual attention in an attempt to recreate the work of biological pyramidal neurons by mimicking neurons with centre-surround receptive fields. The Saliency Model launched many studies that contributed to the understanding of layers of vision and the sphere of visual attention. ...

Added: October 15, 2018

Recognition of DNA Secondary Structures as Nucleosome Barriers with Deep Learning Methods

Pavlov F., Poptsova M. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Seoul: IEEE, 2020. P. 2800-2805.

Added: March 29, 2021

User-controllable Multi-texture Synthesis with Generative Adversarial Networks

Alanov A., Kochurov M., Volkhonskiy D. et al. In: Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2020). Vol. 4. SciTePress, 2020. P. 214-221.

We propose a novel multi-texture synthesis model based on generative adversarial networks (GANs) with a user-controllable mechanism. User control makes it possible to explicitly specify the texture which should be generated by the model. This property follows from using an encoder part which learns a latent representation for each texture from the dataset. To ensure ...

Added: November 8, 2020

LIORI at SemEval-2021 Task 8: Ask Transformer for measurements

Davletov A., Gordeev D., Arefyev N. et al. In: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). Association for Computational Linguistics, 2021. P. 1249-1254.

This work describes our approach for subtasks of SemEval-2021 Task 8: MeasEval: Counts and Measurements which took the official first place in the competition. To solve all subtasks we use multi-task learning in a question-answering-like manner. We also use learnable scalar weights to weight subtasks’ contribution to the final loss in multi-task training. We fine-tune ...

Added: September 23, 2021

Interpretable Feature Generation in ECG Using a Variational Autoencoder

Kuznetsov V. V., Moskalenko V. A., Gribanov D. et al., Frontiers in Genetics 2021 Article 638191

We propose a method for generating an electrocardiogram (ECG) signal for one cardiac cycle using a variational autoencoder. Our goal was to encode the original ECG signal using as few features as possible. Using this method we extracted a vector of new 25 features, which in many cases can be interpreted. The generated ECG has ...

Added: October 29, 2021

Bayesian Sparsification of Recurrent Neural Networks

Lobacheva E., Chirkova N., Vetrov D. International Conference on Machine Learning. Series 1 "Workshop on Learning to Generate Natural Language". 2017.

Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout (Molchanov et al., 2017) eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural ...

Added: October 19, 2017

Fault detection in Tennessee Eastman process with temporal deep learning models

Lomov I., Lyubimov M., Makarov I. et al., Journal of Industrial Information Integration 2021 Vol. 23 Article 100216

Automated early process fault detection and prediction remains a challenging problem in industrial processes. Traditionally it has been done by multivariate statistical analysis of sensor readings and, more recently, with the help of machine learning methods. The quality of machine learning models strongly depends on feature engineering, that in turn heavily relies on expertise of ...

Added: March 21, 2021