Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method, Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles on CIFAR-10, CIFAR-100, and ImageNet.
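As a rough illustration of the ensembling part of this idea, the sketch below runs cyclical-learning-rate training on an already trained network and snapshots the weights at the end of each cycle, where the learning rate is at its minimum; the PyTorch-style names (model, train_loader, loss_fn) and the simplified linear-per-cycle schedule are assumptions, not the paper's exact procedure. At test time the predictions of the snapshot models are averaged.

    # Hedged sketch of Fast Geometric Ensembling: cycle the learning rate and
    # snapshot the weights at each cycle's low point. Schedule and names are
    # illustrative placeholders.
    import copy
    import torch

    def fge_collect(model, train_loader, loss_fn, cycles=6,
                    steps_per_cycle=400, lr_max=5e-2, lr_min=5e-4):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
        snapshots = []
        data_iter = iter(train_loader)
        for _ in range(cycles):
            for step in range(steps_per_cycle):
                t = step / steps_per_cycle
                lr = (1 - t) * lr_max + t * lr_min      # decay within the cycle
                for group in optimizer.param_groups:
                    group["lr"] = lr
                try:
                    x, y = next(data_iter)
                except StopIteration:
                    data_iter = iter(train_loader)
                    x, y = next(data_iter)
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
            # learning rate is at its minimum here; save a snapshot for the ensemble
            snapshots.append(copy.deepcopy(model.state_dict()))
        return snapshots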
Research into the market graph is attracting increasing attention in stock market analysis. One important problem connected with the market graph is its identification from observations. The standard way of identifying the market graph is a simple procedure based on statistical estimates of Pearson correlations between pairs of stocks. Recently, a new class of statistical procedures for market graph identification was introduced and the optimality of these procedures in the Pearson correlation Gaussian network was proved. However, these procedures are highly reliable only for multivariate Gaussian distributions of stock attributes. One way to address this problem is to consider different networks generated by different measures of pairwise similarity of stocks. A new and promising model in this context is the sign similarity network. In this paper the market graph identification problem in the sign similarity network is considered. A new class of statistical procedures for market graph identification is introduced and the optimality of these procedures is proved. Numerical experiments reveal an essential difference in quality between the optimal procedures in the sign similarity and Pearson correlation networks. In particular, it is observed that the quality of the optimal identification procedure in the sign similarity network is not sensitive to assumptions on the distribution of stock attributes.
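For concreteness, a minimal sketch of building a market graph from a sign similarity network is given below; the centering by the sample mean and the 0.6 threshold are illustrative assumptions, not the identification procedure studied in the paper.

    # Hedged sketch: market graph from sign similarity of daily returns.
    # `returns` is a (days x stocks) NumPy array; threshold is illustrative.
    import numpy as np

    def sign_similarity_graph(returns, threshold=0.6):
        centered = returns - returns.mean(axis=0)   # remove each stock's mean return
        signs = np.sign(centered)                   # +1 / -1 / 0 per day and stock
        n = returns.shape[1]
        adjacency = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(i + 1, n):
                # estimated probability that the two stocks move in the same direction
                p_same_sign = np.mean(signs[:, i] == signs[:, j])
                adjacency[i, j] = adjacency[j, i] = p_same_sign >= threshold
        return adjacency  # an edge means the pair's sign similarity exceeds the threshold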
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and ShakeShake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
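A minimal sketch of the SWA procedure, written against the swa_utils module that later shipped with PyTorch; the epoch counts, the 0.05 SWA learning rate, and the placeholder names (model, train_loader, loss_fn) are illustrative assumptions rather than the paper's exact settings.

    # Hedged sketch of Stochastic Weight Averaging with torch.optim.swa_utils.
    import torch
    from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

    def train_with_swa(model, train_loader, loss_fn, epochs=100, swa_start=75):
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        swa_model = AveragedModel(model)                # keeps the running weight average
        swa_scheduler = SWALR(optimizer, swa_lr=0.05)   # constant LR used in the SWA phase
        for epoch in range(epochs):
            for x, y in train_loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
            if epoch >= swa_start:
                swa_model.update_parameters(model)      # add current weights to the average
                swa_scheduler.step()
        update_bn(train_loader, swa_model)              # recompute batch-norm statistics
        return swa_model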
In this paper we address the problem of forecasting the target events of a time series given the distribution ξ of time gaps between target events. Strong earthquakes and stock market crashes are the two types of such events that we focus on. In the series of earthquakes, as McCann et al. show [W.R. McCann, S.P. Nishenko, L.R. Sykes, J. Krause, Seismic gaps and plate tectonics: seismic potential for major boundaries, Pure and Applied Geophysics 117 (1979) 1082–1147], there are well-defined gaps (called seismic gaps) between strong earthquakes. On the other hand, there are usually no regular gaps in the series of stock market crashes [M. Raberto, E. Scalas, F. Mainardi, Waiting-times and returns in high-frequency financial data: an empirical study, Physica A 314 (2002) 749–755]. For the case of seismic gaps, we analytically derive an upper bound on prediction efficiency given the coefficient of variation of the distribution ξ. For the case of stock market crashes, we develop an algorithm that predicts the next crash within a certain time interval after the previous one. We show that this algorithm outperforms random prediction. The efficiency of our algorithm establishes a lower bound on the efficiency of effective prediction of stock market crashes.
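As an illustration of the window-based prediction setting (not the paper's actual algorithm), the sketch below switches an alarm on during an inter-quantile window of the empirical gap distribution and scores it by hit rate minus the fraction of waiting time spent in alarm; the quantile choice is a made-up stand-in.

    # Hedged sketch of predicting the next target event within a time window
    # after the previous one, using only the empirical gap distribution.
    import numpy as np

    def alarm_window(past_gaps, lower_q=0.25, upper_q=0.75):
        """(start, end) offsets after the last event during which the alarm is on."""
        gaps = np.asarray(past_gaps, dtype=float)
        return np.quantile(gaps, lower_q), np.quantile(gaps, upper_q)

    def prediction_score(gaps):
        """Hit rate minus fraction of waiting time spent in alarm (naive score)."""
        gaps = np.asarray(gaps, dtype=float)
        hits, alarm_time = 0, 0.0
        for i in range(1, len(gaps)):
            lo, hi = alarm_window(gaps[:i])           # window fitted on past gaps only
            hits += lo <= gaps[i] <= hi
            alarm_time += min(hi, gaps[i]) - min(lo, gaps[i])
        return hits / (len(gaps) - 1) - alarm_time / gaps[1:].sum()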
In this article, we focus on isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose to create a grammar model for a small test command set, with a self-loop on each state that returns a blank symbol for noise and out-of-vocabulary words. In addition, the beginning and the end of the grammar are connected by a single arc in order to filter out unknown commands. As a result, the grammar is resistant to distortions and unexpected words near or inside a command. We implemented the proposed approach using Finite State Transducers in the Kaldi framework and evaluated it on self-recorded noisy data with various signal-to-noise ratios. We compared the recognition accuracy and average decision-making time of our approach with those of state-of-the-art continuous speech recognition engines based on language models. It was experimentally shown that our approach achieves up to 60% higher accuracy than conventional offline speech recognition methods based on language models, and recognizes utterances 3 times faster than traditional continuous speech recognition algorithms.
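A rough sketch of such a grammar in OpenFst text format (the format consumed by fstcompile, which Kaldi uses to build G.fst) is shown below; the command list and the <noise> filler label are made-up placeholders, not the vocabulary or exact topology used in the paper.

    # Hedged sketch: command grammar with filler self-loops in OpenFst text format.
    COMMANDS = ["forward", "back", "left", "right", "stop"]

    def grammar_fst_text(commands, filler="<noise>"):
        lines = []
        start, end = 0, 1
        # self-loops absorb noise and out-of-vocabulary words around the command
        lines.append(f"{start} {start} {filler} {filler}")
        lines.append(f"{end} {end} {filler} {filler}")
        # a single arc per command connects the beginning and the end of the grammar
        for word in commands:
            lines.append(f"{start} {end} {word} {word}")
        lines.append(f"{end}")        # the end state is final
        return "\n".join(lines)

    print(grammar_fst_text(COMMANDS))  # pipe into fstcompile together with a symbol table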
In this paper, we consider the problem of the excessive runtime and memory complexity of contemporary deep convolutional neural networks in image recognition. A survey of recent compression methods and efficient neural network architectures is provided. The experimental study focuses on the visual emotion recognition problem. We compare the computational speed and memory consumption during the training and inference stages of such methods as weight matrix decomposition, binarization, and hashing. It is experimentally shown that the most efficient recognition is achieved with full network binarization and matrix decomposition.
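As an example of one of the compared techniques, the sketch below applies a truncated SVD to a dense weight matrix, replacing a single out x in layer by two factors of rank r; the matrix shape and the rank are arbitrary illustrative values.

    # Hedged sketch of weight-matrix decomposition via truncated SVD.
    import numpy as np

    def decompose_weights(W, rank):
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * S[:rank]    # (out_features x rank)
        B = Vt[:rank, :]              # (rank x in_features)
        return A, B                   # W @ x is approximated by A @ (B @ x)

    # parameter count drops from out*in to rank*(out + in)
    W = np.random.randn(512, 1024)
    A, B = decompose_weights(W, rank=64)
    print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))   # relative approximation error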
Vast amounts of data are collected in the course of astronomical observations. The BSA (Big Scanning Antenna) of LPI, used in the study of impulse phenomena, logs 87.5 GB of data daily (32 TB per year). Experts have classified 83,096 individual observations (covering the study period July 2012 - October 2013). Over 75% of the sample corresponds to pulsars, scintillating sources, and fast radio transients; all other classes of observations correspond to hardware failures, interference, and passages of Earth satellites and aircraft. In total, 15 classes of observations were identified.
Such a sample, divided into classes, makes it possible to apply machine learning algorithms. It has become possible to develop an automated service for short-term/long-term monitoring of various classes of radio sources (including radio transients of different nature), monitoring of the Earth's ionosphere and of the interplanetary and interstellar plasma, and the search for and monitoring of different classes of radio sources. Monitoring in this case refers to the automatic filtering and detection of previously unclassified impulse phenomena.
Currently, statistical analysis methods are used for automatic filtering. This report examines an alternative approach based on a neural network machine learning algorithm that takes the raw data as input and, after processing by the hidden layer, determines the class of the impulse phenomenon at the output layer.
A neural network model, trained on this sample and classifying previously unclassified impulse phenomena, is built using the Microsoft Azure Machine Learning Studio cloud service. A web service created on the basis of this model allows classifying single impulse phenomena in real time (Request/Reply) as well as data samples for a certain period (Batch processing).
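Outside of Azure ML Studio, the same kind of classifier can be sketched in a few lines with scikit-learn; the feature dimension, the random placeholder data, and the network size below are assumptions standing in for the real BSA features and the 83,096 expert-labeled observations.

    # Hedged sketch of a neural-network classifier for the 15 observation classes.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.random((2000, 64))             # placeholder feature vectors
    y = rng.integers(0, 15, size=2000)     # placeholder labels for the 15 classes

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=200, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))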
Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)
This two-volume set LNCS 10305 and LNCS 10306 constitutes the refereed proceedings of the 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, held in Gran Canaria, Spain, in June 2019. The 150 revised full papers presented in this two-volume set were carefully reviewed and selected from 210 submissions. The papers are organized in topical sections on machine learning in weather observation and forecasting; computational intelligence methods for time series; human activity recognition; new and future tendencies in brain-computer interface systems; random-weights neural networks; pattern recognition; deep learning and natural language processing; software testing and intelligent systems; data-driven intelligent transportation systems; deep learning models in healthcare and biomedicine; deep learning beyond convolution; artificial neural networks for biomedical image processing; machine learning in vision and robotics; system identification, process control, and manufacturing; image and signal processing; soft computing; mathematics for neural networks; internet modeling, communication and networking; expert systems; evolutionary and genetic algorithms; advances in computational intelligence; computational biology and bioinformatics.