Compressing deep convolutional neural networks in visual emotion recognition
In this paper, we consider the problem of insufficient runtime and memory space complexities of deep convolutional neural networks for visual emotion recognition. A survey of recent compression methods and efficient neural networks architectures is provided. We experimentally compare the computational speed and memory consumption during the training and the inference stages of such methods as the weights matrix decomposition, binarization and hashing. It is shown that the most efficient optimization can be achieved with the matrices decomposition and hashing. Finally, we explore the possibility to distill the knowledge from the large neural network, if only large unlabeled sample of facial images is available.
Brain-computer interfaces find application in a number of different areas and have the potential to be used for research as well as for practical purposes. The clinical use of BCI includes current studies on neurorehabilitation ([Frolov et al., 2013; Ang et al., 2010]), and there is the prospect of using BCI to restore movement and communication capabilities, providing alternative effective pathways to those that may be lost due to injury or illness. The processing of electrophysiological data requires analysis of high-dimensional, nonstationary, noisy signals reflecting complex underlying processes and structures. We have shown that for non-invasive neuroimaging methods such as EEG the potential improvement lies in the field of machine learning and involves designing data analysis algorithms that can model physiological and psychoemotional variability of the user. The development of such algorithms can be conducted in different ways, including the classical Bayesian paradigm as well as modern deep learning architectures. The interpretation of nonlinear decision rules implemented by multilayer structures would enable automatic and objective knowledge extraction from the neurocognitive experiments data. Despite the advantages of non-invasive neuroimaging methods, a radical increase in the bandwidth of the BCI communication channel and the use of this technology for the prosthesis control is possible only through invasive technologies. Electrocorticogram (ECoG) is the least invasive of such technologies, and in the final part of this work we demonstrate the possibility of using ECoG to decode the kinematic characteristics of the finger movement.
Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. This conference is a part of SAI conferences being held since 2013. The conference series has featured keynote talks, special sessions, poster presentation, tutorials, workshops, and contributed papers each year. The conference focus on areas of intelligent systems and artificial intelligence (AI) and how it applies to the real world. IntelliSys is one of the best respected Artificial Intelligence (AI) Conference.
Autonomous taxies are in high demand for smart city scenario. Such taxies have a well specified path to travel. Therefore, these vehicles only required two important parameters. One is detection parameter and other is control parameter. Further, detection parameters require turn detection and obstacle detection. The control parameters contain steering control and speed control. In this paper a novel autonomous taxi model has been proposed for smart city scenario. Deep learning has been used to model the human driver capabilities for the autonomous taxi. A hierarchical Deep Neural Network (DNN) architecture has been utilized to train various driving aspects. In first level, the proposed DNN architecture classifies the straight and turning of road. A parallel DNN is used to detect obstacle at level one. In second level, the DNN discriminates the turning i.e. left or right for steering and speed controls. Two multi layered DNNs have been used on Nvidia Tesla K 40 GPU based system with Core i-7 processor. The mean squared error (MSE) for the detection parameters viz. speed and steering angle were 0.018 and 0.0248 percent, respectively, with 15 milli seconds of realtime response delay.
A new public dataset of traffic sign images is presented. The dataset is intended for training and testing the algorithms of traffic sign recognition. We describe the dataset structure and guidelines for working with the dataset, comparing it with the previously published traffic sign datasets. The evaluation of modern detection and classification algorithms conducted using the proposed dataset has shown that existing methods of recognition of a wide class of traffic signs do not achieve the accuracy and completeness required for a number of applications.
In this paper, we consider the problem of insufficient runtime and memory-space complexities of contemporary deep convolutional neural networks in the problem of image recognition. A survey of recent compression methods and efficient neural networks architectures is provided. The experimental study is focused on the visual emotion recognition problem. We compare the computational speed and memory consumption during the training and the inference stages of such methods as the weights matrix decomposition, binarization and hashing in the visual emotion recognition problem. It is experimentally shown that the most efficient recognition is achieved with the full network binarization and matrices decomposition.
The performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field and include topics such as deep learning and feature learning, metric learning, compositional modeling, structured prediction, reinforcement learning, and issues regarding large-scale learning and non-convex optimization. The range of domains to which these techniques apply is also very broad, from vision to speech recognition, text understanding, gaming, music, etc.
In the paper we present a new framework for dealing with probabilistic graphical models. Our approach relies on the recently proposed Tensor Train format (TT-format) of a tensor that while being compact allows for efficient application of linear algebra operations. We present a way to convert the energy of a Markov random field to the TT-format and show how one can exploit the properties of the TT-format to attack the tasks of the partition function estimation and the MAP-inference. We provide theoretical guarantees on the accuracy of the proposed algorithm for estimating the partition function and compare our methods against several state-of-the-art algorithms.