Convolutional neural networks for gender and age recognition from video
In this paper we examine the problem of video-based age and gender recognition using deep convolutional neural networks. We present a comparative analysis of classifier fusion algorithms that aggregate the decisions made for individual frames. To improve age and gender identification accuracy, we implement a video-based recognition system with several aggregation methods. We provide an experimental comparison on the IJB-A, Indian Movies and Kinect datasets. The most accurate decisions are obtained using the geometric mean of the softmax-layer outputs of the convolutional neural networks for gender recognition, and their mathematical expectation for age prediction.
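The two aggregation rules named in the abstract can be illustrated with a minimal sketch. It assumes each frame yields a softmax probability vector. For age, it assumes the posteriors are averaged over frames before the expectation over age labels is taken, which the abstract does not specify. The function names, the age labels and the sample numbers are hypothetical.

```python
import math

def geometric_mean_fusion(frame_probs):
    """Fuse per-frame class posteriors with a normalized geometric mean."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    eps = 1e-12  # guards against log(0)
    # Mean of log-probabilities per class equals the log of the geometric mean.
    log_means = [
        sum(math.log(max(p[c], eps)) for p in frame_probs) / n_frames
        for c in range(n_classes)
    ]
    shift = max(log_means)  # subtract the maximum for numerical stability
    fused = [math.exp(v - shift) for v in log_means]
    total = sum(fused)
    return [v / total for v in fused]

def expected_age(frame_probs, ages):
    """Average posteriors over frames (assumption), then take the expectation."""
    n_frames = len(frame_probs)
    mean_probs = [
        sum(p[c] for p in frame_probs) / n_frames for c in range(len(ages))
    ]
    return sum(a * q for a, q in zip(ages, mean_probs))

# Hypothetical softmax outputs for three frames (gender) and two frames (age).
gender_frames = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
gender_fused = geometric_mean_fusion(gender_frames)

age_frames = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
age_estimate = expected_age(age_frames, ages=[20, 30, 40])
```

Working in the log domain avoids underflow when many frames are multiplied together. The final normalization makes the fused scores sum to one.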
Brain-computer interfaces (BCIs) find application in a number of different areas and can be used for research as well as for practical purposes. Clinical use of BCIs includes current studies on neurorehabilitation ([Frolov et al., 2013; Ang et al., 2010]), and there is the prospect of using BCIs to restore movement and communication capabilities by providing effective alternative pathways to those lost due to injury or illness. Processing electrophysiological data requires the analysis of high-dimensional, nonstationary, noisy signals that reflect complex underlying processes and structures. We have shown that for non-invasive neuroimaging methods such as EEG, the potential for improvement lies in machine learning and involves designing data analysis algorithms that can model the physiological and psychoemotional variability of the user. Such algorithms can be developed in different ways, including the classical Bayesian paradigm as well as modern deep learning architectures. Interpreting the nonlinear decision rules implemented by multilayer structures would enable automatic and objective knowledge extraction from the data of neurocognitive experiments. Despite the advantages of non-invasive neuroimaging methods, a radical increase in the bandwidth of the BCI communication channel and the use of this technology for prosthesis control are possible only through invasive technologies. The electrocorticogram (ECoG) is the least invasive of such technologies, and in the final part of this work we demonstrate the possibility of using ECoG to decode the kinematic characteristics of finger movement.
The performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of deep learning is concerned with questions surrounding how we can best learn meaningful and useful representations of data. We take a broad view of the field and include topics such as feature learning, metric learning, compositional modeling, structured prediction, reinforcement learning, and issues regarding large-scale learning and non-convex optimization. The range of domains to which these techniques apply is also very broad, from vision to speech recognition, text understanding, gaming, music, etc.
The Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. It is part of the SAI conferences, which have been held since 2013. Each year the conference series has featured keynote talks, special sessions, poster presentations, tutorials, workshops, and contributed papers. The conference focuses on intelligent systems and artificial intelligence (AI) and how they apply to the real world. IntelliSys is one of the most respected AI conferences.
Autonomous taxis are in high demand for the smart city scenario. Such taxis travel along a well-specified path, so they require only two groups of parameters: detection parameters and control parameters. The detection parameters cover turn detection and obstacle detection. The control parameters cover steering control and speed control. In this paper a novel autonomous taxi model is proposed for the smart city scenario. Deep learning is used to model the capabilities of a human driver for the autonomous taxi. A hierarchical Deep Neural Network (DNN) architecture is used to learn various aspects of driving. At the first level, the proposed DNN architecture classifies the road as straight or turning, and a parallel DNN detects obstacles. At the second level, the DNN discriminates the direction of the turn, i.e. left or right, for steering and speed control. Two multilayer DNNs were run on an Nvidia Tesla K40 GPU based system with a Core i7 processor. The mean squared error (MSE) for speed and steering angle was 0.018 and 0.0248 percent, respectively, with a real-time response delay of 15 milliseconds.
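The two-level decision flow described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The three networks are replaced by plain callables, and the function name `drive_step`, the frame format and the command labels ("cruise", "slow", "stop", "hold") are all hypothetical.

```python
def drive_step(frame, road_net, obstacle_net, turn_net):
    """One decision step of a two-level hierarchy (all nets are stand-ins)."""
    # Level 1: two classifiers look at the same frame.
    obstacle = obstacle_net(frame)   # True if an obstacle is detected
    road = road_net(frame)           # "straight" or "turn"

    if obstacle:
        # Hypothetical policy: stop whenever an obstacle is reported.
        return {"steering": "hold", "speed": "stop"}
    if road == "straight":
        return {"steering": "straight", "speed": "cruise"}

    # Level 2: only consulted when level 1 reports a turn.
    direction = turn_net(frame)      # "left" or "right"
    return {"steering": direction, "speed": "slow"}

# Stand-in "networks" that read labels from a hypothetical annotated frame.
frame = {"obstacle": False, "road": "turn", "turn": "left"}
command = drive_step(
    frame,
    road_net=lambda f: f["road"],
    obstacle_net=lambda f: f["obstacle"],
    turn_net=lambda f: f["turn"],
)
```

The point of the hierarchy is that the second-level classifier runs only on frames already labeled as turns, so it never has to handle straight road segments.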
A new public dataset of traffic sign images is presented. The dataset is intended for training and testing traffic sign recognition algorithms. We describe the dataset structure and guidelines for working with it, and compare it with previously published traffic sign datasets. An evaluation of modern detection and classification algorithms on the proposed dataset shows that existing methods for recognizing a wide class of traffic signs do not achieve the accuracy and completeness required for a number of applications.
It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) for image retrieval. In experiments with several standard retrieval benchmarks, we establish that neural codes perform competitively even when the convolutional neural network has been trained for an unrelated classification task (e.g. Image-Net). We also evaluate the improvement in the retrieval performance of neural codes when the network is retrained on a dataset of images that are similar to the images encountered at test time. We further evaluate the performance of compressed neural codes and show that a simple PCA compression provides very good short codes that give state-of-the-art accuracy on a number of datasets. In general, neural codes turn out to be much more resilient to such compression than other state-of-the-art descriptors. Finally, we show that discriminative dimensionality reduction trained on a dataset of pairs of matched photographs improves the performance of PCA-compressed neural codes even further. Overall, our quantitative experiments demonstrate the promise of neural codes as visual descriptors for image retrieval.
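The PCA compression step mentioned above can be illustrated with a small standard-library sketch. It finds the principal directions by power iteration with deflation, which is chosen here only to keep the example dependency-free and is not taken from the paper. The function name, the toy 4-dimensional "codes" and the target size `k=2` are hypothetical. The sketch assumes `k` does not exceed the rank of the centered data.

```python
import math

def pca_compress(codes, k, iters=200):
    """Project row vectors onto their top-k principal directions."""
    n, d = len(codes), len(codes[0])
    mean = [sum(row[j] for row in codes) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in codes]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    components = []
    for _comp in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed start, keeps runs repeatable
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            # Deflation: remove the directions that were already extracted.
            for c in components:
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        components.append(v)
    # Short codes: coordinates of each centered vector along the components.
    return [
        [sum(r[j] * c[j] for j in range(d)) for c in components]
        for r in centered
    ]

# Toy descriptors whose variation lies mostly along the first dimension.
codes = [
    [1.0, 0.1, 0.0, 0.2],
    [2.0, 0.0, 0.1, 0.1],
    [3.0, 0.2, 0.0, 0.2],
    [4.0, 0.1, 0.1, 0.1],
    [5.0, 0.0, 0.0, 0.2],
]
short_codes = pca_compress(codes, k=2)
```

In practice a linear algebra library would compute the decomposition directly. The sketch only shows what "short codes" are: each descriptor is replaced by its coordinates along a few high-variance directions.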
We analyze how to increase the computational efficiency of video-based image recognition methods that match high-dimensional feature vectors extracted by deep convolutional neural networks. We propose an algorithm for approximate nearest neighbor search. In the first step, for a given video frame the algorithm verifies the reference image obtained when recognizing the previous frame. After that, the frame is compared with a small number of reference images. Each subsequent reference image is chosen so as to maximize the conditional probability density of the distances to the reference instances tested at previous steps. To reduce the required memory, we precompute only the distances from all images to a small number of instances (pivots). In experiments with face photos from the Labeled Faces in the Wild and PubFig83 datasets and with video data from YouTube Faces, our algorithm accelerates the recognition procedure by a factor of 1.4 to 4 compared with known approximate nearest neighbor methods.
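The role of the pivot distance table can be shown with a simplified sketch. It does not reproduce the probabilistic selection rule from the abstract. Instead, it orders candidates by the triangle-inequality lower bound |d(q, p) - d(r, p)|, a swapped-in and simpler heuristic. The function names, the `budget` and `accept_dist` parameters, and the 2-D toy points are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_pivot_table(references, pivot_ids):
    """Precompute distances from every reference to the pivots only."""
    return [[dist(r, references[p]) for p in pivot_ids] for r in references]

def search(query, references, pivot_ids, table,
           prev_id=None, accept_dist=None, budget=3):
    """Compare the query with at most `budget` references."""
    # Step 1: verify the match found for the previous frame, if any.
    if prev_id is not None and accept_dist is not None:
        d_prev = dist(query, references[prev_id])
        if d_prev <= accept_dist:
            return prev_id, d_prev
    # Step 2: lower-bound each distance using the pivots only.
    q_to_pivot = [dist(query, references[p]) for p in pivot_ids]
    bounds = [
        max(abs(qp - rp) for qp, rp in zip(q_to_pivot, row)) for row in table
    ]
    order = sorted(range(len(references)), key=lambda i: bounds[i])
    best_id, best_d = None, float("inf")
    for checked, i in enumerate(order):
        # Stop when the budget is spent or no remaining candidate can win.
        if checked >= budget or bounds[i] >= best_d:
            break
        d = dist(query, references[i])
        if d < best_d:
            best_id, best_d = i, d
    return best_id, best_d

references = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
pivot_ids = [0, 3]
table = build_pivot_table(references, pivot_ids)
best_id, best_d = search([5.2, 5.1], references, pivot_ids, table)
```

The table stores one row of pivot distances per reference image rather than a full pairwise distance matrix, which is the memory saving the abstract refers to. The `budget` cap is what makes the search approximate.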