Group-Level Emotion Recognition Using Transfer Learning From Face Identification
In this paper we study image recognition tasks in which images are described by high-dimensional feature vectors extracted with deep convolutional neural networks and principal component analysis. In particular, we focus on the high computational complexity of the statistical approach with non-parametric probability density estimates implemented by the probabilistic neural network. We propose a novel statistical classification method based on density estimators with orthogonal expansions using trigonometric series. It is shown that this approach overcomes the drawbacks of the probabilistic neural network caused by its memory-based, instance-based learning. Our experimental study with the Caltech-101 and CASIA WebFaces datasets demonstrates that the proposed approach reduces the error rate by 1-5% and increases computational speed by a factor of 1.5-6 compared to the original probabilistic neural network for small samples of reference images.
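To make the idea of replacing instance storage with expansion coefficients concrete, the sketch below estimates one-dimensional class-conditional densities with a truncated cosine (trigonometric) series and combines them under a naive independence assumption. The class name, the number of basis terms, and the independence assumption are illustrative choices of ours, not the exact model from the paper; features are assumed to be scaled to [0, 1].

```python
import numpy as np

class TrigSeriesDensityClassifier:
    """Minimal sketch: orthogonal-series (cosine) density estimates per feature."""

    def __init__(self, n_terms=10):
        self.n_terms = n_terms  # number of trigonometric basis terms (assumed value)

    def _basis(self, x):
        # Cosine basis on [0, 1]: phi_0 = 1, phi_k(x) = sqrt(2) * cos(pi * k * x)
        k = np.arange(1, self.n_terms + 1)
        return np.concatenate([np.ones((*x.shape, 1)),
                               np.sqrt(2) * np.cos(np.pi * k * x[..., None])], axis=-1)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Store only the expansion coefficients, not the training instances themselves
        self.coef_ = np.stack([self._basis(X[y == c]).mean(axis=0)
                               for c in self.classes_])
        return self

    def predict(self, X):
        phi = self._basis(X)                                # (n, d, n_terms + 1)
        dens = np.einsum('ndk,cdk->ncd', phi, self.coef_)   # per-feature density estimates
        log_dens = np.log(np.clip(dens, 1e-8, None)).sum(axis=-1)
        return self.classes_[np.argmax(log_dens, axis=1)]
```

Because only the coefficient tensor is retained after training, memory and prediction time no longer grow with the number of stored reference images, which is the drawback of the probabilistic neural network the abstract refers to.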
In this paper we focus on the problem of user prediction in visual product recommender systems, given a set of photos of products previously purchased by the user. We study neural aggregation methods for image features extracted by deep neural networks and propose a novel two-stage algorithm. First, the image features are learned by fine-tuning a convolutional neural network. Second, we sequentially combine known learnable pooling techniques (neural aggregation network and context gating) to compute a single descriptor for a particular user as a weighted average of the image features. It is experimentally shown on the Amazon product dataset that the F1-measure of our approach is more than 20% higher than that of conventional averaging of the feature vectors.
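A minimal Keras sketch of the second stage is shown below: attention-style pooling produces a weighted average of per-image features, and a sigmoid context gate re-weights the pooled descriptor element-wise. Layer sizes, names, and the single-head attention are illustrative assumptions rather than the exact architecture of the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_user_descriptor_model(n_images, feat_dim):
    """Sketch: attention pooling of per-image CNN features followed by context gating."""
    feats = layers.Input(shape=(n_images, feat_dim))          # pre-extracted image features

    # Neural-aggregation-style attention: one scalar weight per image, softmax-normalized
    scores = layers.Dense(1)(feats)                           # (batch, n_images, 1)
    weights = layers.Softmax(axis=1)(scores)
    pooled = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([feats, weights])

    # Context gating: element-wise sigmoid gate over the pooled user descriptor
    gate = layers.Dense(feat_dim, activation='sigmoid')(pooled)
    descriptor = layers.Multiply()([pooled, gate])

    return tf.keras.Model(feats, descriptor)
```

In this reading, conventional averaging corresponds to fixing all attention weights to 1/n_images and removing the gate, which is the baseline the 20% F1 improvement is measured against.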
The paper addresses the insufficient speed of image recognition methods when the number of classes is rather large. We propose a novel algorithm based on sequential three-way decisions and a formal description of granular computing. Each image is associated with principal component scores of the high-dimensional features extracted by a deep convolutional neural network. A low number of principal components stands for the coarse-grained granules, while fine-grained granules include all components. Initially, the first principal components of an observed image and all training instances are matched at the coarsest granularity level. Next, negative decisions are defined using the theory of multiple comparisons and the asymptotic distribution of the Kullback-Leibler divergence. Namely, distance factors (ratios between the minimum distance and all other distances) are evaluated, and the set of negative decisions is populated by the instances whose distance factors exceed a certain threshold. The images from this set are not examined at the next, finer granularity levels. In the experiments, unconstrained face recognition and image categorisation are considered using state-of-the-art deep learning-based feature extractors. We demonstrate that the proposed approach decreases the running time by a factor of 1.5–10 compared to conventional classifiers and the known multi-class decision-theoretic rough sets.
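The following sketch illustrates the coarse-to-fine matching loop described above. The granularity levels, the fixed threshold, and the reading of the distance factor as "distance divided by the current minimum distance" are assumptions made for illustration; the paper derives the threshold from the multiple comparisons theory and the asymptotic distribution of the Kullback-Leibler divergence rather than fixing it by hand.

```python
import numpy as np

def sequential_three_way_classify(query, gallery, levels=(32, 128, 512), rho=1.5):
    """Sketch: coarse-to-fine matching of PCA scores with sequential three-way decisions.

    query:   (D,) principal component scores of the observed image
    gallery: (N, D) principal component scores of the training instances
    """
    candidates = np.arange(len(gallery))
    for d in levels:
        # Match only the first d principal components (current granule)
        dists = np.linalg.norm(gallery[candidates, :d] - query[:d], axis=1)
        # Distance factor relative to the best match at this level
        factors = dists / (dists.min() + 1e-12)
        # Negative decision: discard candidates whose factor exceeds the threshold
        keep = factors <= rho
        candidates, dists = candidates[keep], dists[keep]
        if len(candidates) == 1:   # decision is unambiguous, stop early
            break
    return candidates[np.argmin(dists)]
```

The speed-up comes from the fact that most training instances receive a negative decision after matching only a few principal components, so the full-dimensional comparison is performed for a small surviving subset.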
In this paper we describe the algorithmic approach used for our submissions to the group-level emotion recognition sub-challenge of the fifth Emotion Recognition in the Wild challenge (EmotiW 2017). We extracted feature vectors of detected faces using a convolutional neural network trained for the face identification task, rather than the traditional pre-training on emotion recognition problems. In the final pipeline, an ensemble of Random Forest classifiers was learned to predict the emotion score using the available training set. In cases where no faces are detected, one member of our ensemble extracts features from the whole image. In our experimental study, the proposed approach showed the lowest error rate compared to the other explored techniques. In particular, we achieved 75.4% accuracy on the validation data, which is 20% higher than the handcrafted feature-based baseline. The source code, based on the Keras framework, will be made publicly available.
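As a rough illustration of the final pipeline, the sketch below trains a two-member Random Forest ensemble, one on face descriptors and one on whole-image descriptors, and falls back to the whole-image member when no faces are detected. The number of trees, the probability averaging, and the two-member structure are simplifying assumptions of ours; the submitted ensemble may have been configured differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_ensemble(face_feats, image_feats, labels, n_estimators=500):
    """Train one Random Forest on (averaged) face CNN descriptors and one on
    whole-image descriptors; n_estimators is an illustrative choice."""
    face_clf = RandomForestClassifier(n_estimators=n_estimators).fit(face_feats, labels)
    image_clf = RandomForestClassifier(n_estimators=n_estimators).fit(image_feats, labels)
    return face_clf, image_clf

def predict_group_emotion(face_clf, image_clf, face_feat, image_feat):
    """Average the class probabilities of both members; when no faces were
    detected (face_feat is None), rely on the whole-image classifier alone."""
    probs = image_clf.predict_proba([image_feat])[0]
    if face_feat is not None:
        probs = 0.5 * (probs + face_clf.predict_proba([face_feat])[0])
    return np.argmax(probs)
```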
The paper deals with the unconstrained face recognition task for the small-sample-size problem, based on computing distances between high-dimensional off-the-shelf features extracted by a deep convolutional neural network. We present a novel statistical recognition method that maximizes the likelihood (joint probability density) of the distances to all reference images from the gallery set. This likelihood is estimated using the known asymptotically normal distribution of the Kullback–Leibler discrimination between nonnegative features. Our approach penalizes individuals whose feature vectors do not behave like the features of the observed image in the space of dissimilarities of the gallery images. We provide an experimental study with the LFW (Labeled Faces in the Wild), YTF (YouTube Faces) and IJB-A (IARPA Janus Benchmark A) datasets and state-of-the-art deep learning-based feature extractors (VGG-Face, VGGFace2, ResFace-101, CenterFace and Light CNN). It is demonstrated that the proposed approach can be applied with traditional distances to increase accuracy by 0.3–5.5% compared to known methods, especially when the training and testing images differ significantly.
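One simplistic reading of this idea is sketched below: each gallery subject is scored by how well its distances to the other gallery images agree with the probe's distances to those same images, using a Gaussian-style penalty. The quadratic penalty and the variance proxy are stand-ins chosen by us for illustration; they approximate, but are not, the exact asymptotic likelihood derived in the paper.

```python
import numpy as np

def likelihood_rank(probe_dists, gallery_dists):
    """Sketch: rank gallery subjects by agreement of their gallery distances
    with the probe's distances.

    probe_dists:   (N,) distances from the probe image to each gallery image
    gallery_dists: (N, N) pairwise distances between the gallery images
    """
    n = len(probe_dists)
    scores = np.zeros(n)
    for c in range(n):
        others = np.arange(n) != c
        diff = probe_dists[others] - gallery_dists[c, others]
        var = np.maximum(gallery_dists[c, others], 1e-6)   # crude variance proxy
        scores[c] = np.sum(diff ** 2 / var)                # smaller = more likely
    return np.argsort(scores)   # best-matching identities first
```

In this view, a candidate is penalized exactly when its pattern of dissimilarities to the gallery differs from that of the observed image, which is the intuition stated in the abstract.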
The task of organizing information in video surveillance systems is implemented by grouping video tracks that contain identical faces. We examine aggregation methods for the features of individual frames extracted using deep convolutional neural networks. The tracks with identical faces are grouped using known face verification algorithms and clustering methods. An experimental study on the YouTubeFaces dataset demonstrates the results of combining frame features in order to obtain a descriptor of a video track. It is shown that the most accurate method is L2-normalization of the average of the unnormalized features of the individual frames of each video track.
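The best-performing aggregation reported here is straightforward to state as code; the short sketch below averages the unnormalized per-frame features of a track and then L2-normalizes the result (function name and the small epsilon are ours).

```python
import numpy as np

def video_track_descriptor(frame_features):
    """Average the unnormalized per-frame CNN features of a track,
    then L2-normalize the averaged vector."""
    mean_feat = np.asarray(frame_features).mean(axis=0)
    return mean_feat / (np.linalg.norm(mean_feat) + 1e-12)
```

The resulting track descriptors can then be compared with the usual face verification distances and fed to a clustering method to group tracks of the same person.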