Gender and Age Recognition from Facial Video Using Convolutional Neural Networks
The paper reviews methods for age and gender recognition in video data using modern deep convolutional neural networks. We present a comparative analysis of classifier fusion algorithms that aggregate the decisions made for individual frames. We implemented a video-based recognition system with several aggregation methods to improve the age and gender identification accuracy. An experimental comparison of the proposed approach with traditional simple voting on the IJB-A, Indian Movies, and Kinect datasets is provided. It is demonstrated that the most accurate decisions are obtained using the geometric mean and the mathematical expectation of the outputs at the softmax layers of the convolutional neural networks for gender recognition and age prediction, respectively.
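To make the aggregation rules named in this abstract concrete, here is a minimal sketch in plain Python. It contrasts simple voting with geometric-mean fusion of per-frame softmax outputs (gender) and computes an expected age from averaged per-frame outputs. The function names, the toy probabilities, and the choice to average frames arithmetically before taking the age expectation are assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter


def simple_voting(frame_probs):
    """Majority vote over the per-frame argmax decisions."""
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in frame_probs)
    return votes.most_common(1)[0][0]


def geometric_mean_fusion(frame_probs, eps=1e-12):
    """Pick the class with the largest geometric mean of per-frame posteriors.

    Summing logs is equivalent to comparing geometric means and avoids underflow.
    """
    n_classes = len(frame_probs[0])
    log_sums = [
        sum(math.log(max(p[c], eps)) for p in frame_probs)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=log_sums.__getitem__)


def expected_age(frame_probs, ages):
    """Average softmax outputs over frames (an assumption), then take E[age]."""
    n_frames = len(frame_probs)
    mean_probs = [sum(p[i] for p in frame_probs) / n_frames for i in range(len(ages))]
    total = sum(mean_probs)
    return sum(a * q for a, q in zip(ages, mean_probs)) / total


# Toy example: two weakly confident frames and one very confident frame.
gender_frames = [[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]
print(simple_voting(gender_frames))          # 0: two of three frames vote for class 0
print(geometric_mean_fusion(gender_frames))  # 1: the confident frame dominates

age_frames = [[0.2, 0.6, 0.2], [0.0, 0.5, 0.5]]
print(expected_age(age_frames, ages=[20, 30, 40]))
```

The toy input shows why the two rules can disagree: voting discards confidence, while the geometric mean penalizes a class heavily whenever a single frame assigns it a very low probability.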
The paper considers the use of convolutional neural networks for the simultaneous recognition of a person's gender and age from video recordings of the face. The emphasis is on incorporating the approach into mobile video-recording software. We investigate the fusion of decisions obtained while processing each video frame, including the use of a classifier committee based on Dempster–Shafer theory. We propose a novel age prediction method that evaluates the expectation of the most probable ages. We compare existing neural network models with a specially trained modification of the MobileNet convolutional network with two outputs. Experimental results are given for the Kinect, IJB-A, Indian Movie and EmotiW data collections. Compared with conventional methods, our approach increases the age and gender recognition accuracy by 2-5% and 5-10%, respectively.
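One possible reading of "the expectation of the most probable ages" is sketched below: keep only the k ages with the highest posterior probability, renormalize their probabilities, and take the expectation over that subset. The function name `expected_age_top_k`, the parameter `k`, and the toy values are hypothetical; the abstract does not state how many ages are kept or how they are selected.

```python
def expected_age_top_k(probs, ages, k=3):
    """Expectation of age restricted to the k most probable ages.

    probs: posterior probabilities, one per entry of `ages`.
    k: number of most probable ages to keep (a hypothetical parameter).
    """
    if len(probs) != len(ages):
        raise ValueError("probs and ages must have the same length")
    # Indices of the k largest probabilities.
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    if mass <= 0:
        raise ValueError("selected probabilities sum to zero")
    # Renormalize over the kept ages and take the expectation.
    return sum(ages[i] * probs[i] for i in top) / mass


ages = [10, 20, 30, 40, 50]
probs = [0.05, 0.10, 0.40, 0.30, 0.15]
print(expected_age_top_k(probs, ages, k=2))  # uses only ages 30 and 40
print(expected_age_top_k(probs, ages, k=5))  # ordinary expectation over all ages
```

With k equal to the number of age classes this reduces to the ordinary expectation, and with k = 1 it reduces to picking the single most probable age, so k trades off between the two.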
An ensemble of classifiers is built to solve the problem of video image recognition. The paper offers a way to estimate the a posteriori probability that an image belongs to a particular class for the nearest neighbor method with an arbitrary distance. The estimate is shown to be equivalent to the optimal naive Bayesian estimate when the Kullback-Leibler divergence is used as the proximity measure. The block diagram of a video image recognition system is presented. The system features automatic adaptation of the list of images of identical objects that is fed to the committee machine input. The system is tested on a face recognition task using popular databases (FERET, AT&T, Yale), and the results are discussed.
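The abstract does not give the estimator itself, so the following is only an illustrative sketch of how nearest neighbor distances might be turned into class posteriors. It assumes images are described by normalized histograms, uses the Kullback-Leibler divergence as the distance, and assumes a posterior proportional to exp(-scale * distance to the nearest reference of each class). That exponential form, the `scale` parameter, and all names are assumptions, not the paper's formula.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) between two discrete distributions."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def nn_posteriors(query, references, scale=1.0):
    """Class posteriors from nearest neighbor distances (assumed exponential form).

    references: dict mapping class label -> list of reference histograms.
    scale: hypothetical factor controlling how sharply distances are weighted.
    """
    # Distance from the query to the nearest reference of each class.
    min_dist = {
        label: min(kl_divergence(query, ref) for ref in refs)
        for label, refs in references.items()
    }
    # Subtract the smallest distance before exponentiating for numerical stability.
    best = min(min_dist.values())
    weights = {label: math.exp(-scale * (d - best)) for label, d in min_dist.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


references = {"a": [[0.5, 0.5]], "b": [[0.9, 0.1]]}
print(nn_posteriors([0.5, 0.5], references))
```

Posteriors of this kind, rather than hard nearest neighbor labels, are what a committee of classifiers would need in order to combine decisions across several frames.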
In this paper we focus on the problem of multi-label image recognition for visually-aware recommender systems. We propose a two-stage approach in which a deep convolutional neural network is first fine-tuned on a part of the training set. Second, an attention-based aggregation network is trained to compute the weighted average of visual features in an input image set. Our approach is implemented as a mobile fashion recommender system application. It is experimentally shown on the Amazon Fashion dataset that our approach achieves an F1-measure of 0.58 for 15 recommendations, which is more than twice the 0.25 F1-measure obtained with conventional averaging of feature vectors.
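The difference between conventional averaging and an attention-weighted average of feature vectors can be sketched as follows. This is a simplified stand-in, assuming a single query vector scored against each feature by dot product and normalized with softmax; the abstract does not describe the actual attention architecture, and `attention_pool` and `query` are hypothetical names. In a trained network the query would be learned; here it is set by hand.

```python
import math


def attention_pool(features, query):
    """Softmax-attention weighted average of a set of feature vectors.

    features: list of equal-length feature vectors (one per image in the set).
    query: vector of the same length (learned in practice, fixed here).
    Returns (pooled_vector, attention_weights).
    """
    # Dot-product score of every feature vector against the query.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted average of the feature vectors.
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


features = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(features, query=[0.0, 0.0]))   # equal weights: plain averaging
print(attention_pool(features, query=[10.0, 0.0]))  # first vector dominates
```

A zero query reproduces conventional averaging, which makes plain averaging a special case of the attention-weighted average.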