The paper reviews methods for age and gender recognition in video data using modern deep convolutional neural networks. We present a comparative analysis of classifier fusion algorithms that aggregate the decisions made for individual frames. We implemented a video-based recognition system with several aggregation methods to improve age and gender identification accuracy. The proposed approach is compared experimentally with traditional simple voting on the IJB-A, Indian Movies, and Kinect datasets. The most accurate decisions are obtained using the geometric mean of the softmax-layer outputs of the convolutional neural networks for gender recognition and their mathematical expectation for age prediction.
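The fusion rules named in this abstract can be illustrated with a minimal sketch. It assumes each frame yields a softmax vector stored as one row of a NumPy array. The function names, the `eps` smoothing term, and the choice to average frame posteriors before taking the age expectation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def simple_voting(probs):
    """Majority vote over the per-frame argmax decisions."""
    votes = np.argmax(probs, axis=1)
    return int(np.bincount(votes, minlength=probs.shape[1]).argmax())


def geometric_mean_fusion(probs, eps=1e-12):
    """Pick the class with the largest geometric mean of frame posteriors.

    The geometric mean is maximized by the same class that maximizes the
    mean of the logarithms, which is numerically safer to compute.
    eps is an assumed smoothing term that guards against log(0).
    """
    log_mean = np.mean(np.log(probs + eps), axis=0)
    return int(np.argmax(log_mean))


def expected_age(probs, ages):
    """Expectation of age under the frame-averaged softmax output.

    Averaging the frames first is an assumption made for this sketch.
    """
    mean_probs = probs.mean(axis=0)
    mean_probs = mean_probs / mean_probs.sum()
    return float(np.dot(mean_probs, ages))


# Three frames, two gender classes: two weak votes for class 0,
# one confident vote for class 1.
gender_probs = np.array([[0.55, 0.45],
                         [0.60, 0.40],
                         [0.05, 0.95]])
print(simple_voting(gender_probs))          # 0
print(geometric_mean_fusion(gender_probs))  # 1
```

In this toy input the two rules disagree: voting counts two weak decisions for class 0, while the geometric mean is dominated by the one frame that assigns class 0 a very low posterior.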
In this paper, we discuss a semi-dense depth map interpolation method based on a convolutional neural network. We propose a compact neural network architecture whose loss function is defined as the Euclidean distance in the feature space of the VGG-16 neural network used for deep visual recognition. The suggested solution shows state-of-the-art performance on synthetic and real datasets. Together with LSD-SLAM, the method could be used to provide a dense depth map for interaction purposes, such as creating a first-person game in AR/MR or a perception module for an autonomous vehicle.
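One way to write the loss described in this abstract is sketched below. The symbols are assumptions introduced for illustration: $\hat{D}$ is the interpolated depth map, $D$ is the reference dense depth map, and $\phi$ is the activation of some VGG-16 layer. The abstract does not state which layer is used or whether the distance is squared or normalized.

```latex
\mathcal{L}(\hat{D}, D) = \left\lVert \phi(\hat{D}) - \phi(D) \right\rVert_2
```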
The paper considers the use of convolutional neural networks for the simultaneous recognition of a person's gender and age from video recordings of the face. The emphasis is on incorporating the approach into mobile video-recording software. We investigate the fusion of the decisions obtained for each video frame, including a classifier committee based on Dempster–Shafer theory. We propose a novel age prediction method that evaluates the expectation of the most probable ages. We compare existing neural network models with a specially trained modification of the MobileNet convolutional network with two outputs. Experimental results are given for the Kinect, IJB-A, Indian Movie, and EmotiW datasets. Compared with conventional methods, our approach increases the age and gender recognition accuracy by 2-5% and 5-10%, respectively.
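The age prediction idea in this abstract, taking the expectation over only the most probable ages, might look like the following sketch. The function name `expected_age_top_k`, the parameter `k`, and the renormalization of the retained posteriors are assumptions made for illustration. The abstract does not specify how many ages are kept or how the posteriors are combined across frames.

```python
import numpy as np


def expected_age_top_k(age_probs, ages, k=3):
    """Expected age over the k most probable age classes.

    age_probs: softmax output over age classes (1-D, sums to 1).
    ages: age value associated with each class.
    k: number of top classes kept (hypothetical parameter).
    """
    age_probs = np.asarray(age_probs, dtype=float)
    ages = np.asarray(ages, dtype=float)
    top = np.argsort(age_probs)[-k:]                  # indices of the k largest posteriors
    weights = age_probs[top] / age_probs[top].sum()   # renormalize over the kept classes
    return float(np.dot(weights, ages[top]))


ages = [20, 25, 30, 35, 40]
probs = [0.05, 0.40, 0.30, 0.20, 0.05]
print(expected_age_top_k(probs, ages, k=2))  # about 27.14
print(expected_age_top_k(probs, ages, k=5))  # about 29.0, the full expectation
```

Restricting the sum to the top classes discards the low-probability tail of the softmax output, so unlikely ages no longer pull the estimate toward them.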
An ensemble of classifiers is built to solve the problem of video image recognition. The paper offers a way to estimate the a posteriori probability that an image belongs to a particular class for an arbitrary distance and the nearest neighbor method. The estimate is shown to be equivalent to the optimal naive Bayesian estimate when the Kullback-Leibler divergence is used as the proximity measure. A block diagram of the video image recognition system is presented. The system automatically adapts the list of images of identical objects that is fed to the committee machine input. The system is tested on a face recognition task using popular databases (FERET, AT&T, Yale), and the results are discussed.
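A rough sketch of turning nearest-neighbor distances into class posteriors is given below. It assumes images are represented as normalized histograms and that the posterior is proportional to the exponential of the negative scaled distance to the nearest reference of each class. The exact estimator in the paper is not given in the abstract, so the function names and the `scale` parameter are hypothetical. The link to a Bayesian estimate can be seen in a simple case: under a multinomial model with equal priors and `n` independent samples, the class likelihood is proportional to exp(-n * KL(query || class histogram)).

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two histograms.

    eps is an assumed smoothing term that avoids division by zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def nn_posteriors(query, references, labels, n_classes,
                  scale=1.0, distance=kl_divergence):
    """Estimate class posteriors from nearest-neighbor distances.

    For each class, keep the distance to its closest reference, then
    apply a softmax to the negative scaled distances. scale must be
    positive; it plays the role of the sample count n (an assumption).
    """
    min_dist = np.full(n_classes, np.inf)
    for ref, label in zip(references, labels):
        min_dist[label] = min(min_dist[label], distance(query, ref))
    logits = -scale * min_dist
    logits = logits - logits.max()   # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()


query = [0.7, 0.2, 0.1]
references = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
labels = [0, 1]
print(nn_posteriors(query, references, labels, n_classes=2, scale=10.0))
```

Because `distance` is a parameter, the same routine accepts any dissimilarity measure, which mirrors the "arbitrary distance" setting mentioned in the abstract.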