Data organization in video surveillance systems using deep learning
In this paper, we propose organizing information in video surveillance systems by grouping the video tracks that contain the same face. A descriptor of each video track is obtained by aggregating the features of its individual frames, which are extracted using deep convolutional neural networks. Tracks containing the same face are then grouped using known face verification algorithms and clustering methods. We experimentally compare frame aggregation methods on the YouTubeFaces dataset using contemporary neural networks (VGGFace, VGGFace2, LightenedCNN). The most accurate video-based face verification is achieved by L2-normalizing the average of the unnormalized features of the individual frames of each video track. Finally, we demonstrate that the best video grouping is obtained with sequential and rank-order clustering methods.
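The aggregation rule singled out above (average the unnormalized frame features, then L2-normalize the result) can be illustrated with a minimal sketch. The function names `aggregate_track` and `track_distance` are hypothetical and chosen only for illustration. The per-frame feature vectors are assumed to have already been extracted by some network. Comparing two tracks by the Euclidean distance between their descriptors is likewise an assumption, not necessarily the verification rule used in the experiments.

```python
import math


def aggregate_track(frame_features):
    """Build a track descriptor: average the raw (unnormalized) per-frame
    features, then L2-normalize the averaged vector.

    frame_features: non-empty list of equal-length lists of floats,
    one list per frame (assumed to come from a CNN feature extractor).
    """
    if not frame_features:
        raise ValueError("a track needs at least one frame")
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    # Component-wise mean over the frames of the track.
    mean = [sum(f[i] for f in frame_features) / n_frames for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return mean  # degenerate all-zero mean: nothing to normalize
    return [x / norm for x in mean]


def track_distance(track_a, track_b):
    """Euclidean distance between two track descriptors (illustrative
    assumption; a smaller distance suggests the same person)."""
    a = aggregate_track(track_a)
    b = aggregate_track(track_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy track of two frames with 2-dimensional features.
track = [[3.0, 0.0], [3.0, 8.0]]
print(aggregate_track(track))  # approximately [0.6, 0.8]
```

Note that the normalization is applied once, after averaging, so frames with larger feature norms contribute more to the descriptor than they would if each frame were normalized first.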