MobileEmotiFace: Efficient Facial Image Representations in Video-Based Emotion Recognition on Mobile Devices
In this paper, we address the emotion classification problem in videos using a two-stage approach. At the first stage, deep features are extracted from the facial regions detected in each video frame using a MobileNet-based image model. This network is first pre-trained to identify a person's age, gender, and identity, and then fine-tuned on the AffectNet dataset to classify emotions in static images. At the second stage, the features of each frame are aggregated using multiple statistical functions (mean, standard deviation, min, max) into a single MobileEmotiFace descriptor for the whole video. The proposed approach is evaluated experimentally on the AFEW dataset from the EmotiW 2019 challenge. Experiments show that our image mining technique leads to more accurate and much faster decision-making in video-based emotion recognition than conventional feature extractors.
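The second-stage aggregation described above can be sketched as follows. This is a minimal illustration, assuming the per-frame CNN features are stacked into a (T, D) array and that the four statistics are concatenated into one video descriptor; the function name and the 1024-dimensional feature size are illustrative assumptions, not part of the original method specification.

```python
import numpy as np

def aggregate_video_features(frame_features: np.ndarray) -> np.ndarray:
    """Aggregate per-frame facial features of shape (T, D) into a single
    video-level descriptor by concatenating frame-wise statistics
    (mean, standard deviation, min, max), as in the paper's second stage.
    Concatenation order and layout are an assumption for illustration."""
    stats = [
        frame_features.mean(axis=0),  # component-wise mean over frames
        frame_features.std(axis=0),   # component-wise standard deviation
        frame_features.min(axis=0),   # component-wise minimum
        frame_features.max(axis=0),   # component-wise maximum
    ]
    return np.concatenate(stats)      # shape: (4 * D,)

# Example: 12 frames, each with a hypothetical 1024-dim feature vector
feats = np.random.rand(12, 1024).astype(np.float32)
descriptor = aggregate_video_features(feats)
print(descriptor.shape)  # (4096,)
```

The resulting fixed-length descriptor can then be fed to any standard classifier (e.g. a linear SVM or a small feed-forward network), regardless of how many frames the input video contains.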